Patent application number | Description | Published |
--- | --- | --- |
20080288744 | DETECTING MEMORY-HAZARD CONFLICTS DURING VECTOR PROCESSING - A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when memory operations are performed in parallel using at least a portion of the vectors, and tracking positions in at least one of the vectors of any detected conflict between the memory addresses. Next, the processor executes the instructions for detecting the conflict between the memory addresses and tracking the positions. | 11-20-2008 |
20080288745 | GENERATING PREDICATE VALUES DURING VECTOR PROCESSING - A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more predicate values corresponding to any detected conflict between the memory addresses, where a given predicate value indicates elements in at least the portion of the vector that can be processed in parallel. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more predicate values. | 11-20-2008 |
20080288754 | GENERATING STOP INDICATORS DURING VECTOR PROCESSING - A method for performing parallel operations in a computer system when one or more memory hazards may be present, which may be implemented by a processor, is described. During operation, the processor receives instructions for detecting conflict between memory addresses in vectors when operations are performed in parallel using at least a portion of the vectors, and generating one or more stop indicators corresponding to any detected conflict between the memory addresses, where a given stop indicator indicates a memory hazard. Next, the processor executes the instructions for detecting the conflict between the memory addresses and generating the one or more stop indicators. | 11-20-2008 |
20090267959 | TECHNIQUE FOR VISUALLY COMPOSITING A GROUP OF GRAPHICAL OBJECTS - Embodiments of a method for visually compositing a group of objects in an image are described. During operation, a processor determines a modified opacity for a first object in a first group of objects based on a first group opacity for the first group of objects and an initial opacity for the first object in the first group of objects. Then, the processor determines a modified opacity for a second object in the first group of objects based on the modified opacity for the first object in the first group of objects and an initial opacity for the second object in the first group of objects, where the modified opacity for the first object in the first group of objects and the modified opacity for the second object in the first group of objects are used to composite the first group of objects. | 10-29-2009 |
20100042789 | CHECK-HAZARD INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a system that determines data dependencies between two vector memory operations or two memory operations that use vectors of memory addresses. During operation, the system receives a first input vector and a second input vector. The first input vector includes a number of elements containing memory addresses for a first memory operation, while the second input vector includes a number of elements containing memory addresses for a second memory operation, wherein the first memory operation occurs before the second memory operation in program order. The system then determines elements in the first and second input vectors where the memory addresses indicate that a dependency exists between the memory operations. The system next generates a result vector, wherein the result vector indicates the elements where dependencies exist between the memory operations. | 02-18-2010 |
20100042807 | INCREMENT-PROPAGATE AND DECREMENT-PROPAGATE INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-18-2010 |
20100042815 | METHOD AND APPARATUS FOR EXECUTING PROGRAM CODE - The described embodiments provide a system that executes program code. While executing program code, the processor encounters at least one vector instruction and at least one vector-control instruction. The vector instruction includes a set of elements, wherein each element is used to perform an operation for a corresponding iteration of a loop in the program code. The vector-control instruction identifies elements in the vector instruction that may be operated on in parallel without causing an error due to a runtime data dependency between the iterations of the loop. The processor then executes the loop by repeatedly executing the vector-control instruction to identify a next group of elements that can be operated on in the vector instruction and selectively executing the vector instruction to perform the operation for the next group of elements in the vector instruction, until the operation has been performed for all elements of the vector instruction. | 02-18-2010 |
20100042816 | BREAK, PRE-BREAK, AND REMAINING INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a system that sets elements in a result vector based on an input vector. During operation, the system determines a location of a key element within the input vector. Next, the system generates a result vector. When generating the result vector, the system sets one or more elements of the result vector based on the location of the key element in the input vector. | 02-18-2010 |
20100042817 | SHIFT-IN-RIGHT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with shifted values from an input vector. During operation, the processor receives an input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain shifted values or propagated values from the input vector, depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-18-2010 |
20100042818 | COPY-PROPAGATE, PROPAGATE-POST, AND PROPAGATE-PRIOR INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with copied or propagated values from an input vector. During operation, the processor receives at least one input vector and a control vector. Using these vectors, the processor generates the result vector, which can contain copied or propagated values from the input vector(s), depending on the value of the control vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-18-2010 |
20100049950 | RUNNING-SUM INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with summed values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element in the second input vector. The processor then writes the sum of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-25-2010 |
20100049951 | RUNNING-AND, RUNNING-OR, RUNNING-XOR, AND RUNNING-MULTIPLY INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with combined values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then writes the product of the base value and values from relevant elements in the first input vector into selected elements in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 02-25-2010 |
20100058037 | RUNNING-SHIFT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with shifted values. During operation, the processor receives a first input vector, a second input vector, and a control vector. When generating the result vector, the processor first captures a base value from a key element position in the second input vector. The processor then determines a number of bit positions to shift the base value using selected relevant elements in the first input vector. The processor then shifts a copy of the base value by the number of bit positions and writes the shifted value into a corresponding element in the result vector. In addition, a predicate vector can be used to control the values that are written to the result vector. | 03-04-2010 |
20100077180 | GENERATING PREDICATE VALUES BASED ON CONDITIONAL DATA DEPENDENCY IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present are described, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where a given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values. | 03-25-2010 |
20100077182 | GENERATING STOP INDICATORS BASED ON CONDITIONAL DATA DEPENDENCY IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present are described, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more stop indicators based on actual dependencies, where a given stop indicator indicates the position of a given actual dependency that can lead to different results when the data elements are processed in parallel than when the data elements are processed sequentially, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more stop indicators. | 03-25-2010 |
20100077183 | CONDITIONAL DATA-DEPENDENCY RESOLUTION IN VECTOR PROCESSORS - Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present are described, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating a vector of tracked positions of actual dependencies, where a given tracked position indicates the position of a given actual dependency, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the vector of tracked positions. | 03-25-2010 |
20100079313 | METHOD AND APPARATUS FOR COMPRESSING AND DECOMPRESSING DATA - The described embodiments include a system for performing data compression. The system includes a compression mechanism with N channels, and an internal decompression mechanism in the compression mechanism that accepts N channels of fixed-length packets. The compression mechanism is configured to receive an input bit stream that includes a set of data words. In response to receiving a request from the internal decompression mechanism identifying at least one of the channels for which a fixed-length packet is to be appended to the output stream, the system fills a fixed-length packet for the identified channel with compressed data words; appends the fixed-length packet to the output stream; and forwards a copy of the fixed-length packet to the internal decompression mechanism. The internal decompression mechanism decompresses fixed-length packets for each of the channels to determine a next fixed-length packet to be appended to the output stream. | 04-01-2010 |
20100079314 | METHOD AND APPARATUS FOR COMPRESSING AND DECOMPRESSING DATA - One embodiment of the present invention provides an apparatus for compressing data, comprising a compression mechanism which includes N channels. During operation, the compression mechanism receives a set of data words from an input bit-stream, compresses the data words into a set of variable-length words, and stores an I-th variable-length word in the set of variable-length words into a fixed-length packet for an I-th channel. Then, the compression mechanism assembles each fixed-length packet into an output stream when the packet becomes full. Some other embodiments of the present invention provide an apparatus for data decompression, comprising a parallel-processing mechanism which includes N decompression mechanisms. During operation, each decompression mechanism retrieves a fixed-length packet from a corresponding channel in an input stream, retrieves and decompresses a set of variable-length words from the fixed-length packet, and assembles the decompressed variable-length words into every N-th position of an output stream beginning at an offset I corresponding to the channel. | 04-01-2010 |
20100325398 | RUNNING-MIN AND RUNNING-MAX INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector that contains results from a comparison operation. During operation, the processor receives a first input vector, a second input vector, and a control vector. When subsequently generating a result vector, the processor first captures a base value from a key element position in the first input vector. For selected elements in the result vector, the processor compares the base value and values from relevant elements to the left of a corresponding element in the second input vector, and writes the result into the element in the result vector. In the described embodiments, the key element position and the relevant elements can be defined by the control vector and an optional predicate vector. | 12-23-2010 |
20100325399 | VECTOR TEST INSTRUCTION FOR PROCESSING VECTORS - The described embodiments provide a processor that executes a vector instruction. The processor starts by receiving a vector instruction that uses at least one vector of values that includes N elements as an input. In addition, the processor optionally receives a predicate vector that includes N elements. The processor then executes the vector instruction. In the described embodiments, when executing the vector instruction, if the predicate vector is received, for one or more selected elements in the vector of values for which a corresponding element in the predicate vector is active, otherwise, for one or more selected elements in the vector of values, the processor checks the one or more selected elements to determine if the selected elements contain a predetermined value. When the selected elements contain the predetermined value, the processor sets a corresponding status flag. | 12-23-2010 |
20100325483 | NON-FAULTING AND FIRST-FAULTING INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include a processor that handles faults during execution of a vector instruction. The processor starts by receiving a vector instruction that uses at least one vector of values that includes N elements as an input. In addition, the processor optionally receives a predicate vector that includes N elements. The processor then executes the vector instruction. In the described embodiments, when executing the vector instruction, if the predicate vector is received, for each element in the vector of values for which a corresponding element in the predicate vector is active, otherwise, for each element in the vector of values, the processor performs an operation for the vector instruction for the element in the vector of values. While performing the operation, the processor conditionally masks faults encountered (i.e., faults caused by an illegal operation). | 12-23-2010 |
20110035567 | ACTUAL INSTRUCTION AND ACTUAL-FAULT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a vector instruction that optionally receives a predicate vector (which has N elements) as an input. The processor then executes the vector instruction. In the described embodiments, executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor determines element positions for which a fault was masked during a prior operation. The processor then updates elements in the result vector to identify a leftmost element for which a fault was masked. | 02-10-2011 |
20110035568 | SELECT FIRST AND SELECT LAST INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a vector instruction that uses a first input vector, a second input vector, and a control vector, and optionally a predicate vector as inputs, wherein each of the vectors includes N elements. The processor then executes the vector instruction. In the described embodiments, when executing the vector instruction, the processor determines a key element position. If the predicate vector is received, the key element position is a predetermined active element position in the predicate vector, otherwise, the key element position is in a predetermined element position. The processor then uses the key element position to copy a result value into a result variable. When copying the result value into the result variable, if an element in the key element position of the control vector contains a predetermined value, the processor copies a value from the key element position in the second input vector into the result variable. Otherwise, the processor copies a value from the key element position in the first input vector into the result variable. | 02-10-2011 |
20110093681 | REMAINING INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving an input vector and optionally receiving a predicate vector as inputs. The processor then executes the vector instruction, which causes the processor to determine a key element position in the input vector and generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor sets each element of the result vector to the right of the key element to a first predetermined value and sets each element of the result vector at or to the left of the key element to a second predetermined value. The processor then sets one or more processor status flags based on the values in the result vector. | 04-21-2011 |
20110113217 | GENERATE PREDICATES INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a first input vector, a second input vector, and optionally receiving a predicate vector (each of which includes N elements) as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector was received, for each element of the result vector for which the corresponding element of the predicate vector is active, otherwise, for each element of the result vector, the processor determines elements that are to be set in the result vector based on values in elements in the first input vector and the second input vector. The processor then sets the determined elements of the result vector to a first predetermined value. | 05-12-2011 |
20110276782 | RUNNING SUBTRACT AND RUNNING DIVIDE INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments provide a processor for generating a result vector with subtracted or mathematically divided values from a first input vector. During operation, the processor receives the first input vector, a second input vector, and a control vector, and optionally receives a predicate vector. The processor then records a value from an element at a key element position in the second input vector into a base value. Next, the processor generates a result vector. When generating the result vector, for each active element in the result vector to the right of the key element position, the processor is configured to set the element in the result vector equal to the base value minus a total of the values in each relevant element of the first input vector or to set the element in the result vector equal to the result of dividing the base value by a value in each relevant element of the first input vector, wherein the relevant elements include relevant elements from an element at the key element position to and including a predetermined element in the first input vector. | 11-10-2011 |
20110283092 | GETFIRST AND ASSIGNLAST INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments comprise a processor that executes vector instructions. In the described embodiments, while executing program code, the processor receives a vector instruction that indicates an input vector that includes N elements, wherein receiving the vector instruction comprises optionally receiving a predicate vector that includes N elements. The processor then executes the vector instruction. When executing the vector instruction, if the predicate vector is received, based on active elements in the predicate vector, otherwise, if the predicate vector is not received, based on an assumed predicate vector for which each element is active, the processor sets a value in a scalar register equal to a predetermined element of the input vector. In the described embodiments, the vector instruction can be a GetFirst, an AssignLast1P, or an AssignLast2P instruction. | 11-17-2011 |
20110320749 | PAGE FAULT PREDICTION FOR PROCESSING VECTOR INSTRUCTIONS - The described embodiments comprise a processor that handles a TLB miss while executing a vector read instruction. In the described embodiments, the processor performs a lookup in a TLB for addresses in active elements in the vector read instruction. The processor then determines that a TLB miss occurred for the address from an active element other than the first active element. Upon predicting that a page table walk for the vector read instruction will result in a page fault, the processor sets a bit in a corresponding bit position in an FSR. In the described embodiments, a set bit in a bit position in the FSR indicates that data in a corresponding element of the vector read instruction is invalid. The processor then immediately performs memory reads for at least one of the first active element and other active elements for which TLB misses did not occur. | 12-29-2011 |
20110320763 | USING ADDRESSES TO DETECT OVERLAPPING MEMORY REGIONS - The described embodiments determine if two addressed memory regions overlap. First, a first address for a first memory region and a second address for a second memory region are received. Then a composite address is generated from the first and second addresses. Next, an upper subset and a lower subset of the bits in the addresses are determined. Then, using the upper and lower subsets of the addresses, a determination is made whether the addresses meet a condition from a set of conditions. If so, a determination is made whether the lower subset of the bits in the addresses meets a criterion from a set of criteria. Based on the determination whether the lower subset of the bits in the addresses meets a criterion, a determination is made whether the memory regions overlap or do not overlap. | 12-29-2011 |
20120060020 | VECTOR INDEX INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a vector instruction. The processor starts by receiving a start value and an increment value, and optionally receiving a predicate vector with N elements as inputs. The processor then executes the vector instruction. Executing the vector instruction causes the processor to generate a result vector. When generating the result vector, if the predicate vector is received, for each element in the result vector for which a corresponding element of the predicate vector is active, otherwise, for each element in the result vector, the processor sets the element in the result vector equal to the start value plus a product of the increment value multiplied by a specified number of elements to the left of the element in the result vector. | 03-08-2012 |
20120079466 | Systems And Methods For Compiler-Based Full-Function Vectorization - Systems and methods for the vectorization of software applications are described. In some embodiments, a compiler may automatically generate both scalar and vector versions of a function from a single source code description. A vector interface may be exposed in a persistent dependency database that is associated with the function. This may allow a compiler to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. This may in turn facilitate the vectorization of hierarchical code, which may improve application performance when vector execution resources are available. | 03-29-2012 |
20120079469 | Systems And Methods For Compiler-Based Vectorization Of Non-Leaf Code - Systems and methods for the vectorization of software applications are described. In some embodiments, source code dependencies can be expressed in ways that can extend a compiler's ability to vectorize otherwise scalar functions. For example, when compiling a called function, a compiler may identify dependencies of the called function on variables other than parameters passed to the called function. The compiler may record these dependencies, e.g., in a dependency file. Later, when compiling a calling function that calls the called function, the same (or another) compiler may reference the previously-identified dependencies and use them to determine whether and how to vectorize the calling function. In particular, these techniques may facilitate the vectorization of non-leaf loops. Because non-leaf loops are relatively common, the techniques described herein can increase the amount of vectorization that can be applied to many applications. | 03-29-2012 |
20120102301 | PREDICATE COUNT AND SEGMENT COUNT INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments comprise a PredCount instruction and a SegCount instruction. When executed by a processor, the PredCount instruction causes the processor to analyze a predicate vector to determine a number of active elements in the predicate vector that exhibit a predetermined condition (e.g., that are set to a predetermined value) and to return a result indicating that number. When executed by a processor, the SegCount instruction causes the processor to determine a number of times that a GeneratePredicates instruction would be executed to generate a full set of predicates using active elements of an input vector. | 04-26-2012 |
20120166765 | PREDICTING BRANCHES FOR VECTOR PARTITIONING LOOPS WHEN PROCESSING VECTOR INSTRUCTIONS - While fetching the instructions from a loop in program code, a processor calculates a number of times that a backward-branching instruction at the end of the loop will actually be taken when the fetched instructions are executed. Upon determining that the backward-branching instruction has been predicted taken more than the number of times that the branch instruction will actually be taken, the processor immediately commences a mispredict operation for the branch instruction, which comprises: (1) flushing, from the processor, the fetched instructions from the loop that will not be executed, and (2) commencing fetching instructions from an instruction following the branch instruction. | 06-28-2012 |
20120191944 | PREDICTING A PATTERN IN ADDRESSES FOR A MEMORY-ACCESSING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes a vector instruction. In the described embodiments, while executing instructions, the processor encounters a vector memory-accessing instruction that performs a memory operation for a set of elements in the memory-accessing instruction. In these embodiments, if an optional predicate vector is received, for each element in the memory-accessing instruction for which a corresponding element of the predicate vector is active, otherwise, for each element in the memory-accessing instruction, upon determining that addresses in the elements are likely to be arranged in a predetermined pattern, the processor predicts that the addresses in the elements are arranged in the predetermined pattern. The processor then performs a fast version of the memory operation corresponding to the predetermined pattern. | 07-26-2012 |
20120191949 | PREDICTING A RESULT OF A DEPENDENCY-CHECKING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments include a processor that executes a vector instruction. In the described embodiments, while dispatching instructions at runtime, the processor encounters a dependency-checking instruction. Upon determining that a result of the dependency-checking instruction is predictable, the processor dispatches a prediction micro-operation associated with the dependency-checking instruction, wherein the prediction micro-operation generates a predicted result vector for the dependency-checking instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if a predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, the processor sets the element to zero. | 07-26-2012 |
20120191950 | PREDICTING A RESULT FOR A PREDICATE-GENERATING INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters a predicate-generating instruction. Upon determining that a result of the predicate-generating instruction is predictable, the processor dispatches a prediction micro-operation associated with the predicate-generating instruction, wherein the prediction micro-operation generates a predicted result vector for the predicate-generating instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation to generate the predicted result vector, if the predicate vector is received, for each element of the predicted result vector for which the predicate vector is active, otherwise, for each element of the predicted result vector, generating the predicted result vector comprises setting the element of the predicted result vector to true. | 07-26-2012 |
20120191957 | PREDICTING A RESULT FOR AN ACTUAL INSTRUCTION WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, while dispatching instructions at runtime, the processor encounters an Actual instruction. Upon determining that a result of the Actual instruction is predictable, the processor dispatches a prediction micro-operation associated with the Actual instruction, wherein the prediction micro-operation generates a predicted result vector for the Actual instruction. The processor then executes the prediction micro-operation to generate the predicted result vector. In the described embodiments, when executing the prediction micro-operation, generating the predicted result vector comprises setting elements of the predicted result vector to true: each element for which a corresponding element of the predicate vector is active, if the predicate vector is received; otherwise, every element of the predicted result vector. | 07-26-2012 |
20120192005 | SHARING A FAULT-STATUS REGISTER WHEN PROCESSING VECTOR INSTRUCTIONS - The described embodiments provide a processor that executes vector instructions. In the described embodiments, the processor initializes an architectural fault-status register (FSR) and a shadow copy of the architectural FSR by setting each of N bit positions in the architectural FSR and the shadow copy of the architectural FSR to a first predetermined value. The processor then executes a first first-faulting or non-faulting (FF/NF) vector instruction. While executing the first vector instruction, the processor also executes one or more subsequent FF/NF instructions. In these embodiments, when executing the first vector instruction and the subsequent vector instructions, the processor updates one or more bit positions in the shadow copy of the architectural FSR to a second predetermined value upon encountering a fault condition. However, the processor does not update bit positions in the architectural FSR upon encountering a fault condition for the first vector instruction and the subsequent vector instructions. | 07-26-2012 |
20120210099 | RUNNING UNARY OPERATION INSTRUCTIONS FOR PROCESSING VECTORS - During operation, a processor generates a result vector. In particular, the processor records a value from an element at a key element position in an input vector into a base value. Next, for each active element in the result vector to the right of the key element position, the processor generates a result vector by setting the element in the result vector equal to a result of performing a unary operation on the base value a number of times equal to a number of relevant elements. The number of relevant elements is determined from the key element position to and including a predetermined element in the result vector, where the predetermined element in the result vector may be one of: a first element to the left of the element in the result vector; or the element in the result vector. | 08-16-2012 |
20120221837 | RUNNING MULTIPLY-ACCUMULATE INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include RunningMAC1P and RunningMAC2P instructions. In the described embodiments, a processor receives a first input vector, a second input vector, a third input vector, and a control vector. Upon executing a RunningMAC1P or a RunningMAC2P instruction, the processor sets a base value equal to a value from an element at a key element position in the first input vector. Next, the processor generates the result vector by, for each element of the result vector to the right of the key element position, setting the element in the result vector equal to a sum of the base value and a result of multiplying a value in each relevant element of the second input vector by a value in a corresponding element of the third input vector, from an element at the key element position to and including a predetermined element in the second input vector. | 08-30-2012 |
20120233507 | CONFIRM INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor with a fault status register (FSR) that executes a Confirm instruction. In these embodiments, when executing the Confirm instruction, the processor receives a predicate vector that includes N elements. For a first set of bit positions in the FSR for which corresponding elements of the predicate vector are active, the processor determines if at least one of the first set of bit positions in the FSR holds a predetermined value. When at least one of the first set of bit positions in the FSR holds the predetermined value, the processor causes a fault in the processor. | 09-13-2012 |
20120239910 | CONDITIONAL EXTRACT INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a vector processor that executes a ConditionalExtract instruction. In the described embodiments, the processor receives an input scalar variable, an input vector, and a predicate vector, wherein each of the vectors has N elements. The processor then executes the ConditionalExtract instruction, which causes the processor to determine if at least one element in the predicate vector is active. If so, the processor copies a value from a last element in the input vector for which a corresponding element in the predicate vector is active into a scalar result variable. Otherwise, if no elements of the predicate vector are active, the processor copies a value from the input scalar variable into the scalar result variable. | 09-20-2012 |
20120239911 | VALUE CHECK INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that executes a ValueCheck instruction. In the described embodiments, the processor receives an input vector and a predicate vector, each including N elements. The processor then executes a ValueCheck instruction, which causes the processor to generate a result vector. When generating the result vector, for each element in a set of elements in the input vector for which a corresponding element of the predicate vector is active, the processor determines if at least one of the elements in the set of elements precedes the element in the input vector and contains a different value than the element in the input vector. If so, the processor writes an identifier for a closest preceding active element that contains the different value into a corresponding element of a result vector. Otherwise, the processor writes a zero in the corresponding element of the result vector. | 09-20-2012 |
20120284560 | READ XF INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that handles faults. The processor first receives a first input vector, a control vector, and a predicate vector, each vector comprising a plurality of elements. For each element in the first input vector for which the corresponding elements in the control vector and the predicate vector are active, the processor then performs a read operation using an address from the element of the first input vector. When a fault condition is encountered while performing the read operation, the processor determines if the element is a first element where a corresponding element of the control vector is active. If so, the processor handles the fault. Otherwise, the processor masks the fault for the element. | 11-08-2012 |
20120317441 | NON-FAULTING AND FIRST FAULTING INSTRUCTIONS FOR PROCESSING VECTORS - The described embodiments include a processor that handles faults during execution of a vector instruction. The processor starts by receiving a vector instruction that uses at least one vector of values that includes N elements as an input. In addition, the processor optionally receives a predicate vector that includes N elements. The processor then executes the vector instruction. In the described embodiments, when executing the vector instruction, the processor performs an operation for the vector instruction on elements in the vector of values: each element for which a corresponding element in the predicate vector is active, if the predicate vector is received; otherwise, each element in the vector of values. While performing the operation, the processor conditionally masks faults encountered (i.e., faults caused by an illegal operation). | 12-13-2012 |
20120331341 | SCALAR READXF INSTRUCTION FOR PROCESSING VECTORS - The described embodiments include a processor that handles faults. The processor first receives an input vector, a control vector, and a predicate vector, each vector comprising a plurality of elements. Then, for a first element of the input vector for which corresponding elements of the control vector and the predicate vector are active, the processor performs a scalar read operation using an address from the element of the input vector. When a fault condition is encountered while performing the read operation, the processor determines if the element is a first element where a corresponding element of the control vector is active. If so, the processor processes the fault. Otherwise, the processor masks the fault for the element. | 12-27-2012 |
20130007422 | PROCESSING VECTORS USING WRAPPING ADD AND SUBTRACT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a sum or difference operation on another input vector dependent upon the input vector and the control vector. | 01-03-2013 |
20130024651 | PROCESSING VECTORS USING A WRAPPING ROTATE PREVIOUS INSTRUCTION IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping rotate previous operation dependent upon the input vectors. | 01-24-2013 |
20130024655 | PROCESSING VECTORS USING WRAPPING INCREMENT AND DECREMENT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a fixed-value addition operation dependent upon the input vector and the control vector. | 01-24-2013 |
20130024656 | PROCESSING VECTORS USING WRAPPING BOOLEAN INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a Boolean operation on another input vector dependent upon the input vector and the control vector. | 01-24-2013 |
20130024669 | PROCESSING VECTORS USING WRAPPING SHIFT INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a shift operation on another input vector dependent upon the input vector and the control vector. | 01-24-2013 |
20130024670 | PROCESSING VECTORS USING WRAPPING MULTIPLY AND DIVIDE INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a product or quotient operation on another input vector dependent upon the input vector and the control vector. | 01-24-2013 |
20130024671 | PROCESSING VECTORS USING WRAPPING NEGATION INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a negation operation dependent upon the input vector and the control vector. | 01-24-2013 |
20130024672 | PROCESSING VECTORS USING WRAPPING PROPAGATE INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive a basis vector, an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping propagate operation dependent upon the input vectors. | 01-24-2013 |
20130036293 | PROCESSING VECTORS USING WRAPPING MINIMA AND MAXIMA INSTRUCTIONS IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an input vector and a control vector are disclosed. The executed instructions may also cause the processor to perform a minima or maxima operation on another input vector dependent upon the input vector and the control vector. | 02-07-2013 |
20130111193 | RUNNING SHIFT FOR DIVIDE INSTRUCTIONS FOR PROCESSING VECTORS | 05-02-2013 |
20130227251 | BRANCH MISPREDICTION BEHAVIOR SUPPRESSION ON ZERO PREDICATE BRANCH MISPREDICT - A method for suppressing branch misprediction behavior is contemplated in which a conditional branch instruction that would cause the flow of control to branch around instructions in response to a determination that a predicate vector is null is predicted not taken. However, in response to detecting that the prediction is incorrect, misprediction behavior is inhibited. | 08-29-2013 |
20130318306 | MACROSCALAR VECTOR PREFETCH WITH STREAMING ACCESS DETECTION - A method and system for implementing vector prefetch with streaming access detection is contemplated in which an execution unit such as a vector execution unit, for example, executes a vector memory access instruction that references an associated vector of effective addresses. The vector of effective addresses includes a number of elements, each of which includes a memory pointer. The vector memory access instruction is executable to perform multiple independent memory access operations using at least some of the memory pointers of the vector of effective addresses. A prefetch unit, for example, may detect a memory access streaming pattern based upon the vector of effective addresses, and in response to detecting the memory access streaming pattern, the prefetch unit may calculate one or more prefetch memory addresses based upon the memory access streaming pattern. Lastly, the prefetch unit may prefetch the one or more prefetch memory addresses into a memory. | 11-28-2013 |
20130318332 | BRANCH MISPREDICTION BEHAVIOR SUPPRESSION USING A BRANCH OPTIONAL INSTRUCTION - A method for suppressing branch misprediction behavior is contemplated in which a branch-optional instruction that would cause the flow of control to branch around instructions in response to a determination that a predicate vector is null is predicted not taken. However, in response to detecting that the prediction is incorrect, misprediction behavior is inhibited. | 11-28-2013 |
20140025938 | PREDICTION OPTIMIZATIONS FOR MACROSCALAR VECTOR PARTITIONING LOOPS - A method of predicting a backward conditional branch instruction used in a vector partitioning loop includes detecting the first conditional branch instruction that occurs after consumption of a dependency index vector by a predicate generating instruction. The dependency index vector includes information indicative of a number of iterations of the vector partitioning loop, and the conditional branch instruction may branch backwards when taken. The conditional branch instruction may then be predicted to be taken a number of times that is determined by the dependency index vector. | 01-23-2014 |
20140059328 | MECHANISM FOR PERFORMING SPECULATIVE PREDICATED INSTRUCTIONS - A mechanism for executing speculative predicated instructions may include initiating execution of a vector instruction when one or more operands upon which the vector instruction depends are available for use, even if a predicate vector upon which the vector instruction also depends is not available. If the predicate vector was not available, the results of the execution of the vector instruction may be temporarily held until the predicate vector becomes available, at which time, a destination vector may be updated with the results. | 02-27-2014 |
20140289495 | ENHANCED PREDICATE REGISTERS - Systems, apparatuses and methods for utilizing enhanced predicate registers which specify the element width and which elements are to be processed. The predicate size is dynamic, depending on the contents of the enhanced predicate register used for an instruction rather than being a static quality of a specific instruction. Specifying the element size in the enhanced predicate registers results in fewer instructions in an instruction set. | 09-25-2014 |
20140289496 | ENHANCED MACROSCALAR PREDICATE OPERATIONS - Systems, apparatuses and methods for utilizing enhanced macroscalar predicate operations which take enhanced predicate operands that designate the element width and which elements are to be processed. The element width and the number of elements per vector are determined at run-time rather than being defined in the architectural definition of the instruction. This enables additional parallelism when processing smaller-sized data. The instruction performs the requested operation on the elements specified by the enhanced control predicate, assuming an element-width also specified by the enhanced control predicate, and returns the result as an enhanced predicate of the same element width. | 09-25-2014 |
20140289497 | ENHANCED MACROSCALAR COMPARISON OPERATIONS - Systems, apparatuses and methods for utilizing enhanced Macroscalar comparison operations which take an enhanced predicate operand that designates the element width and which elements are to be processed. The element width and the number of elements per vector are determined at run-time rather than being defined in the architectural definition of the instruction. This enables additional parallelism when processing smaller-sized data. The instruction performs the requested operation on the elements specified by the enhanced predicate, assuming an element-width also specified by the enhanced predicate, and returns the result as an enhanced predicate corresponding to the result of the comparison. | 09-25-2014 |
20140289498 | ENHANCED MACROSCALAR VECTOR OPERATIONS - Systems, apparatuses and methods for utilizing enhanced Macroscalar vector operations which take an enhanced predicate operand that designates the element width and which elements are to be processed. The element width and the number of elements per vector are determined at run-time rather than being defined in the architectural definition of the instruction. This enables additional parallelism when processing smaller-sized data. The instruction performs the requested operation on the elements specified by the enhanced predicate, assuming an element-width also specified by the enhanced predicate, and returns the result as a vector of elements of the same element width. | 09-25-2014 |
20140289502 | ENHANCED VECTOR TRUE/FALSE PREDICATE-GENERATING INSTRUCTIONS - Systems, apparatuses and methods for utilizing enhanced vector true/false instructions. The enhanced vector true/false instructions generate enhanced predicates to correspond to the request element width and/or vector size. A vector true instruction generates an enhanced predicate where all elements supported by the processing unit are active. A vector false instruction generates an enhanced predicate where all elements supported by the processing unit are inactive. The enhanced predicate specifies the requested element width in addition to designating the element selectors. | 09-25-2014 |
20140325173 | MEMORY CONTROLLER MAPPING ON-THE-FLY - Systems, methods, and devices for dynamically mapping and remapping memory when a portion of memory is activated or deactivated are provided. In accordance with an embodiment, an electronic device may include several memory banks, one or more processors, and a memory controller. The memory banks may store data in hardware memory locations and may be independently deactivated. The processors may request the data using physical memory addresses, and the memory controller may translate the physical addresses to hardware memory locations. The memory controller may use a first memory mapping function when a first number of memory banks is active and a second memory mapping function when a second number is active. When one of the memory banks is to be deactivated, the memory controller may copy data from only the memory bank that is to be deactivated to the active remainder of memory banks. | 10-30-2014 |
20140331020 | MEMORY CONTROLLER MAPPING ON-THE-FLY - Systems, methods, and devices for dynamically mapping and remapping memory when a portion of memory is activated or deactivated are provided. In accordance with an embodiment, an electronic device may include several memory banks, one or more processors, and a memory controller. The memory banks may store data in hardware memory locations and may be independently deactivated. The processors may request the data using physical memory addresses, and the memory controller may translate the physical addresses to hardware memory locations. The memory controller may use a first memory mapping function when a first number of memory banks is active and a second memory mapping function when a second number is active. When one of the memory banks is to be deactivated, the memory controller may copy data from only the memory bank that is to be deactivated to the active remainder of memory banks. | 11-06-2014 |
20140359253 | INCREASING MACROSCALAR INSTRUCTION LEVEL PARALLELISM - A processor may include a vector functional unit that supports concurrent operations on multiple data elements of a maximum element size. The functional unit may also support concurrent execution of multiple distinct vector program instructions, where the multiple vector instructions each operate on multiple data elements of less than the maximum element size. | 12-04-2014 |
20150058832 | AUTO MULTI-THREADING IN MACROSCALAR COMPILERS - Systems and methods for the parallelization of software applications are described. In some embodiments, a compiler may automatically identify within source code dependencies of a function called by another function. A persistent database may be generated to store identified dependencies. When calls to the function are encountered within the source code, the persistent database may be checked, and a parallelized implementation of the function may be employed dependent upon the dependencies indicated in the persistent database. | 02-26-2015 |
20150089187 | Hazard Check Instructions for Enhanced Predicate Vector Operations - A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector the element of the first vector depends on (if any). In an embodiment, at least one of the vector memory operations has addresses specified using a scalar address in the operands (and a vector attribute associated with the vector). In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements. | 03-26-2015 |
20150089188 | Vector Hazard Check Instruction with Reduced Source Operands - In an embodiment, a processor may implement a vector hazard check instruction to detect dependencies between vector memory operations based on the addresses of the vectors accessed by the vector memory operations. The addresses may be specified via a base address and a vector of indexes for each vector. In an embodiment, one of the base addresses may be an implied (or assumed) zero address, reducing the number of operands of the hazard check instruction. | 03-26-2015 |
20150089189 | Predicate Vector Pack and Unpack Instructions - In an embodiment, a processor may implement a vector instruction set including predicate vectors and multiple vector element sizes. The vector instruction set may include predicate vector pack and unpack instructions. Responsive to the predicate vector pack instruction, the processor may pack predicates from multiple predicate vector source registers into a destination predicate vector register. Responsive to the predicate vector unpack instruction, the processor may select a portion of a source predicate vector register and write the result to a destination predicate vector register. Additionally, the predicate vector register may store one or more vector attributes associated with the corresponding vector. The processor may modify the attribute as part of the pack/unpack operation (e.g. based on a pack/unpack factor). Additionally, vector pack/unpack instructions that are controlled by the attribute in a corresponding predicate vector register may be implemented. | 03-26-2015 |
20150089190 | Predicate Attribute Tracker - In an embodiment, a processor includes a register attribute tracker configured to track one or more attributes corresponding to registers. The register attribute tracker may track the attributes associated with the registers when those registers are used as output registers of instructions that explicitly define the attributes and, if the register attribute tracker has a tracked attribute associated with an input register of an instruction that does not explicitly define the attribute, the register attribute tracker may annotate the instruction with an attribute and/or associate an attribute with the output register of the instruction in the register attribute tracker. | 03-26-2015 |
20150089191 | Early Issue of Null-Predicated Operations - In an embodiment, a processor includes an issue circuit configured to issue instruction operations for execution. The issue circuit may be configured to monitor the source operands of the instruction operations, and to issue instruction operations for which the source operands (including predicate operands, as appropriate) are resolved. Additionally, the issue circuit may be configured to detect a null predicate that indicates that none of the vector elements will be modified by a corresponding instruction operation. The issue circuit may be configured to issue the corresponding instruction operation with the null predicate even if other source operands are not yet resolved. | 03-26-2015 |
20150089192 | Dynamic Attribute Inference - In an embodiment, a processor may be configured to dynamically infer one or more attributes of input and/or output registers of an instruction, given the attributes corresponding to at least one input register. The inference may be made at the issue circuit/stage of the processor, for those registers that do not have attribute information at the issue circuit/stage. In an embodiment, the processor may also include a register attribute tracker configured to track attributes of registers prior to the issue stage of the processor pipeline. The processor may feed back, to the register attribute tracker, inferred attributes and the register addresses of the registers to which the inferred attributes apply. The register attribute tracker may be configured to associate the inferred attribute with the identified register. The register attribute tracker may also be configured to infer input register attributes from other input register attributes. | 03-26-2015 |
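The hazard-check instructions listed above (e.g. 20150089187 and 20150089188) can be illustrated with a scalar model. The sketch below is an assumption-laden approximation, not the patented encoding: it assumes a 1-based dependency index with 0 meaning "no dependency", and assumes an element of the first memory operation can only depend on earlier element positions of the second.

```python
def hazard_check(first_addrs, second_addrs, pred_first=None, pred_second=None):
    """Scalar model of a vector hazard-check instruction.

    For each element position i of the first vector memory operation,
    record the closest preceding position j (j < i) of the second vector
    whose address matches, as a 1-based index; 0 means no dependency.
    Predicates, if given, qualify the check so that only active elements
    are considered. The 1-based encoding and j < i ordering rule are
    assumptions of this sketch.
    """
    n = len(first_addrs)
    dep = [0] * n
    for i in range(n):
        if pred_first is not None and not pred_first[i]:
            continue  # inactive element of the first memory op: no check
        for j in range(i):  # only earlier elements of the second vector
            if pred_second is not None and not pred_second[j]:
                continue  # inactive element of the second memory op
            if second_addrs[j] == first_addrs[i]:
                dep[i] = j + 1  # keep the latest (closest) match, 1-based
    return dep
```

For example, `hazard_check([10, 20, 10, 30], [10, 20, 20, 40])` reports that element 2 (the second read of address 10) depends on element position 1 of the second operation, while qualifying that position with an inactive predicate suppresses the dependency, matching the predicate-qualification behavior described in the abstract.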
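The running-operation family (e.g. the RunningMAC instructions of 20120221837) can likewise be modeled element by element. This sketch assumes the variant in which the accumulated products include the current element, and assumes positions at or left of the key position pass the first input vector through unchanged; the 1P/2P variants named in the abstract differ in exactly such boundary details.

```python
def running_mac(first, second, third, key):
    """Scalar model of a RunningMAC-style instruction.

    base is taken from the key element of the first input vector; each
    result element to the right of the key position is base plus the sum
    of second[k] * third[k] for k from the key position through the
    current position (an assumption of this sketch).
    """
    result = list(first)   # positions <= key pass the first input through
    base = first[key]
    total = 0
    for k in range(key, len(first)):
        total += second[k] * third[k]  # accumulate products from key..k
        if k > key:
            result[k] = base + total
    return result
```

With `first = [5, 0, 0, 0]`, `second = [1, 2, 3, 4]`, `third = [1, 1, 1, 1]`, and the key at position 0, the base is 5 and the running sums of products give `[5, 8, 11, 15]`: each element to the right of the key accumulates one more product term, which is the "running" behavior the abstract describes.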
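The ConditionalExtract instruction (20120239910) has a particularly compact scalar reading: the result is the value of the last active element of the input vector, falling back to the input scalar when no predicate element is active. A minimal sketch of that reading:

```python
def conditional_extract(scalar_in, input_vec, pred):
    """Scalar model of the ConditionalExtract instruction: return the
    value of the last element of input_vec whose corresponding predicate
    element is active, or scalar_in when no predicate element is active."""
    result = scalar_in
    for value, active in zip(input_vec, pred):
        if active:
            result = value  # keep overwriting: the last active element wins
    return result
```

For example, `conditional_extract(9, [1, 2, 3], [1, 0, 1])` yields 3 (the last active element), while an all-inactive predicate yields the input scalar 9, which is the fallback path the abstract spells out.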