Patent application number | Description | Published |
20120159130 | MECHANISM FOR CONFLICT DETECTION USING SIMD - A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel. | 06-21-2012 |
20120166761 | VECTOR CONFLICT INSTRUCTIONS - A processing core implemented on a semiconductor chip is described having first execution unit logic circuitry that includes first comparison circuitry to compare each element in a first input vector against every element of a second input vector. The processing core also has second execution logic circuitry that includes second comparison circuitry to compare a first input value against every data element of an input vector. | 06-28-2012 |
20120254591 | SYSTEMS, APPARATUSES, AND METHODS FOR STRIDE PATTERN GATHERING OF DATA ELEMENTS AND STRIDE PATTERN SCATTERING OF DATA ELEMENTS - Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask. | 10-04-2012 |
20130179633 | SCATTER-GATHER INTELLIGENT MEMORY ARCHITECTURE FOR UNSTRUCTURED STREAMING DATA ON MULTIPROCESSOR SYSTEMS - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion. | 07-11-2013 |
20130297878 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-07-2013 |
20140040542 | SCATTER-GATHER INTELLIGENT MEMORY ARCHITECTURE FOR UNSTRUCTURED STREAMING DATA ON MULTIPROCESSOR SYSTEMS - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion. | 02-06-2014 |
20140095843 | Systems, Apparatuses, and Methods for Performing Conflict Detection and Broadcasting Contents of a Register to Data Element Positions of Another Register - Systems, apparatuses, and methods of performing in a computer processor broadcasting data in response to a single vector packed broadcasting instruction that includes a source writemask register operand, a destination vector register operand, and an opcode. In some embodiments, the data of the source writemask register is zero extended prior to broadcasting. | 04-03-2014 |
20140095850 | LOOP VECTORIZATION METHODS AND APPARATUS - Loop vectorization methods and apparatus are disclosed. An example method includes generating a first control mask for a set of iterations of a loop by evaluating a condition of the loop, wherein generating the first control mask includes setting a bit of the control mask to a first value when the condition indicates that an operation of the loop is to be executed, and setting the bit of the first control mask to a second value when the condition indicates that the operation of the loop is to be bypassed. The example method also includes compressing indexes corresponding to the first set of iterations of the loop according to the first control mask. | 04-03-2014 |
20140096119 | LOOP VECTORIZATION METHODS AND APPARATUS - Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop as; and setting the dynamic adjustment value based on the identified dependency. | 04-03-2014 |
20140136783 | HYBRID HARDWARE AND SOFTWARE IMPLEMENTATION OF TRANSACTIONAL MEMORY ACCESS - Embodiments of the invention relate a hybrid hardware and software implementation of transactional memory accesses in a computer system. A processor including a transactional cache and a regular cache is utilized in a computer system that includes a policy manager to select one of a first mode (a hardware mode) or a second mode (a software mode) to implement transactional memory accesses. In the hardware mode the transactional cache is utilized to perform read and write memory operations and in the software mode the regular cache is utilized to perform read and write memory operations. | 05-15-2014 |
20140149718 | INSTRUCTION AND LOGIC TO PROVIDE PUSHING BUFFER COPY AND STORE FUNCTIONALITY - Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core. | 05-29-2014 |
20140149802 | Apparatus And Method To Obtain Information Regarding Suppressed Faults - A processor includes an execution unit, a fault mask coupled to the execution unit, and a suppress mask coupled to the execution unit. The fault mask is to store a first plurality of bit values to indicate which elements of a multi-element vector have an associated fault generated in response to execution of an instruction on the element in the execution unit. The suppress mask is to store a second plurality of bit values to indicate which of the elements are to have an associated fault suppressed. The processor also includes counter logic to increment a counter in response to an indication of a first fault associated with the first element and received from the fault mask, and an indication of a first suppression associated with the first element and received from the suppress mask. Other embodiments are described as claimed. | 05-29-2014 |
20140181464 | COALESCING ADJACENT GATHER/SCATTER OPERATIONS - According to one embodiment, a processor includes an instruction decoder to decode a first instruction to gather data elements from memory, the first instruction having a first operand specifying a first storage location and a second operand specifying a first memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the first instruction, to read contiguous a first and a second of the data elements from a memory location based on the first memory address indicated by the second operand, and to store the first data element in a first entry of the first storage location and a second data element in a second entry of a second storage location corresponding to the first entry of the first storage location. | 06-26-2014 |
20140189247 | APPARATUS AND METHOD FOR IMPLEMENTING A SCRATCHPAD MEMORY - An apparatus and method for implementing a scratchpad memory within a cache using priority hints. For example, a method according to one embodiment comprises: providing a priority hint for a scratchpad memory implemented using a portion of a cache; determining a page replacement priority based on the priority hint; storing the page replacement priority in a page table entry (PTE) associated with the page; and using the page replacement priority to determine whether to evict one or more cache lines associated with the scratchpad memory from the cache. | 07-03-2014 |
20140189307 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT RESOLUTION WITH VECTOR POPULATION COUNT FUNCTIONALITY - Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140189308 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE VECTOR ADDRESS CONFLICT DETECTION FUNCTIONALITY - Instructions and logic provide SIMD address conflict detection functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store an offset for a data element in a memory. A destination register has corresponding data fields, each of these data fields to store a variable second plurality of bits to store a conflict mask having a mask bit for each offset. Responsive to decoding a vector conflict instruction, execution units compare the offset in each data field with every less significant data field to determine if they hold a matching offset, and in corresponding conflict masks in the destination register, set any mask bits corresponding to a less significant data field with a matching offset. Vector address conflict detection can be used with variable sized elements and to generate conflict masks to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140189309 | METHODS, APPARATUS, INSTRUCTIONS, AND LOGIC TO PROVIDE PERMUTE CONTROLS WITH LEADING ZERO COUNT FUNCTIONALITY - Instructions and logic provide SIMD permute controls with leading zero count functionality. Some embodiments include processors with a register with a plurality of data fields, each of the data fields to store a second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of most significant contiguous bits set to zero for corresponding data fields. Responsive to decoding a vector leading zero count instruction, execution units count the number of most significant contiguous bits set to zero for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector leading zero count instructions can be used to generate permute controls and completion masks to be used along with the set of permute controls, to resolve dependencies in gather-modify-scatter SIMD operations. | 07-03-2014 |
20140281401 | Systems, Apparatuses, and Methods for Determining a Trailing Least Significant Masking Bit of a Writemask Register - The execution of a KZBTZ finds a trailing least significant zero bit position in an first input mask and sets an output mask to have the values of the first input mask, but with all bit positions closer to the most significant bit position than the trailing least significant zero bit position in an first input mask set to zero. In some embodiments, a second input mask is used as a writemask such that bit positions of the first input mask are not considered in the trailing least significant zero bit position calculation depending upon a corresponding bit position in the second input mask. | 09-18-2014 |
20140304477 | OBJECT LIVENESS TRACKING FOR USE IN PROCESSING DEVICE CACHE - A processing device comprises a processing device cache and a cache controller. The cache controller initiates a cache line eviction process and determines determine an object liveness value associated with a cache line in the processing device cache. The cache controller applies the object liveness value to a cache line eviction policy and evicts the cache line from the processing device cache based on the object liveness value and the cache line eviction policy. | 10-09-2014 |
20140337580 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-13-2014 |
20140337600 | PROVIDING METADATA IN A TRANSLATION LOOKASIDE BUFFER (TLB) - In one embodiment, the present invention includes a translation lookaside buffer (TLB) to store entries each having a translation portion to store a virtual address (VA)-to-physical address (PA) translation and a second portion to store bits for a memory page associated with the VA-to-PA translation, where the bits indicate attributes of information in the memory page. Other embodiments are described and claimed. | 11-13-2014 |
20140379998 | DYNAMIC HOME TILE MAPPING - Technologies for dynamic home tile mapping are described. an address request can be received from a processing core, the processing core being associated with a home tile table, the home tile table including respective mappings of one or more directory addresses to one or more home tiles. A buffer can be scanned to identify a presence of the address within the buffer. Based on an identification of the presence of the address within the buffer, a home tile identifier corresponding to the address can be provided from the buffer. | 12-25-2014 |
20150039841 | AUTOMATIC TRANSACTION COARSENING - A processing device comprises an instruction execution unit and track and combing logic to combine a plurality of transactions into a single combined transaction. The track and combine logic comprises a transaction monitoring module to monitor an execution of a plurality of transactions by the instruction execution unit, each of the plurality of transactions comprising a transaction begin instruction, at least one operation instruction and a transaction end instruction. The track and combine logic further comprises a transaction combination module to identify, in view of the monitoring, a subset of the plurality of transactions to combine into a single combined transaction for execution on the processing device and to combine the identified subset of the plurality of transactions into the single combined transaction, the single combined transaction comprising a single transaction begin instruction, a plurality of operation instructions corresponding to the subset of the plurality of transactions and a single transaction end instruction. | 02-05-2015 |
20150052333 | Systems, Apparatuses, and Methods for Stride Pattern Gathering of Data Elements and Stride Pattern Scattering of Data Elements - Embodiments of systems, apparatuses, and methods for performing gather and scatter stride instruction in a computer processor are described. In some embodiments, the execution of a gather stride instruction causes a conditionally storage of strided data elements from memory into the destination register according to at least some of bit values of a writemask. | 02-19-2015 |