Patent application number | Description | Published |
20090249026 | Vector instructions to enable efficient synchronization and parallel reduction operations - In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed. | 10-01-2009 |
20100073368 | METHODS AND SYSTEMS TO DETERMINE CONSERVATIVE VIEW CELL OCCLUSION - Methods and systems to determine view cell occlusion, including to project objects of a 3-dimensional graphics environment to a 2-dimensional image plane with respect to the view point, to reduce sizes of corresponding object images, to generate an occluder map from the reduced-size object images, to compare at least a portion of the object images to the occluder map, and to identify an object as occluded with respect to the view cell when pixel depth values of the object image are greater than corresponding pixel depth values of the occluder map. Methods and systems to reduce an object image size include methods and systems to nullify pixel depth values within a radius of an edge pixel, and to determine the radius as a distance from the edge pixel to a second pixel so that a line between the view point and the second pixel is parallel with one or more of a line and a plane that is tangential to a sphere enclosing the view cell and a point on the object that corresponds to the edge pixel. | 03-25-2010 |
20110138122 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 06-09-2011 |
20110161060 | Optimization-Based exact formulation and solution of crowd simulation in virtual worlds - A method of computing a collision-free velocity ( | 06-30-2011 |
20110320913 | RELIABILITY SUPPORT IN MEMORY SYSTEMS WITHOUT ERROR CORRECTING CODE SUPPORT - Methods and apparatuses for error correction. An N-bit block of data to be stored in a memory device is received. The memory device does not perform any error correction code (ECC) algorithm nor provide designated error correction code storage for the N-bit block of data. Data compression is applied to the N-bit data to compress the block of data to generate an M-bit compressed block of data. A K-bit ECC is computed for the M-bit compressed data, wherein M+K is less than or equal to N. The M-bit compressed data and the K-bit ECC are stored together in the memory device. | 12-29-2011 |
20120042121 | Scatter-Gather Intelligent Memory Architecture For Unstructured Streaming Data On Multiprocessor Systems - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion. | 02-16-2012 |
20120137074 | METHOD AND APPARATUS FOR STREAM BUFFER MANAGEMENT INSTRUCTIONS - A method and system to perform stream buffer management instructions in a processor. The stream buffer management instructions facilitate the creation and usage of a dedicated memory space or stream buffer of the processor in one embodiment of the invention. The dedicated memory space is a contiguous memory space and has a sequential or linear addressing scheme in one embodiment of the invention. The processor has logic to execute a stream buffer management instruction to copy data from a source memory address to a destination memory address that is specified with a desired level of memory hierarchy. | 05-31-2012 |
20120159130 | MECHANISM FOR CONFLICT DETECTION USING SIMD - A system and method are configured to detect conflicts when converting scalar processes to parallel processes (“SIMDifying”). Conflicts may be detected for an unordered single index, an ordered single index and/or ordered pairs of indices. Conflicts may be further detected for read-after-write dependencies. Conflict detection is configured to identify operations (i.e., iterations) in a sequence of iterations that may not be done in parallel. | 06-21-2012 |
20120290799 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-15-2012 |
20130179633 | SCATTER-GATHER INTELLIGENT MEMORY ARCHITECTURE FOR UNSTRUCTURED STREAMING DATA ON MULTIPROCESSOR SYSTEMS - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion. | 07-11-2013 |
20130297878 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-07-2013 |
20130311530 | APPARATUS AND METHOD FOR SELECTING ELEMENTS OF A VECTOR COMPUTATION - An apparatus and method are described for performing a vector reduction. For example, an apparatus according to one embodiment comprises: a reduction logic tree comprised of a set of N-1 reduction logic blocks used to perform reduction in a single operation cycle for N vector elements; a first input vector register storing a first input vector communicatively coupled to the set of reduction logic blocks; a second input vector register storing a second input vector communicatively coupled to the set of reduction logic blocks; a mask register storing a mask value controlling a set of one or more multiplexers, each of the set of multiplexers selecting a value directly from the first input vector register or an output containing a processed value from one of the reduction logic blocks; and an output vector register coupled to outputs of the one or more multiplexers to receive the values passed through by each of the multiplexers responsive to the control signals. | 11-21-2013 |
20130332701 | APPARATUS AND METHOD FOR SELECTING ELEMENTS OF A VECTOR COMPUTATION - An apparatus and method are described for selecting elements to be used in a vector computation. For example, a method according to one embodiment includes the following operations: specifying whether to identify the first, last or next after last active element of an input mask register using an immediate value; identifying the first, last or next after last active element in the input mask register according to the immediate value; reading a value from an input vector register corresponding to the identified first, last or next after last active element in the input mask register; and writing the value to an output vector register. | 12-12-2013 |
20140040542 | SCATTER-GATHER INTELLIGENT MEMORY ARCHITECTURE FOR UNSTRUCTURED STREAMING DATA ON MULTIPROCESSOR SYSTEMS - A scatter/gather technique optimizes unstructured streaming memory accesses, providing off-chip bandwidth efficiency by accessing only useful data at a fine granularity, and off-loading memory access overhead by supporting address calculation, data shuffling, and format conversion. | 02-06-2014 |
20140068226 | VECTOR INSTRUCTIONS TO ENABLE EFFICIENT SYNCHRONIZATION AND PARALLEL REDUCTION OPERATIONS - In one embodiment, a processor may include a vector unit to perform operations on multiple data elements responsive to a single instruction, and a control unit coupled to the vector unit to provide the data elements to the vector unit, where the control unit is to enable an atomic vector operation to be performed on at least some of the data elements responsive to a first vector instruction to be executed under a first mask and a second vector instruction to be executed under a second mask. Other embodiments are described and claimed. | 03-06-2014 |
20140089634 | APPARATUS AND METHOD FOR DETECTING IDENTICAL ELEMENTS WITHIN A VECTOR REGISTER - An apparatus, system and method are described for identifying identical elements in a vector register. For example, a computer implemented method according to one embodiment comprises the operations of: reading each active element from a first vector register, each active element having a defined bit position within the first vector register; reading each element from a second vector register, each element having a defined bit position within the second vector register corresponding to a bit position of a current active element in the first vector register; reading an input mask register, the input mask register identifying active bit positions in the second vector register for which comparisons are to be made with values in the first vector register, the comparison operations comprising: comparing each active element in the second vector register with elements in the first vector register having bit positions preceding the bit position of the current active element in the second vector register; and setting a bit position in an output mask register equal to a true value if all of the preceding bit positions in the first vector register are equal to the bit in the current active bit position in the second vector register. | 03-27-2014 |
20140096119 | LOOP VECTORIZATION METHODS AND APPARATUS - Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop; and setting the dynamic adjustment value based on the identified dependency. | 04-03-2014 |
20140149718 | INSTRUCTION AND LOGIC TO PROVIDE PUSHING BUFFER COPY AND STORE FUNCTIONALITY - Instructions and logic provide pushing buffer copy and store functionality. Some embodiments include a first hardware thread or processing core, and a second hardware thread or processing core, a cache to store cache coherent data in a cache line for a shared memory address accessible by the second hardware thread or processing core. Responsive to decoding an instruction specifying a source data operand, said shared memory address as a destination operand, and one or more owner of said shared memory address, one or more execution units copy data from the source data operand to the cache coherent data in the cache line for said shared memory address accessible by said second hardware thread or processing core in the cache when said one or more owner includes said second hardware thread or processing core. | 05-29-2014 |
20140181580 | SPECULATIVE NON-FAULTING LOADS AND GATHERS - According to one embodiment, a processor includes an instruction decoder to decode an instruction to read a plurality of data elements from memory, the instruction having a first operand specifying a storage location, a second operand specifying a bitmask having one or more bits, each bit corresponding to one of the data elements, and a third operand specifying a memory address storing a plurality of data elements. The processor further includes an execution unit coupled to the instruction decoder, in response to the instruction, to read one or more data elements speculatively, based on the bitmask specified by the second operand, from a memory location based on the memory address indicated by the third operand, and to store the one or more data elements in the storage location indicated by the first operand. | 06-26-2014 |
20140189247 | APPARATUS AND METHOD FOR IMPLEMENTING A SCRATCHPAD MEMORY - An apparatus and method for implementing a scratchpad memory within a cache using priority hints. For example, a method according to one embodiment comprises: providing a priority hint for a scratchpad memory implemented using a portion of a cache; determining a page replacement priority based on the priority hint; storing the page replacement priority in a page table entry (PTE) associated with the page; and using the page replacement priority to determine whether to evict one or more cache lines associated with the scratchpad memory from the cache. | 07-03-2014 |
20140189288 | INSTRUCTION TO REDUCE ELEMENTS IN A VECTOR REGISTER WITH STRIDED ACCESS PATTERN - A vector reduction instruction with non-unit strided access pattern is received and executed by the execution circuitry of a processor. In response to the instruction, the execution circuitry performs an associative reduction operation on data elements of a first vector register. Based on values of the mask register and a current element position being processed, the execution circuitry sequentially sets one or more data elements of the first vector register to a result, which is generated by the associative reduction operation applied to both a previous data element of the first vector register and a data element of a third vector register. The previous data element is located more than one element position away from the current element position. | 07-03-2014 |
20140189323 | APPARATUS AND METHOD FOR PROPAGATING CONDITIONALLY EVALUATED VALUES IN SIMD/VECTOR EXECUTION - An apparatus and method for propagating conditionally evaluated values. For example, a method according to one embodiment comprises: reading each value contained in an input mask register, each value being a true value or a false value and having a bit position associated therewith; for each true value read from the input mask register, generating a first result containing the bit position of the true value; for each false value read from the input mask register following the first true value, adding the vector length of the input mask register to a bit position of the last true value read from the input mask register to generate a second result; and storing each of the first results and second results in bit positions of an output register corresponding to the bit positions read from the input mask register. | 07-03-2014 |
20140223139 | SYSTEMS, APPARATUSES, AND METHODS FOR SETTING AN OUTPUT MASK IN A DESTINATION WRITEMASK REGISTER FROM A SOURCE WRITE MASK REGISTER USING AN INPUT WRITEMASK AND IMMEDIATE - Embodiments of systems, apparatuses, and methods for performing, in a computer processor, generation of a predicate mask based on a vector comparison in response to a single instruction are described. | 08-07-2014 |
20140304477 | OBJECT LIVENESS TRACKING FOR USE IN PROCESSING DEVICE CACHE - A processing device comprises a processing device cache and a cache controller. The cache controller initiates a cache line eviction process and determines an object liveness value associated with a cache line in the processing device cache. The cache controller applies the object liveness value to a cache line eviction policy and evicts the cache line from the processing device cache based on the object liveness value and the cache line eviction policy. | 10-09-2014 |
20140337580 | GATHER AND SCATTER OPERATIONS IN MULTI-LEVEL MEMORY HIERARCHY - Methods and apparatus relating to gather or scatter operations in a multi-level cache are described. In some embodiments, a logic may determine whether to perform gather or scatter operations at a first memory or a second memory, based in part on a relative performance of performing the gather or scatter operations at the first memory and the second memory. Other embodiments are also described and claimed. | 11-13-2014 |
20140379998 | DYNAMIC HOME TILE MAPPING - Technologies for dynamic home tile mapping are described. An address request can be received from a processing core, the processing core being associated with a home tile table, the home tile table including respective mappings of one or more directory addresses to one or more home tiles. A buffer can be scanned to identify a presence of the address within the buffer. Based on an identification of the presence of the address within the buffer, a home tile identifier corresponding to the address can be provided from the buffer. | 12-25-2014 |
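
Several of the abstracts above (for example 20110138122, 20120042121, and 20120290799) center on masked gather and scatter operations. As a generic illustration only, and not a description of any patented design, the following C sketch spells out the scalar semantics such operations conventionally follow: a lane participates only when its mask bit is set, and inactive destination lanes are left unchanged. The function names `masked_gather` and `masked_scatter` and the 16-lane limit are assumptions for this sketch.

```c
#include <stdint.h>
#include <stddef.h>

/* Generic sketch of masked gather/scatter semantics (not any patented design).
 * Lane i participates only when bit i of the mask is set; n must be <= 16
 * so a uint16_t can serve as the mask. */
void masked_gather(float *dst, const float *base, const int32_t *index,
                   uint16_t mask, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (mask & (1u << i))          /* active lane: load from base[index[i]] */
            dst[i] = base[index[i]];
        /* inactive lanes leave dst[i] unchanged */
    }
}

void masked_scatter(float *base, const float *src, const int32_t *index,
                    uint16_t mask, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (mask & (1u << i))          /* active lane: store to base[index[i]] */
            base[index[i]] = src[i];
    }
}
```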
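Likewise, the conflict-detection entries (20120159130 and 20140089634) concern identifying lanes whose indices collide with an earlier lane, since such lanes cannot safely be scattered or reduced in the same parallel pass. The sketch below is a plain scalar rendering of that idea under the same assumptions (at most 16 lanes, hypothetical function name `detect_conflicts`); it is not the claimed hardware mechanism.

```c
#include <stdint.h>
#include <stddef.h>

/* Generic sketch of index-conflict detection (not the patented mechanism).
 * A set bit in the returned mask marks a lane whose index already appeared
 * at a lower lane position. n must be <= 16. */
uint16_t detect_conflicts(const int32_t *index, size_t n)
{
    uint16_t conflict = 0;
    for (size_t i = 1; i < n; ++i) {
        for (size_t j = 0; j < i; ++j) {
            if (index[i] == index[j]) {   /* collides with an earlier lane */
                conflict |= (uint16_t)(1u << i);
                break;
            }
        }
    }
    return conflict;
}
```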