Entries |
Document | Title - Abstract | Date |
20090106525 | DESIGN STRUCTURE FOR SCALAR PRECISION FLOAT IMPLEMENTATION ON THE "W" LANE OF VECTOR UNIT - A design structure, embodied in a machine-readable storage medium, for designing, manufacturing, and/or testing a design for image processing, and more specifically for vector units that support image processing, is provided. A combined vector/scalar unit is provided in which one or more processing lanes of the vector unit are used to perform scalar operations. An integrated register file is also provided for storing both vector and scalar data. As a result, data no longer has to be transferred through memory to exchange it between independent vector and scalar units, and a significant amount of chip area is saved. | 04-23-2009 |
20090144521 | METHOD AND APPARATUS FOR SEARCHING EXTENSIBLE MARKUP LANGUAGE (XML) DATA - Extensible Markup Language (XML) data is represented as a list of structures, with each structure in the list representing an aspect of the XML. A set of frequently used elements is extracted from the list-of-structures representation and stored in packed vectors. The packed vector representation allows Single Instruction Multiple Data (SIMD) instructions to be applied directly to the XML data, increasing the speed at which it can be searched while minimizing the memory needed to store it. | 06-04-2009 |
20100095086 | Dynamically Aligning Enhanced Precision Vectors Based on Addresses Corresponding to Reduced Precision Vectors - Mechanisms for aligning enhanced precision vectors based on reduced precision data values are provided. At least one data value, having a first precision type, is received for storage in a vector register. The vector register stores data as a vector having a plurality of vector elements. The at least one data value is converted from the first precision type to a second precision type that differs in precision from the first, thereby generating at least one modified data value. The at least one modified data value is stored in at least one vector element of the plurality of vector elements. An alignment of the at least one modified data value is determined relative to a boundary of a vector element of the vector register. An alignment operation is then performed to re-align the at least one modified data value based on the boundary of the vector element of the vector register. | 04-15-2010 |
20100281234 | INTERLEAVED MULTI-THREADED VECTOR PROCESSOR - A method includes providing a processor configured to execute instructions. The method may further include providing a first set of registers in the processor to store first data and first instructions associated with a first thread, and providing a second set of registers in the processor to store second data and second instructions associated with a second thread. The method may further include transmitting the first data and first instructions associated with the first thread to the first set of registers, and executing the first instructions in order to process the first data. The method may further include transmitting the second data and second instructions to the second set of registers while executing the first instructions and processing the first data. A corresponding apparatus is also disclosed and claimed herein. | 11-04-2010 |
20110055516 | Multiprocessor Computer System and Method Having at Least One Processor with a Dynamically Reconfigurable Instruction Set - An innovative realization of computer hardware, software, and firmware comprising a multiprocessor system in which at least one processor can be configured to have a fixed instruction set and one or more processors can be statically or dynamically configured to implement a plurality of processor states in a plurality of technologies. The processor states may be instruction sets for the processors. The technologies may include programmable logic arrays. | 03-03-2011 |
20110302390 | SYSTEMS AND METHODS FOR PROCESSING COMMUNICATIONS SIGNALS USING PARALLEL PROCESSING - Systems and methods for processing communications signals on multi-processor architectures. The system consists of a digital interface that translates numbers representing a waveform in some format into analog signals for transmission, and translates analog signals into numbers representing those waveforms in a format that can be processed by the combination of commodity digital hardware and software. The digital hardware and software incorporate parallel hardware and software that can process single or multiple streams, and multiple processing steps, in any combination the communications system requires. In the examples, the use of general purpose graphics processing units (GPGPUs) is illustrated, but the system is not necessarily limited to such an implementation. The system is highly scalable and modular, addressing a wide range of radio requirements, preferably using commodity components. | 12-08-2011 |
20130024651 | PROCESSING VECTORS USING A WRAPPING ROTATE PREVIOUS INSTRUCTION IN THE MACROSCALAR ARCHITECTURE - Embodiments of a system and a method in which a processor may execute instructions that cause the processor to receive an operand vector, a selection vector, and a control vector are disclosed. The executed instructions may also cause the processor to perform a wrapping rotate previous operation dependent upon the input vectors. | 01-24-2013 |
20130232317 | VECTOR PROCESSING APPARATUS AND VECTOR PROCESSING METHOD - A vector processing apparatus includes a storage pointer generation unit and an instruction execution unit including a plurality of vector pipeline units. The storage pointer generation unit receives a vector instruction and its range information and generates a storage pointer value. When it receives a succeeding vector instruction that can be processed in parallel with a preceding vector instruction, the storage pointer generation unit updates the storage pointer value based on the range information so that each element of the succeeding vector instruction is input into a vector pipeline unit not used by the preceding vector instruction, and the instruction execution unit processes the preceding and succeeding vector instructions in parallel according to the storage pointer value. | 09-05-2013 |
20150052330 | VECTOR ARITHMETIC REDUCTION - In a particular embodiment, a method includes executing a vector instruction at a processor. The vector instruction includes a vector input that includes a plurality of elements. Executing the vector instruction includes providing a first element of the plurality of elements as a first output. Executing the vector instruction further includes performing an arithmetic operation on the first element and a second element of the plurality of elements to provide a second output. Executing the vector instruction further includes storing the first output and the second output in an output vector. | 02-19-2015 |
20160188533 | Systems, Apparatuses, and Methods for K Nearest Neighbor Search - Systems, apparatuses, and methods for k-nearest neighbor (KNN) searches are described. In particular, embodiments of a KNN accelerator and its uses are described. In some embodiments, the KNN accelerator includes a plurality of vector partial distance computation circuits each to calculate a partial sum, a minimum sort network to sort partial sums from the plurality of vector partial distance computation circuits to find k nearest neighbor matches and a global control circuit to control aspects of operations of the plurality of vector partial distance computation circuits. | 06-30-2016 |
20160378715 | HARDWARE PROCESSORS AND METHODS FOR TIGHTLY-COUPLED HETEROGENEOUS COMPUTING - Methods and apparatuses relating to tightly-coupled heterogeneous computing are described. In one embodiment, a hardware processor includes a plurality of execution units in parallel, a switch to connect inputs of the plurality of execution units to outputs of a first buffer and a plurality of memory banks and connect inputs of the plurality of memory banks and a plurality of second buffers in parallel to outputs of the first buffer, the plurality of memory banks, and the plurality of execution units, and an offload engine with inputs connected to outputs of the plurality of second buffers. | 12-29-2016 |
20160378716 | METHODS, APPARATUS, INSTRUCTIONS AND LOGIC TO PROVIDE VECTOR PACKED HISTOGRAM FUNCTIONALITY - Instructions and logic provide SIMD vector packed histogram functionality. Some processor embodiments include first and second registers storing, in each of a plurality of data fields of a register lane portion, corresponding elements of a first and of a second data type, respectively. A decode stage decodes an instruction for SIMD vector packed histograms. One or more execution units compare each element of the first data type, in the first register lane portion, with a range specified by the instruction. For any elements of the first register lane portion within that range, the corresponding elements of the second data type, from the second register lane portion, are added into one of a plurality of data fields of a destination register lane portion, selected according to the value of the corresponding element of the first data type, to generate packed weighted histograms for each destination register lane portion. | 12-29-2016 |
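The vector arithmetic reduction entry (20150052330) describes an output vector whose first element is the first input element and whose second element is an arithmetic combination of the first and second input elements. The Python sketch below is a hypothetical software model, not the patented hardware. It assumes the pattern continues element by element, which makes it an inclusive running reduction (a prefix scan); the function name `vector_running_reduce` is illustrative.

```python
import operator
from typing import Callable, List


def vector_running_reduce(vec: List[int],
                          op: Callable[[int, int], int] = operator.add) -> List[int]:
    """Hypothetical model of a running (inclusive-scan) vector reduction.

    Assumption: output[0] is the first input element, and each later output
    combines the previous output with the next input element using `op`.
    """
    out: List[int] = []
    for i, x in enumerate(vec):
        if i == 0:
            out.append(x)                # first output: first element passed through
        else:
            out.append(op(out[-1], x))   # later outputs: op(previous output, element)
    return out


print(vector_running_reduce([3, 1, 4, 1]))  # → [3, 4, 8, 9]
```

Python's standard `itertools.accumulate(vec, op)` yields the same sequence; the explicit loop is kept here to mirror the first-output and second-output steps named in the abstract.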
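For the vector packed histogram entry (20160378716), the following hypothetical Python model treats each register lane as a list. It assumes the range named by the instruction is the half-open interval `[range_lo, range_lo + num_bins)` and that an in-range key selects destination field `key - range_lo`. The abstract says only that the field is selected according to the key's value, so both choices, and the function name `packed_weighted_histogram`, are illustrative.

```python
from typing import List


def packed_weighted_histogram(keys: List[List[int]], weights: List[List[int]],
                              range_lo: int, num_bins: int) -> List[List[int]]:
    """Hypothetical per-lane model of a SIMD packed weighted histogram.

    keys[l] and weights[l] hold lane l of the first and second source registers.
    Keys inside [range_lo, range_lo + num_bins) add their weight into field
    (key - range_lo) of that lane's destination; out-of-range keys are skipped.
    """
    result: List[List[int]] = []
    for lane_keys, lane_weights in zip(keys, weights):
        bins = [0] * num_bins                         # destination lane, one field per bin
        for k, w in zip(lane_keys, lane_weights):
            if range_lo <= k < range_lo + num_bins:   # range check from the instruction
                bins[k - range_lo] += w               # field chosen by the key's value
        result.append(bins)
    return result


print(packed_weighted_histogram([[0, 1, 1, 5], [2, 2, 3, 0]],
                                [[1, 2, 3, 4], [5, 6, 7, 8]], 0, 4))
# → [[1, 5, 0, 0], [8, 0, 11, 7]]
```

Each lane produces its own histogram, matching the abstract's "for each destination register lane portion"; the key 5 in the first lane falls outside the range and contributes nothing.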
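The k-nearest-neighbor entry (20160188533) splits distance computation across several partial distance circuits and then uses a minimum sort network to pick the k best matches. The sketch below models that flow in Python under stated assumptions: the distance metric is squared Euclidean, each chunk of `chunk` dimensions stands in for one partial distance circuit, and `heapq.nsmallest` replaces the hardware sort network. The names `partial_distance` and `knn` are hypothetical.

```python
import heapq
from typing import List, Sequence, Tuple


def partial_distance(query: Sequence[float], point: Sequence[float],
                     lo: int, hi: int) -> float:
    """Squared-Euclidean partial sum over dimensions [lo, hi) (assumed metric)."""
    return sum((query[d] - point[d]) ** 2 for d in range(lo, hi))


def knn(query: Sequence[float], points: List[Sequence[float]], k: int,
        chunk: int = 2) -> List[Tuple[float, int]]:
    """Hypothetical model: per-chunk partial sums, combined, then the k smallest.

    Returns (distance, point index) pairs in ascending order of distance.
    """
    dims = len(query)
    scored: List[Tuple[float, int]] = []
    for idx, p in enumerate(points):
        # Each chunk stands in for one partial distance computation circuit.
        total = sum(partial_distance(query, p, lo, min(lo + chunk, dims))
                    for lo in range(0, dims, chunk))
        scored.append((total, idx))
    # heapq.nsmallest stands in for the minimum sort network.
    return heapq.nsmallest(k, scored)


print(knn([0, 0, 0], [[1, 1, 1], [0, 0, 1], [3, 0, 0], [0, 2, 0]], 2))
# → [(1, 1), (3, 0)]
```

Summing the chunked partial sums gives the same total as a single pass over all dimensions; the split only mirrors how the abstract distributes the work across parallel circuits.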