Class / Patent application number | Description | Number of patent applications / Date published |
712003000 | Scalar/vector processor interface | 39 |
20080222388 | Simulation of processor status flags - The dynamic efficient and accurate simulation of processor status flags is described. One exemplary embodiment includes simulation of processor status flags of a first CPU type on a second CPU type using simple arithmetic operations to calculate status flags in parallel, and by keeping an intermediate state that allows efficient calculation of status flags when they are needed. In this way, sufficient intermediate state exists to generate desired status flags either directly or with a simple operation. | 09-11-2008 |
20090150647 | Processing Unit Incorporating Vectorizable Execution Unit - A vectorizable execution unit is capable of being operated in a plurality of modes, with the processing lanes in the vectorizable execution unit grouped into different combinations of logical execution units in different modes. By doing so, processing lanes can be selectively grouped together to operate as different types of vector execution units and/or scalar execution units, and if desired, dynamically switched during runtime to process various types of instruction streams in a manner that is best suited for each type of instruction stream. As a consequence, a single vectorizable execution unit may be configurable, e.g., via software control, to operate either as a vector execution or a plurality of scalar execution units. | 06-11-2009 |
20100217954 | Method and apparatus for obtaining a scalar value directly from a vector register - A method and apparatus for obtaining a scalar value from a vector register for use in a mixed vector and scalar instruction, including providing a vector in a vector register file, and embedding a location identifier of the scalar value within the vector in the bits defining the mixed vector and scalar instruction. The scalar value can be used directly from the vector register without the need to load the scalar to a scalar register prior to executing the instruction. The scalar location identifier may be embedded in the secondary op code of the instruction, or the instruction may have dedicated bits for providing the location of the scalar within the vector. | 08-26-2010 |
20100235607 | PROCESSOR - A processor includes a setting register in which a mode is set, a general-purpose register including a preferred slot used during scalar computing and a slot not used during the scalar computing, a selector configured to select and output data of a register designated by a mode set in the setting register during the scalar computing, and a computing unit configured to execute the scalar computing using the preferred slot of the general-purpose register and store computing result data of the scalar computing in the preferred slot of the general-purpose register. The data of the register output from the selector is stored in the slot of the general-purpose register. | 09-16-2010 |
20100312988 | Data processing apparatus and method for handling vector instructions - A data processing apparatus and method are provided for handling vector instructions. The data processing apparatus has a register data store with a plurality of registers arranged to store data elements. A vector processing unit is then used to execute a sequence of vector instructions, with the vector processing unit having a plurality of lanes of parallel processing and having access to the register data store in order to read data elements from, and write data elements to, the register data store during the execution of the sequence of vector instructions. A skip indication storage maintains a skip indicator for each of the lanes of parallel processing. The vector processing unit is responsive to a vector skip instruction to perform an update operation to set within the skip indication storage the skip indicator for a determined one or more lanes. The vector processing unit is responsive to a vector operation instruction to perform an operation in parallel on data elements input to the plurality of lanes of parallel processing, but to exclude from the performance of the operation any lane whose associated skip indicator is set. This allows the operation specified by vector instructions to be performed conditionally within each of the lanes of parallel processing without any modification to the vector instructions that are specifying those operations. | 12-09-2010 |
20110010522 | MULTIPROCESSOR COMMUNICATION PROTOCOL BRIDGE BETWEEN SCALAR AND VECTOR COMPUTE NODES - A multiprocessor computer system includes a plurality of processor nodes coupled by a direct processor interconnect network, and a plurality of processor nodes coupled by an indirect processor interconnect network. A bridge directly couples the direct processor interconnect network and the indirect processor interconnect network. | 01-13-2011 |
20110145543 | EXECUTION OF VARIABLE WIDTH VECTOR PROCESSING INSTRUCTIONS - A processing unit executes a vector width instruction in a program and the processing unit obtains and supplies the width of an appropriate vector register that will be used to process variable vector processing instructions. Then, when the processing unit executes variable vector processing instructions in the program, the processing unit processes the variable vector processing instructions using the appropriate vector register with the instructions having the same width as the appropriate vector register. The width that the processing unit obtains may be less than an actual width of the appropriate vector register and may be set by the processing unit. In this way, many different vector widths can be supported using a single set of instructions for vector processing. New instructions are not required if vector widths are changed and processing units having vector registers of differing widths do not require different code. | 06-16-2011 |
20110161623 | Data Parallel Function Call for Determining if Called Routine is Data Parallel - Mechanisms for performing data parallel function calls in code during runtime are provided. These mechanisms may operate to execute, in the processor, a portion of code having a data parallel function call to a target portion of code. The mechanisms may further operate to determine, at runtime by the processor, whether the target portion of code is a data parallel portion of code or a scalar portion of code and determine whether the calling code is data parallel code or scalar code. Moreover, the mechanisms may operate to execute the target portion of code based on the determination of whether the target portion of code is a data parallel portion of code or a scalar portion of code, and the determination of whether the calling code is data parallel code or scalar code. | 06-30-2011 |
20110307684 | Image Processing Address Generator - An image processing system including a vector processor and a memory adapted for attaching to the vector processor. The memory is adapted to store multiple image frames. The vector processor includes an address generator operatively attached to the memory to access the memory. The address generator is adapted for calculating addresses of the memory over the multiple image frames. The addresses may be calculated over the image frames based upon an image parameter. The image parameter may specify which of the image frames are processed simultaneously. A scalar processor may be attached to the vector processor. The scalar processor provides the image parameter(s) to the address generator for address calculation over the multiple image frames. An input register may be attached to the vector processor. The input register may be adapted to receive a very long instruction word (VLIW) instruction. The VLIW instruction may be configured to transfer only: (i) parameters for image processing calculations over the image frames by the ALU units and (ii) a single bit to the address generator. | 12-15-2011 |
20120124332 | VECTOR PROCESSING CIRCUIT, COMMAND ISSUANCE CONTROL METHOD, AND PROCESSOR SYSTEM - A vector processing circuit includes a vector register file including a plurality of array elements, a command issuance control circuit, and a plurality of pipeline arithmetic units. Each pipeline arithmetic unit performs arithmetic processing of data stored in the array elements indicated as a source by one command in parts through a plurality of cycles and stores the result in the array elements indicated as a destination by the one command through a plurality of cycles. When data word length of a preceding command is longer than that of a subsequent command, the command issuance control circuit changes data sizes of the array elements in accordance with data word length of the command and determines whether there is register interference between the array element to be processed at a non-head cycle of the preceding command, and the array element to be processed at a head cycle of the subsequent command. | 05-17-2012 |
20120260061 | DATA PROCESSING APPARATUS AND METHOD FOR PERFORMING VECTOR OPERATIONS - A data processing apparatus having processing circuitry, a scalar register bank and a vector register bank, including decoding circuitry arranged to decode a sequence of instructions to generate control signals for the processing circuitry. The decoding circuitry is responsive to a decode modifier instruction within the sequence of instructions to alter decoding of a subsequent scalar instruction in the sequence by mapping at least one scalar operand specified by the subsequent scalar instruction to at least one vector operand in the vector register bank, and, in dependence on the scalar operation specified by the subsequent scalar instruction, determining a vector operation to be performed on at least a subset of the operand elements within the at least the one vector operand. Such an approach enables a wide variety of vector operations to be specified without the need to individually define separate vector instructions for those vector operations. | 10-11-2012 |
20130024652 | Scalable Processing Unit - Various methods and systems are provided for processing units that may be scaled. In one embodiment, a processing unit includes a plurality of scalar processing units and a vector processing unit in communication with each of the plurality of scalar processing units. The vector processing unit is configured to coordinate execution of instructions received from the plurality of scalar processing units. In another embodiment, a scalar instruction packet including a pre-fix instruction and a vector instruction packet including a vector instruction is obtained. Execution of the vector instruction may be modified by the pre-fix instruction in a processing unit including a vector processing unit. In another embodiment, a scalar instruction packet including a plurality of partitions is obtained. The location of the partitions is determined based upon a partition indicator included in the scalar instruction packet and a scalar instruction included in a partition is executed by a processing unit. | 01-24-2013 |
20130159665 | SPECIALIZED VECTOR INSTRUCTION AND DATAPATH FOR MATRIX MULTIPLICATION - A data processing element includes an input unit configured to provide instructions for scalar, vector and array processing, and a scalar processing unit configured to provide a scalar pipeline datapath for processing a scalar quantity. Additionally, the data processing element includes a vector processing unit coupled to the scalar processing unit and configured to provide a vector pipeline datapath employing a vector register for processing a one-dimensional vector quantity. The data processing element further includes an array processing unit coupled to the vector processing unit and configured to provide an array pipeline datapath employing a parallel processing structure for processing a two-dimensional vector quantity. A method of operating a data processing element and a MIMO receiver employing a data processing element are also provided. | 06-20-2013 |
20130173884 | PROGRAMMABLE DEVICE FOR SOFTWARE DEFINED RADIO TERMINAL - A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters. | 07-04-2013 |
20130185538 | PROCESSOR WITH INTER-PROCESSING PATH COMMUNICATION - A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage. The instruction stream includes scalar instructions executable by the scalar processor core and vector instructions executable by the vector coprocessor core. The scalar processor core is configured to pass the vector instructions to the vector coprocessor core. The vector coprocessor core is configured to process a plurality of data values in parallel while executing each vector instruction passed by the scalar processor core. The vector coprocessor core includes a plurality of processing paths arranged in parallel to process the data values. Each of the processing paths includes an execution unit. Each of the execution units is configured to communicate a result of processing to each other of the execution units. | 07-18-2013 |
20130185539 | PROCESSOR WITH TABLE LOOKUP AND HISTOGRAM PROCESSING UNITS - A processor includes a scalar processor core and a vector coprocessor core coupled to the scalar processor core. The scalar processor core is configured to retrieve an instruction stream from program storage, and pass vector instructions in the instruction stream to the vector coprocessor core. The vector coprocessor core includes a register file, a plurality of execution units, and a table lookup unit. The register file includes a plurality of registers. The execution units are arranged in parallel to process a plurality of data values. The execution units are coupled to the register file. The table lookup unit is coupled to the register file in parallel with the execution units. The table lookup unit is configured to retrieve table values from one or more lookup tables stored in memory by executing table lookup vector instructions in a table lookup loop. | 07-18-2013 |
20140006748 | APPARATUS AND METHOD OF VECTOR UNIT SHARING | 01-02-2014 |
20140040594 | PROGRAMMABLE DEVICE FOR SOFTWARE DEFINED RADIO TERMINAL - A programmable device suitable for software defined radio terminal is disclosed. In one aspect, the device includes a scalar cluster providing a scalar data path and a scalar register file and arranged for executing scalar instructions. The device may further include at least two interconnected vector clusters connected with the scalar cluster. Each of the at least two vector clusters provides a vector data path and a vector register file and is arranged for executing at least one vector instruction different from vector instructions performed by any other vector cluster of the at least two vector clusters. | 02-06-2014 |
20140189287 | COLLAPSING OF MULTIPLE NESTED LOOPS, METHODS AND INSTRUCTIONS - In an embodiment, the present invention is directed to a processor including a decode logic to receive a multi-dimensional loop counter update instruction and to decode the multi-dimensional loop counter update instruction into at least one decoded instruction, and an execution logic to execute the at least one decoded instruction to update at least one loop counter value of a first operand associated with the multi-dimensional loop counter update instruction by a first amount. Methods to collapse loops using such instructions are also disclosed. Other embodiments are described and claimed. | 07-03-2014 |
20140244967 | VECTOR REGISTER ADDRESSING AND FUNCTIONS BASED ON A SCALAR REGISTER DATA VALUE - Techniques are provided for executing a vector alignment instruction. A scalar register file in a first processor is configured to share one or more register values with a second processor, the one or more register values accessed from the scalar register file according to an Rt address specified in a vector alignment instruction, wherein a start location is determined from one of the shared register values. An alignment circuit in the second processor is configured to align data identified between the start location within a beginning Vu register of a vector register file (VRF) and an end location of a last Vu register of the VRF according to the vector alignment instruction. A store circuit is configured to select the aligned data from the alignment circuit and store the aligned data in the vector register file according to an alignment store address specified by the vector alignment instruction. | 08-28-2014 |
20140244968 | MAPPING VECTOR REPRESENTATIONS ONTO A PREDICATED SCALAR MULTI-THREADED SYSTEM - A system implementing a method for generating code for execution based on a SIMT model with parallel units of threads is provided. The system identifies a loop within a program that includes vector processing. The system generates instructions for a thread that include an instruction to set a predicate based on whether the thread of a parallel unit corresponds to a vector element. The system also generates instructions to perform the vector processing via scalar operations predicated on the predicate. As a result, the system generates instructions to perform the vector processing but to avoid branch divergence within the parallel unit of threads that would be needed to check whether a thread corresponds to a vector element. | 08-28-2014 |
20140297991 | INSTRUCTIONS FOR STORING IN GENERAL PURPOSE REGISTERS ONE OF TWO SCALAR CONSTANTS BASED ON THE CONTENTS OF VECTOR WRITE MASKS - According to one embodiment, an occurrence of an instruction is fetched. The instruction's format specifies its only source operand from a single vector write mask register, and specifies as its destination a single general purpose register. In addition, the instruction's format includes a first field whose contents selects the single vector write mask register, and includes a second field whose contents selects the single general purpose register. The source operand is a write mask including a plurality of one bit vector write mask elements that correspond to different multi-bit data element positions within architectural vector registers. The method also includes, responsive to executing the single occurrence of the single instruction, storing data in the single general purpose register such that its contents represent either a first or second scalar constant based on whether the plurality of one bit vector write mask elements in the source operand are all zero. | 10-02-2014 |
20140297992 | APPARATUS AND METHOD FOR GENERATING VECTOR CODE - An apparatus and method for generating vector code are provided. The apparatus and method generate vector code using scalar-type kernel code, without the user changing the code type or modifying the data layout, thereby enhancing the user's convenience of use and retaining the portability of OpenCL. | 10-02-2014 |
20140317376 | Digital Processor Having Instruction Set with Complex Angle Function - A digital processor, such as a vector processor or a scalar processor, is provided having an instruction set with a complex angle function. A complex angle is evaluated for an input value, x, by obtaining one or more complex angle software instructions having the input value, x, as an input; in response to at least one of the complex angle software instructions, performing the following steps: invoking at least one complex angle functional unit that implements the one or more complex angle software instructions to apply the complex angle function to the input value, x; and generating an output corresponding to the complex angle of the input value, x, using one or more multipliers of a Multiply Accumulate (MAC) unit of the digital processor, wherein the complex angle software instruction is part of an instruction set of the digital signal processor. Multiplication operations optionally employ one or more multipliers of the MAC unit of the digital processor. | 10-23-2014 |
20140359250 | TYPE INFERENCE FOR INFERRING SCALAR/VECTOR COMPONENTS - Methods and systems are provided for inferring types in a computer program. In one example, a method comprises: identifying a type of at least one expression of the computer program; and annotating the at least one expression in the computer program when the type of the at least one expression is at least one of a varying type and a uniform type. | 12-04-2014 |
20140359251 | SIGNAL PROCESSING DEVICE AND SIGNAL PROCESSING METHOD - A signal processing device including: one or more vector processors configured to perform vector processing to a signal using a parameter, one or more scalar processors configured to perform scalar processing for generating the parameter, a first circuit coupled to the one or more vector processors and the one or more scalar processors and configured to transfer the parameter from the one or more scalar processors to the one or more vector processors, and a second circuit coupled to the one or more vector processors and another circuit that inputs the signal to the second circuit, and configured to transfer the signal among the one or more vector processors and the other circuit. | 12-04-2014 |
20150012723 | PROCESSOR USING MINI-CORES - A mini-core and a processor using such a mini-core are provided in which functional units of the mini-core are divided into a scalar domain processor and a vector domain processor. The processor includes at least one such mini-core, and all or a portion of functional units from among the functional units of the mini-core operate based on an operation mode. | 01-08-2015 |
20150019835 | Predication Methods for Vector Processors - A predication method for vector processors that minimizes the use of embedded predicate fields in most instructions by using separate condition code extensions. Dedicated predicate registers provide fine grain predication of vector instructions where each bit of a predicate register controls 8 bits of the vector data. | 01-15-2015 |
20150019836 | REGISTER FILE STRUCTURES COMBINING VECTOR AND SCALAR DATA WITH GLOBAL AND LOCAL ACCESSES - The number of registers required is reduced by overlapping scalar and vector registers. This also allows increased compiler flexibility when mixing scalar and vector instructions. Local register read ports are minimized by restricting read access. Dedicated predicate registers reduce requirements for general registers, and allow reduction of critical timing paths by allowing the predicate registers to be placed next to the predicate unit. | 01-15-2015 |
20150032990 | Scalable and Parameterized VLSI Architecture for Compressive Sensing Sparse Approximation - Systems and methods for implementing a scalable very-large-scale integration (VLSI) architecture to perform compressive sensing (CS) hardware reconstruction for data signals in accordance with embodiments of the invention are disclosed. The VLSI architecture is optimized for CS signal reconstruction by implementing a reformulation of the orthogonal matching pursuit (OMP) process and utilizing architecture resource sharing techniques. Typically, the VLSI architecture is a CS reconstruction engine that includes vector and scalar computation cores where the cores can be time-multiplexed (via dynamic configuration) to perform each task associated with OMP. The vector core includes configurable processing elements (PEs) connected in parallel. Further, the cores can be linked by data-path memories, where complex data flow of OMP can be customized utilizing local memory controllers synchronized by a top-level finite-state machine. The computing resources (cores and data-paths) can be reused across the entire OMP process resulting in optimal utilization of the PEs. | 01-29-2015 |
20150067298 | SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA - A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position. | 03-05-2015 |
20150067299 | SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA - A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position. | 03-05-2015 |
20150089187 | Hazard Check Instructions for Enhanced Predicate Vector Operations - A hazard check instruction has operands that specify addresses of vector elements to be read by first and second vector memory operations. The hazard check instruction outputs a dependency vector identifying, for each element position of the first vector corresponding to the first vector memory operation, which element position of the second vector that the element of the first vector depends on (if any). In an embodiment, at least one of the vector memory operations has addresses specified using a scalar address in the operands (and a vector attribute associated with the vector). In an embodiment, the operands may include predicates for one or both of the vector memory operations, indicating which vector elements are active. The dependency vector may be qualified by the predicates, indicating dependencies only for active elements. | 03-26-2015 |
20150089188 | Vector Hazard Check Instruction with Reduced Source Operands - In an embodiment, a processor may implement a vector hazard check instruction to detect dependencies between vector memory operations based on the addresses of the vectors accessed by the vector memory operations. The addresses may be specified via a base address and a vector of indexes for each vector. In an embodiment, one of the base addresses may be an implied (or assumed) zero address, reducing the number of operands of the hazard check instruction. | 03-26-2015 |
20150143073 | DATA PROCESSING SYSTEMS - A data processing system is described in which a plurality of data processing units | 05-21-2015 |
20160026607 | PARALLELIZATION OF SCALAR OPERATIONS BY VECTOR PROCESSORS USING DATA-INDEXED ACCUMULATORS IN VECTOR REGISTER FILES, AND RELATED CIRCUITS, METHODS, AND COMPUTER-READABLE MEDIA - Parallelization of scalar operations by vector processors using data-indexed accumulators in vector register files, related circuits, methods, and computer-readable media are disclosed. In one aspect, a vector processor comprises a vector register file providing a plurality of write ports and a plurality of vector registers each providing a plurality of accumulators. The vector processor receives an input data vector. For each of the plurality of write ports, the vector processor executes vector operation(s) for accessing an input data value of the input data vector, and determining, based on the input data value, a register index for a vector register among the plurality of vector registers, and an accumulator index for an accumulator among the plurality of accumulators of the vector register. Based on the register index, a register value is retrieved from the register index, and a scalar operation is performed based on the register value and the accumulator index. | 01-28-2016 |
20160103784 | ASYMMETRICAL PROCESSOR MEMORY ARCHITECTURE - An asymmetrical processing system is provided. The processor has a vector unit comprised of one or more computational units coupled with a vector memory space and a scalar unit coupled with a data memory space and the vector memory space, the scalar unit accessing one or more memory locations within the vector memory space. | 04-14-2016 |
20160162290 | Processor with Polymorphic Instruction Set Architecture - The present disclosure provides a processor having polymorphic instruction set architecture. The processor comprises a scalar processing unit, at least one polymorphic instruction processing unit, at least one multi-granularity parallel memory and a DMA controller. The polymorphic instruction processing unit comprises at least one functional unit. The polymorphic instruction processing unit is configured to interpret and execute a polymorphic instruction and the functional unit is configured to perform specific data operation tasks. The scalar processing unit is configured to invoke the polymorphic instruction and inquire an execution state of the polymorphic instruction. The DMA controller is configured to transmit configuration information for the polymorphic instruction and transmit data required by the polymorphic instruction to the multi-granularity parallel memory. With the present disclosure, programmers can redefine a processor instruction set based on algorithm characteristics of applications after tape-out of a processor. | 06-09-2016 |
20160188531 | OPERATION PROCESSING APPARATUS AND METHOD - An operation processing apparatus is provided. The operation processing apparatus includes a vector operator and cores. The vector operator processes a vector operation with respect to an instruction that uses the vector operation, and each core includes a scalar operator that processes a scalar operation with respect to an instruction that does not use the vector operation. The vector operator is shared by the cores. | 06-30-2016 |
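
Several of the abstracts listed above describe lane-level conditional execution, for example the per-lane skip indicators of application 20100312988. The following is a minimal software sketch of that general idea, not the claimed hardware: the class name `VectorUnit`, its methods `vector_skip`, `clear_skips` and `vector_op`, and the choice to leave a skipped lane's destination value unchanged are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


class VectorUnit:
    """Toy model of a lane-parallel unit with one skip indicator per lane.

    All names here are hypothetical; this only illustrates the behaviour
    described in the abstract, not any real instruction set.
    """

    def __init__(self, lanes: int) -> None:
        self.lanes = lanes
        # Hypothetical "skip indication storage": one flag per lane.
        self.skip: List[bool] = [False] * lanes

    def vector_skip(self, lanes_to_skip: Sequence[int]) -> None:
        """Model of a vector skip instruction: set the flag for given lanes."""
        for lane in lanes_to_skip:
            self.skip[lane] = True

    def clear_skips(self) -> None:
        """Reset every skip indicator (assumed helper, not from the abstract)."""
        self.skip = [False] * self.lanes

    def vector_op(
        self,
        op: Callable[[int, int], int],
        dst: Sequence[int],
        a: Sequence[int],
        b: Sequence[int],
    ) -> List[int]:
        """Apply op per lane, leaving dst untouched where the lane is skipped."""
        return [
            dst[i] if self.skip[i] else op(a[i], b[i])
            for i in range(self.lanes)
        ]


vu = VectorUnit(lanes=4)
vu.vector_skip([1, 3])  # lanes 1 and 3 are excluded from later operations
result = vu.vector_op(lambda x, y: x + y, [0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40])
print(result)
```

The point of the sketch is that `vector_op` itself carries no condition: the same operation runs unmodified, and only the separately maintained flags decide which lanes take effect.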