Class / Patent application number | Description | Number of patent applications / Date published |
708607000 | Multiplication of matrices | 31 |
20080256163 | APPARATUS AND METHOD OF GENERATING CODEBOOK FOR MULTIPLE INPUT MULTIPLE OUTPUT COMMUNICATION SYSTEM - An apparatus and method of generating a codebook. The codebook generation apparatus includes a matrix extender to generate a candidate matrix set by multiplying a base matrix and at least one diagonal matrix, wherein the at least one diagonal matrix includes elements of a constrained set as diagonal elements; and a codebook generator to generate the codebook where a minimum distance between the elements is maximized, based on the candidate matrix set. According to aspects of the present invention, it is possible to provide a precoding codebook that can reduce an amount of feedback from a terminal. | 10-16-2008 |
20090024685 | High Speed and Efficient Matrix Multiplication Hardware Module - A matrix multiplication module and matrix multiplication method are provided that use a variable number of multiplier-accumulator (MAC) units based on how many data elements of the matrices are available or needed for processing at a particular point or stage in the computation process. As more data elements become available or are needed, more multiplier-accumulator units are used to perform the necessary multiplication and addition operations. To multiply an N×M matrix by an M×N matrix, the total (maximum) number of MAC units used is 2*N−1. The number of MAC units used starts at one and increases by two at each computation stage, that is, at the beginning of the reading of data elements for each new row of the first matrix. The sequence of MAC-unit counts is {1, 3, 5, . . . , 2*N−1} across the computation stages, each of which corresponds to reading the data elements of a new row of the left-hand matrix, also called the first matrix. For the multiplication of two 8×8 matrices, the performance is 16 floating-point operations per clock cycle; for an FPGA running at 100 MHz, this is 1.6 giga floating-point operations per second. The performance increases with higher clock frequencies and with larger matrices when FPGA resources permit. Very large matrices are partitioned into smaller blocks to fit in the FPGA resources, and results from the multiplication of the sub-matrices are combined to form the final result of the large matrices. | 01-22-2009 |
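The staged MAC-unit schedule described in this abstract (1, 3, 5, …, 2*N−1 units, one stage per row of the left-hand matrix) can be sketched directly; the function name below is illustrative, not from the patent.

```python
def mac_units_per_stage(n: int) -> list[int]:
    """Number of multiplier-accumulator (MAC) units active at each of
    the N computation stages, following the 1, 3, 5, ..., 2*N-1
    schedule: one stage per row of the left-hand N x M matrix."""
    return [2 * stage + 1 for stage in range(n)]

# For two 8x8 matrices the final stage uses 2*8 - 1 = 15 MAC units.
print(mac_units_per_stage(8))
```

The last entry of the list is the maximum MAC-unit count 2*N−1 quoted in the abstract.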
20090030964 | MATRIX OPERATION DEVICE - There is provided a matrix operation device comprising a k201-th power weighting multiplication circuit ( | 01-29-2009 |
20090043836 | METHOD AND SYSTEM FOR LARGE NUMBER MULTIPLICATION - Methods, apparatus and systems for large number multiplication. A multiplication circuit is provided to compute the product of two operands (A and B), at least one of which is wider than a width associated with the multiplication circuit. Each of the operands includes contiguous ordered word-wide operand segments (A | 02-12-2009 |
20090292758 | Optimized Corner Turns for Local Storage and Bandwidth Reduction - A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, such a corner turn is referred to as a “bounce” corner turn and results in a serpentine-patterned processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the required memory bandwidth and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers. | 11-26-2009 |
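The serpentine block-visitation order this abstract describes can be sketched as a simple traversal: each row of blocks is walked in the direction opposite to the previous row, so the block at each corner turn is reused rather than reloaded. A minimal illustration, not the patented mechanism itself:

```python
def serpentine_order(rows: int, cols: int) -> list[tuple[int, int]]:
    """Visit matrix blocks row by row, reversing direction at each
    corner turn (a "bounce"), so the first block of a new row is the
    same column as the last block of the previous row -- eliminating
    one block load at every corner."""
    order = []
    for r in range(rows):
        cs = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((r, c) for c in cs)
    return order

print(serpentine_order(3, 3))
```

Note that consecutive blocks across a row boundary share a column index, which is exactly what makes the corner-turn reload unnecessary.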
20090300091 | Reducing Bandwidth Requirements for Matrix Multiplication - A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. The mechanism increases the block size and divides each block into sub-blocks. By reversing the visitation order, the mechanism eliminates a sub-block load at the corner turns. The mechanism performs sub-block matrix multiplication for each sub-block in a given block, then repeats the operation for the next block until all blocks are computed. The mechanism may determine the block size and sub-block size to optimize load balancing and memory bandwidth. Therefore, the mechanism reduces the required memory bandwidth and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers. | 12-03-2009 |
20100023575 | PREDICTOR - A Predictor is described which is based on a modified RLS (recursive least squares) algorithm. The modifications prevent divergence and accuracy problems when fixed point implementation is used. | 01-28-2010 |
20100325187 | EFFICIENT MATRIX MULTIPLICATION ON A PARALLEL PROCESSING DEVICE - The present invention enables efficient matrix multiplication operations on parallel processing devices. One embodiment is a method for mapping CTAs to result matrix tiles for matrix multiplication operations. Another embodiment is a second method for mapping CTAs to result tiles. Yet other embodiments are methods for mapping the individual threads of a CTA to the elements of a tile for result tile computations, source tile copy operations, and source tile copy and transpose operations. The present invention advantageously enables result matrix elements to be computed on a tile-by-tile basis using multiple CTAs executing concurrently on different streaming multiprocessors, enables source tiles to be copied to local memory to reduce the number of accesses from the global memory when computing a result tile, and enables coalesced read operations from the global memory as well as write operations to the local memory without bank conflicts. | 12-23-2010 |
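The tile-by-tile result computation this abstract describes can be illustrated with a blocked matrix multiply; in the GPU scheme each (i, j) result tile would be assigned to one CTA (thread block), with source tiles staged through local memory. This is a serial reference sketch, not the patented mapping:

```python
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile: int = 2) -> np.ndarray:
    """C = A @ B computed one result tile at a time. Each (i, j) tile
    is accumulated from products of source tiles of A and B, the role
    a CTA plays in the tile-mapped GPU scheme."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2 and n % tile == 0 and m % tile == 0 and k % tile == 0
    c = np.zeros((n, m))
    for i in range(0, n, tile):            # one "CTA" per result tile
        for j in range(0, m, tile):
            for p in range(0, k, tile):    # stage source tiles, accumulate
                c[i:i + tile, j:j + tile] += (
                    a[i:i + tile, p:p + tile] @ b[p:p + tile, j:j + tile]
                )
    return c
```

Copying the source tiles once per tile pass is what reduces repeated global-memory accesses in the real implementation.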
20110040821 | Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture - Mechanisms for performing matrix multiplication operations with data pre-conditioning in a high performance computing architecture are provided. A vector load operation is performed to load a first vector operand of the matrix multiplication operation to a first target vector register. A load and splat operation is performed to load an element of a second vector operand and to replicate the element to each of a plurality of elements of a second target vector register. A multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the matrix multiplication operation. The partial product of the matrix multiplication operation is accumulated with other partial products of the matrix multiplication operation. | 02-17-2011 |
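The load-and-splat pattern in this abstract, replicating one scalar across a vector register and multiply-adding it against a loaded vector, amounts to building the product from accumulated partial products. A NumPy sketch of the arithmetic (not the vendor's instruction sequence):

```python
import numpy as np

def matmul_with_splat(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """C = A @ B assembled from "load and splat" steps: each scalar
    b[p, j] is replicated ("splat") across a vector the length of a
    column of A, then multiply-added into the accumulator column."""
    n, k = a.shape
    _, m = b.shape
    c = np.zeros((n, m))
    for p in range(k):
        for j in range(m):
            splat = np.full(n, b[p, j])   # replicate one element of B
            c[:, j] += a[:, p] * splat    # vector multiply-add
    return c
```

Each (p, j) step contributes one partial product; summing them over p yields the full matrix product.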
20110040822 | Complex Matrix Multiplication Operations with Data Pre-Conditioning in a High Performance Computing Architecture - Mechanisms for performing a complex matrix multiplication operation are provided. A vector load operation is performed to load a first vector operand of the complex matrix multiplication operation to a first target vector register. The first vector operand comprises a real and imaginary part of a first complex vector value. A complex load and splat operation is performed to load a second complex vector value of a second vector operand and replicate the second complex vector value within a second target vector register. The second complex vector value has a real and imaginary part. A cross multiply add operation is performed on elements of the first target vector register and elements of the second target vector register to generate a partial product of the complex matrix multiplication operation. The partial product is accumulated with other partial products and a resulting accumulated partial product is stored in a result vector register. | 02-17-2011 |
20110078226 | Sparse Matrix-Vector Multiplication on Graphics Processor Units - Techniques for optimizing sparse matrix-vector multiplication (SpMV) on a graphics processing unit (GPU) are provided. The techniques include receiving a sparse matrix-vector multiplication, analyzing the sparse matrix-vector multiplication to identify one or more optimizations, wherein analyzing the sparse matrix-vector multiplication to identify one or more optimizations comprises analyzing a non-zero pattern for one or more optimizations and determining whether the sparse matrix-vector multiplication is to be reused across computation, optimizing the sparse matrix-vector multiplication, wherein optimizing the sparse matrix-vector multiplication comprises optimizing global memory access, optimizing shared memory access and exploiting reuse and parallelism, and outputting an optimized sparse matrix-vector multiplication. | 03-31-2011 |
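The CSR (compressed sparse row) layout underlying SpMV here stores only the non-zero values, their column indices, and per-row offsets. A minimal reference kernel (the GPU optimizations in the abstract layer on top of this basic loop):

```python
def csr_spmv(values: list[float], col_idx: list[int],
             row_ptr: list[int], x: list[float]) -> list[float]:
    """y = A @ x for A stored in compressed sparse row (CSR) form:
    row r's non-zeros occupy values[row_ptr[r]:row_ptr[r+1]], with
    their column positions in col_idx at the same offsets."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For example, the matrix [[1, 0, 2], [0, 3, 0]] is stored as values=[1, 2, 3], col_idx=[0, 2, 1], row_ptr=[0, 2, 3].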
20120136912 | APPARATUS AND METHOD FOR GENERATING CODEBOOK IN WIRELESS COMMUNICATION SYSTEM - An apparatus and method for generating a codebook in a wireless communication system are disclosed. The codebook generation method includes determining one or more dominant singular vectors in a channel matrix for antennas and setting each of the dominant singular vectors as a random non-zero vector, generating a first codebook having codewords, a minimum distance between the codewords being maximized, using the random non-zero vector in a region that includes unit norm vectors each having a Euclidean distance to each of the dominant singular vectors equal to or less than a predetermined value, generating a second codebook corresponding to a unitary matrix that rotates the random non-zero vector toward the dominant singular vectors, and generating a final codebook using the first and second codebooks. | 05-31-2012 |
20120203815 | MATRIX CALCULATION METHOD, PROGRAM, AND SYSTEM - A matrix calculation method and system for calculating funny matrix multiplication (FMM) of a matrix A and a matrix B, including: sequentially calculating a permutation of indices {ai} in which values are arranged in a non-decreasing order with respect to each i-th row where i=1 to the number of rows of the matrix A; storing a sentinel value, larger than any expected matrix element, in C[i, j] with respect to each j-th column where j=1 to the number of columns of the matrix A in the i-th row; sequentially calculating a permutation of indices {bj} in which values are arranged in a non-decreasing order with respect to each j-th column where j=1 to the number of columns of the matrix B; and setting the values of C[i, j], the (i, j) components of the matrix C. | 08-09-2012 |
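"Funny matrix multiplication" commonly denotes the (min, max) product, C[i][j] = min over k of max(A[i][k], B[k][j]); the sorted-index permutations in the abstract accelerate exactly this. A naive reference of the product itself (assuming the (min, max) semiring; the patent's sorting optimization is not reproduced here):

```python
def funny_matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Naive (min, max) product: C[i][j] = min_k max(A[i][k], B[k][j]).
    C is initialized to a value larger than any matrix element, matching
    the sentinel initialization the abstract describes."""
    rows, inner, cols = len(a), len(a[0]), len(b[0])
    big = float("inf")
    c = [[big] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                c[i][j] = min(c[i][j], max(a[i][k], b[k][j]))
    return c
```

The (min, max) product arises, for instance, in bottleneck shortest-path computations.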
20120203816 | Optimized Corner Turns for Local Storage and Bandwidth Reduction - A block matrix multiplication mechanism is provided for reversing the visitation order of blocks at corner turns when performing a block matrix multiplication operation in a data processing system. By reversing the visitation order, the mechanism eliminates a block load at the corner turns. In accordance with the illustrative embodiment, such a corner turn is referred to as a “bounce” corner turn and results in a serpentine-patterned processing order of the matrix blocks. The mechanism allows the data processing system to perform a block matrix multiplication operation with a maximum of three block transfers per time step. Therefore, the mechanism reduces the required memory bandwidth and increases performance. In addition, the mechanism also reduces the number of multi-buffered local store buffers. | 08-09-2012 |
20120215826 | SYSTEM AND METHOD TO IMPLEMENT A MATRIX MULTIPLY UNIT OF A BROADBAND PROCESSOR - The present invention provides a system and method for improving the performance of general-purpose processors by implementing a functional unit that computes the product of a matrix operand with a vector operand, producing a vector result. The functional unit fully utilizes a 128b by 128b multiplier regardless of the operand element size, since the number of elements in the matrix and vector operands increases as the element size is reduced. The unit performs both fixed-point and floating-point multiplications and additions with the highest-possible intermediate accuracy using modest resources. | 08-23-2012 |
20120278376 | SYSTEM AND METHOD FOR SPARSE MATRIX VECTOR MULTIPLICATION PROCESSING - Systems and methods for sparse matrix vector multiplication (SpMV) are disclosed. The systems and methods include a novel streaming reduction architecture for floating point accumulation and a novel on-chip cache design optimized for streaming compressed sparse row (CSR) matrices. The present disclosure is also directed to implementation of the reduction circuit and/or processing elements for SpMV processing into a personality for the Convey HC-1 computing device. | 11-01-2012 |
20120317160 | MATRIX CALCULATION METHOD, PROGRAM, AND SYSTEM - A matrix calculation system for calculating funny matrix multiplication (FMM) of a matrix A and a matrix B, including: sequentially calculating a permutation of indices {ai} in which values are arranged in a non-decreasing order with respect to each i-th row where i=1 to the number of rows of the matrix A; storing a sentinel value, larger than any expected matrix element, in C[i, j] with respect to each j-th column where j=1 to the number of columns of the matrix A in the i-th row; sequentially calculating a permutation of indices {bj} in which values are arranged in a non-decreasing order with respect to each j-th column where j=1 to the number of columns of the matrix B; and setting the values of C[i, j], the (i, j) components of the matrix C. | 12-13-2012 |
20130262550 | MATRIX CALCULATION DEVICE, MATRIX CALCULATION METHOD, AND STORAGE MEDIUM HAVING MATRIX CALCULATION PROGRAM STORED THEREON - A matrix calculation device includes a first partition position display unit configured to distinguishably display a partition position of the one matrix partitioned by the matrix partitioning unit, a partition position determination unit configured to determine, based on a partition position of the one matrix distinguishably displayed by the first partition position display unit and a definition of a product of matrices, a partition position of the other matrix, and a second partition position display unit configured to distinguishably display the partition position of the other matrix determined by the partition position determination unit. | 10-03-2013 |
20140032625 | Floating point matrix multiplication co-processor - An invention providing a means for performing matrix multiplication that may be implemented in hardware or software. The invention scales to matrices of varying dimensions and permits circuit complexity to be balanced against processing throughput. | 01-30-2014 |
20140046995 | PARALLEL IMPLEMENTATION OF MAXIMUM A POSTERIORI PROBABILITY DECODER - A MAP decoder may be implemented in parallel. In one implementation, a device may receive an input array that represents received encoded data and calculate, in parallel, a series of transition matrices from the input array. The device may further calculate, in parallel, products of the cumulative products of the series of transition matrices and an initialization vector. The device may further calculate, in parallel and based on the products of the cumulative products of the series of transition matrices and the initialization vector, an output array that corresponds to a decoded version of the received encoded data in the input array. | 02-13-2014 |
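The quantity parallelized in this abstract, the cumulative products of a series of transition matrices applied to an initialization vector, can be checked against a serial reference. The sketch below computes the same cumulative products sequentially; a parallel decoder would evaluate them with a parallel prefix (scan) over matrix multiplication, which is associative:

```python
import numpy as np

def prefix_matrix_products(mats: list[np.ndarray]) -> list[np.ndarray]:
    """Cumulative products M1, M1@M2, M1@M2@M3, ... of a series of
    transition matrices -- a serial reference for what a parallel
    prefix (scan) would compute stage-concurrently."""
    out = []
    acc = np.eye(mats[0].shape[0])
    for m in mats:
        acc = acc @ m
        out.append(acc.copy())
    return out
```

Applying each cumulative product to the initialization vector then yields, in parallel, the per-position state vectors the decoder needs.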
20140108481 | UNIVERSAL FPGA/ASIC MATRIX-VECTOR MULTIPLICATION ARCHITECTURE - A universal single-bitstream FPGA library or ASIC implementation accelerates matrix-vector multiplication, processing multiple matrix encodings including dense and multiple sparse formats. A hardware-optimized sparse matrix representation referred to herein as the Compressed Variable-Length Bit Vector (CVBV) format is used to take advantage of the capabilities of FPGAs and to reduce storage and bandwidth requirements across the matrices, compared with the Compressed Sparse Row (CSR) format typically used in CPU- and GPU-based approaches. Also disclosed is a class of sparse matrix formats that are better suited for FPGA implementations than existing formats, reducing storage and bandwidth requirements. A partitioned CVBV format is described to enable parallel decoding. | 04-17-2014 |
20140172937 | APPARATUS FOR PERFORMING MATRIX VECTOR MULTIPLICATION APPROXIMATION USING CROSSBAR ARRAYS OF RESISTIVE MEMORY DEVICES - An apparatus that performs matrix-vector multiplication approximation operations using crossbar arrays of resistive memory devices (e.g. memristor, resistive random-access memory, spintronics, etc.). A crossbar array formed by resistive memory devices serves as a memory array that stores the coefficients of a matrix. Combined with input and output analog circuits, the crossbar array system realizes the method of performing matrix-vector multiplication approximation operations with significant performance, area, and energy advantages over existing methods and designs. This invention also includes an extended method that realizes the auto-associative neural network recall function using the resistive memory crossbar architecture. | 06-19-2014 |
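The analog principle behind a resistive crossbar multiplier is Ohm's law plus Kirchhoff's current law: driving row voltages v into a crossbar whose cell conductances encode a matrix G produces column currents i = Gᵀv. A digital model of that physics (an idealized sketch, ignoring wire resistance and device non-idealities):

```python
def crossbar_mvm(conductances: list[list[float]],
                 voltages: list[float]) -> list[float]:
    """Ideal crossbar matrix-vector product: the current collected on
    column c is the sum over rows r of conductance[r][c] * voltage[r],
    i.e. i = G^T v, approximating y = M v when G encodes M."""
    rows, cols = len(conductances), len(conductances[0])
    return [sum(conductances[r][c] * voltages[r] for r in range(rows))
            for c in range(cols)]
```

Because every cell conducts simultaneously, the physical array performs the whole multiply in one analog step, which is the source of the performance and energy advantages claimed.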
20140181171 | METHOD AND SYSTEM FOR FAST TENSOR-VECTOR MULTIPLICATION - A method and a system for fast tensor-vector multiplication that factor an original tensor into a kernel and a commutator, multiply the kernel obtained by the factoring by the vector to obtain a matrix, and sum elements and sums of elements of the matrix as defined by the commutator, thereby obtaining a resulting tensor which corresponds to the product of the original tensor and the vector. | 06-26-2014 |
20140280428 | INFORMATION RETRIEVAL USING SPARSE MATRIX SKETCHING - A system for retrieving stored data includes memory and a processor. The memory stores a first matrix, A, having dimensions n×d, a first sparse matrix, R, and a second sparse matrix, S. The processor receives an input value, k, corresponding to a selected rank to generate a second matrix, RA, by multiplying the first matrix, A, by the first sparse matrix, R. The second matrix, RA, has dimensions n×t. The processor generates a third matrix, AS | 09-18-2014 |
20140289301 | MATRIX CALCULATION APPARATUS, MATRIX CALCULATION METHOD, AND COMPUTER READABLE MEDIUM HAVING MATRIX CALCULATION PROCESS PROGRAM STORED THEREON - There is provided a matrix calculation apparatus. The apparatus includes: a matrix calculation formula display controller configured to display a matrix calculation formula on a display unit, wherein the matrix calculation formula comprises a first matrix; a matrix display controller configured to display a second matrix on the display unit; a submatrix receiver configured to input the second matrix into a certain element of the first matrix as a submatrix of the first matrix in response to a user operation; and a matrix size change display controller configured to change a size of the first matrix in accordance with a size of the second matrix and the certain element of the first matrix into which the second matrix is input and then display the matrix calculation formula. | 09-25-2014 |
20150074163 | PRODUCT-SUM OPERATION CIRCUIT AND PRODUCT-SUM OPERATION SYSTEM - A product-sum operation circuit that performs a matrix product of a first-matrix and a second-matrix to output a third-matrix, includes: a plurality of multipliers; a plurality of first-adders each of which is configured to add two multiplication results of the plurality of multipliers; a plurality of second-adders each of which is configured to add two addition results of the plurality of the first-adders; an input selector configured to output an element of the first-matrix and an element of the second-matrix to input terminals of the plurality of multipliers according to the number of rows and the number of columns of the first-matrix and the second-matrix; and an output selector configured to select and output the addition results of each of the plurality of first-adders or each of the plurality of second-adders according to the number of rows and the number of columns of the first-matrix and the second-matrix, as the third-matrix. | 03-12-2015 |
20150088954 | System and Method for Sparse Matrix Vector Multiplication Processing - Systems and methods for sparse matrix vector multiplication (SpMV) are disclosed. The systems and methods include a novel streaming reduction architecture for floating point accumulation and a novel on-chip cache design optimized for streaming compressed sparse row (CSR) matrices. The present disclosure is also directed to implementation of the reduction circuit and/or processing elements for SpMV processing into a personality for the Convey HC-1 computing device. | 03-26-2015 |
20150356056 | METHODS AND SYSTEMS FOR CALCULATING JOINT STATISTICAL INFORMATION - Computer-implemented methods and systems are provided for calculating statistical information. A computing system may be configured to call a linear algebra subroutine adapted to efficiently perform matrix multiplication, providing as arguments a first matrix and a second matrix, consistent with disclosed embodiments. The first matrix may include first elements corresponding to binned values of first measurements associated with a first observation. The second matrix may include second elements corresponding to binned values of second measurements associated with a set of second observations. The computing system may be configured to receive a joint value matrix estimating the joint probabilities for the binned measurements from the linear algebra subroutine. The computing system may determine a structure of the set of second observations based on the joint value matrix. In certain aspects, the computing system may determine the mutual information between the first observation and the set of second observations. | 12-10-2015 |
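The trick this abstract describes, delegating joint-probability estimation to a matrix-multiplication subroutine, works because the product of two one-hot indicator matrices is a joint histogram. A sketch of that reduction (names are illustrative; the patent's binning and normalization details are not reproduced):

```python
import numpy as np

def joint_counts(x_bins: list[int], y_bins: list[int], n_bins: int) -> np.ndarray:
    """Joint histogram of two binned series computed as a matrix product
    of one-hot indicator matrices: counts[a, b] = #{t : x_t = a, y_t = b}.
    Dividing by len(x_bins) would estimate the joint probabilities."""
    ix = np.eye(n_bins)[x_bins]   # shape (n, n_bins), row t one-hot at x_t
    iy = np.eye(n_bins)[y_bins]
    return ix.T @ iy
```

Since the heavy lifting is a single matrix product, any optimized linear-algebra subroutine (BLAS, GPU kernels) can be called to do it, which is the efficiency point the abstract makes.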
20150378962 | Approach For More Efficient Use Of Computing Resources While Calculating Cross Product Or Its Approximation For Logistic Regression On Big Data Sets - According to one technique, a modeling computer computes a Hessian matrix by determining whether an input matrix contains more than a threshold number of dense columns. If so, the modeling computer computes a sparsified version of the input matrix and uses the sparsified matrix to compute the Hessian. Otherwise, the modeling computer identifies which columns are dense and which columns are sparse. The modeling computer then partitions the input matrix by column density and uses sparse matrix format to store the sparse columns and dense matrix format to store the dense columns. The modeling computer then computes component parts which combine to form the Hessian, wherein component parts that rely on dense columns are computed using dense matrix multiplication and component parts that rely on sparse columns are computed using sparse matrix multiplication. | 12-31-2015 |
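The column-density partitioning step in this abstract can be sketched simply: classify each column of the input matrix by its fraction of non-zeros, then route dense and sparse column groups to dense and sparse multiplication kernels respectively. The threshold and function names below are illustrative assumptions, not values from the patent:

```python
import numpy as np

def partition_by_density(x: np.ndarray, threshold: float = 0.5):
    """Split column indices of X into dense and sparse groups by the
    fraction of non-zero entries per column. Component parts of the
    Hessian built from dense columns would use dense multiplication;
    parts built from sparse columns would use sparse multiplication."""
    density = (x != 0).mean(axis=0)
    dense_cols = np.where(density > threshold)[0]
    sparse_cols = np.where(density <= threshold)[0]
    return dense_cols, sparse_cols
```

The Hessian-like product XᵀX then decomposes into dense-dense, dense-sparse, and sparse-sparse blocks over these column groups, each computed with the cheaper kernel for its density.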
20190147015 | APPARATUS AND METHODS FOR MATRIX ADDITION AND SUBTRACTION | 05-16-2019 |
20220138281 | ASSIGNING PROCESSING THREADS FOR MATRIX-MATRIX MULTIPLICATION - An apparatus includes a processor and a memory to store instructions. The instructions, when executed by the processor, cause the processor to perform threading of a first matrix along a first dimension and a second dimension of the first matrix. The threading represents block sizes of the first matrix to assign to process threads of a multiplication algorithm to determine a third matrix that represents a product of the first matrix and a second matrix. The block sizes include a first block size along the first dimension and a second block size along the second dimension. The second matrix shares the second dimension with the first matrix. The instructions, when executed by the processor, cause the processor to provide data to the multiplication algorithm representing the first block size and the second block size. | 05-05-2022 |