Class / Patent application number | Description | Number of patent applications / Date published |
708523000 | Multiplication followed by addition (i.e., x*y+z) | 35 |
20080256162 | X87 FUSED MULTIPLY-ADD INSTRUCTION - An x87 fused multiply-add (FMA) instruction in the instruction set of an x86 architecture microprocessor is disclosed. The FMA instruction implicitly specifies the two factor operands as the top two operands of the x87 FPU register stack and explicitly specifies the third addend operand as a third x87 FPU register stack register. The microprocessor multiplies the first two operands and adds the product to the third operand to generate a result. The result is stored into the third register and the first two operands are popped off the stack. In an alternate embodiment, the third operand is also implicitly specified as being stored in the register that is two registers below the top of stack register; the result is also stored therein. The instruction opcode value is in the x87 opcode range. | 10-16-2008 |
20090030963 | MULTIPLICATION CIRCUIT, DIGITAL FILTER, SIGNAL PROCESSING DEVICE, SYNTHESIS DEVICE, SYNTHESIS PROGRAM, AND SYNTHESIS PROGRAM RECORDING MEDIUM - The conventional two's complement multiplier which is constituted by a Booth encoder, a partial product generation circuit, and an adder has a problem that the circuit scale would be increased because a bit extension is performed when the multiplier is adapted to an unsigned multiplication. | 01-29-2009 |
20090077154 | Microprocessor - Provided is a microprocessor including a complex-MAC unit that operates in response to a complex-MAC instruction. The complex-MAC unit receives first and second complex data (each having 2 | 03-19-2009 |
20090150471 | RECONFIGURABLE ARITHMETIC UNIT AND HIGH-EFFICIENCY PROCESSOR HAVING THE SAME - Provided are a reconfigurable arithmetic unit and a processor having the same. The reconfigurable arithmetic unit can perform an addition operation or a multiplication operation according to an instruction by sharing an adder. The reconfigurable arithmetic unit includes a Booth encoder for encoding a multiplier, a partial product generator for generating a plurality of partial products using the encoded multiplier and a multiplicand, a Wallace tree circuit for compressing the partial products into a first partial product and a second partial product, a first Multiplexer (MUX) for selecting and outputting one of the first partial product and a first addition input according to a selection signal, a second MUX for selecting and outputting one of the second partial product and a second addition input according to the selection signal, and a Carry Propagation Adder (CPA) for adding an output of the first MUX and an output of the second MUX to output an operation result. The arithmetic unit can operate as an adder or a multiplier according to an instruction, and thus can increase the degree of use of entire hardware. | 06-11-2009 |
20090248779 | Processor which Implements Fused and Unfused Multiply-Add Instructions in a Pipelined Manner - Implementing an unfused multiply-add instruction within a fused multiply-add pipeline. The system may include an aligner having an input for receiving an addition term, a multiplier tree having two inputs for receiving a first value and a second value for multiplication, and a first carry save adder (CSA), wherein the first CSA may receive partial products from the multiplier tree and an aligned addition term from the aligner. The system may include a fused/unfused multiply add (FUMA) block which may receive the first partial product, the second partial product, and the aligned addition term, wherein the first partial product and the second partial product are not truncated. The FUMA block may perform an unfused multiply add operation or a fused multiply add operation using the first partial product, the second partial product, and the aligned addition term, e.g., depending on an opcode or mode bit. | 10-01-2009 |
20090287757 | Leading Zero Estimation Modification for Unfused Rounding Catastrophic Cancellation - Modifying a leading zero estimation during an unfused multiply add operation of (A*B)+C. A plurality of terms x and y may be received, and each may be based on truncated terms s and t (e.g., in performing the unfused multiply add operation) and the shifted C term. A first leading zero estimation may be calculated based on the terms x and y. It may be determined if near total catastrophic cancellation has occurred. A carry in from a right most number of bits of the terms s and t and the most significant truncated bits of s and t may be used to generate a second leading zero estimation based on the first leading zero estimation if the near total catastrophic cancellation has occurred. | 11-19-2009 |
20090292756 | Large-factor multiplication in an array of processors - A processor to calculate a product-component having fewer digits than an entire product of a multiplication of a multiplicand and a multiplier. A memory holds at least one multiplicand-component having fewer digits than the multiplicand and at least one multiplier-component having fewer digits than the multiplier. A logic then calculates the product-component based on the multiplicand-components and the multiplier-components in the memory. Collectively, a plurality of the processors can calculate all of the product-components of the product. | 11-26-2009 |
20100169404 | FLEXIBLE ACCUMULATOR IN DIGITAL SIGNAL PROCESSING CIRCUITRY - A multiplier-accumulator (MAC) block can be programmed to operate in one or more modes. When the MAC block implements at least one multiply-and-accumulate operation, the accumulator value can be zeroed without introducing clock latency or initialized in one clock cycle. To zero the accumulator value, the most significant bits (MSBs) of data representing zero can be input to the MAC block and sent directly to the add-subtract-accumulate unit. Alternatively, dedicated configuration bits can be set to clear the contents of a pipeline register for input to the add-subtract-accumulate unit. | 07-01-2010 |
20100306301 | ARITHMETIC PROCESSING UNIT THAT PERFORMS MULTIPLY AND MULTIPLY-ADD OPERATIONS WITH SATURATION AND METHOD THEREFOR - Sum and carry signals are formed representing a product of a first and a second operand. A bias signal is formed having a value determined by a sign of a product of the first and the second operand. An output signal is provided based on an addition of the sum signal, the carry signal, a sign-extended addend, and the bias signal. A portion of the output signal, a saturated minimum value, or a saturated maximum value, is selected as a final result based on the sign of the product and a sign of the output signal. | 12-02-2010 |
20110029589 | LOW POWER FIR FILTER IN MULTI-MAC ARCHITECTURE - Embodiments of the invention are directed to a system and method that enable relatively low power dissipation by scheduling operations of a chain of two or more multiply-accumulator units, delivering an output result of a first multiply accumulator of the chain as an input to a second, subsequent multiply accumulator of the chain. | 02-03-2011 |
20110055308 | Method And System For Multi-Precision Computation - Systems and methods for multi-precision computation are disclosed. One embodiment of the present invention includes a plurality of multiply-add units (MADDs) configured to perform one or more single precision operations and an arrangement generator to generate one or more mantissa arrangements using a plurality of double precision numbers. Each MADD is configured to receive and load said mantissa arrangements from the arrangement generator. The MADDs compute a result of a multi-precision computation using the mantissa arrangements. In an embodiment, the MADDs are configured to simultaneously perform operations that include single-precision operations, double-precision additions, and double-precision multiply-and-add operations. | 03-03-2011 |
20110153707 | MULTIPLYING AND ADDING MATRICES - An apparatus and method are described for multiplying and adding matrices. For example, one embodiment of a method comprises decoding by a decoder in a processor device, a single instruction specifying an m-by-m matrix operation for a set of vectors, wherein each vector represents an m-by-m matrix of data elements and m is greater than one; issuing the single instruction for execution by an execution unit in the processor device; and responsive to the execution of the single instruction, generating a resultant vector, wherein the resultant vector represents an m-by-m matrix of data elements. | 06-23-2011 |
20120011187 | PARALLEL REDUNDANT DECIMAL FUSED-MULTIPLY-ADD CIRCUIT - A circuit for performing a floating-point fused-multiply-add (FMA) calculation of a×b±c. The circuit includes (i) a partial product generation module having (a) a multiples generator unit configured to generate multiples of a multiplicand having an m-digit binary coded decimal (BCD) format, (b) a recoding unit configured to generate n+1 signed digits (SD) sets from a sum vector and a carry vector of a multiplier, and (c) a multiples selection unit configured to generate partial product vectors from the multiples of the multiplicand based on the n+1 SD sets and the sign of the FMA calculation, and (ii) a carry save adder (CSA) tree configured to add the partial product vectors and an addend to generate a result sum vector and a result carry vector in an m+n digit BCD format. | 01-12-2012 |
20130198254 | PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-add operations on data elements in the first and second packed data. | 08-01-2013 |
20130226982 | APPARATUS AND METHOD FOR GENERATING PARTIAL PRODUCT FOR POLYNOMIAL OPERATION - An apparatus and a method for generating a partial product for a polynomial operation are provided. The apparatus includes first encoders, each of the first encoders configured to selectively generate and output one of mutually exclusive values based on two inputs. The apparatus further includes a second encoder configured to generate and output two candidate partial products based on an output from a first one of the first encoders that is provided at a reference bit position of the inputs, an output from a second one of the first encoders that is provided at an upper bit position of the inputs, and a multiplicand. The apparatus further includes a multiplexer configured to select one of the candidate partial products output from the second encoder. | 08-29-2013 |
20130262549 | ARITHMETIC CIRCUIT AND ARITHMETIC METHOD - An arithmetic circuit includes a circuit to output n-th multiples of a multiplicand, a circuit to output an XOR operation result that is a result of performing an exclusive logical sum operation between the multiplicand and a result of shifting the multiplicand to left by one bit, a circuit to output a first selection signal in response to a first portion of a multiplier, a circuit to output a second selection signal in response to a second portion of the multiplier, a circuit to select, in response to the first selection signal, one of the n-th multiples of the multiplicand and the XOR operation result, a circuit to select, in response to the second selection signal, one of the n-th multiples of the multiplicand and the XOR operation result, and a circuit to output a result of adding up the first partial product and the second partial product. | 10-03-2013 |
20130346461 | APPARATUS FOR CALCULATING A RESULT OF A SCALAR MULTIPLICATION - An apparatus for calculating a result of a scalar multiplication of a reference number with a reference point on an elliptic curve comprises a point selector and a processor. The point selector is configured to select randomly or pseudo-randomly an auxiliary point on the elliptic curve. The processor is configured to calculate the result of the scalar multiplication with a double-and-always-add process using the auxiliary point. | 12-26-2013 |
20130346462 | INTERCONNECTED ARITHMETIC LOGIC UNITS - An arithmetic logic stage in a graphics pipeline includes a number of arithmetic logic units (ALUs). The ALUs each include, for example, a multiplier and an adder. The ALUs are interconnected by circuitry that, for example, routes the output from the multiplier in one ALU to both the adder in that ALU and an adder in another ALU. | 12-26-2013 |
20140095570 | APPARATUS AND METHOD FOR CALCULATING INTERNAL STATE FOR ARTIFICIAL EMOTION - An apparatus and method for calculating an internal state for artificial emotions are disclosed, of which the method comprises multiplying an input value obtained from a sensor with a first personality set in accordance with at least one low rank element contained in at least one high rank element of a NEO PI-R (Revised NEO Personality Inventory); calculating a personality factor value in a Five-Factor Model of the personality by adding the results of the multiplication; and calculating the internal state by multiplying the personality factor value with a second personality. | 04-03-2014 |
20140101220 | COMPOSITE FINITE FIELD MULTIPLIER - A composite finite field multiplier is disclosed. The multiplier includes a controller, an input port, an output port, a GF((2 | 04-10-2014 |
20140122554 | Reducing Power Consumption In A Fused Multiply-Add (FMA) Unit Responsive To Input Data Values - In an embodiment, a fused multiply-add (FMA) circuit is configured to receive a plurality of input data values to perform an FMA instruction on the input data values. The circuit includes a multiplier unit and an adder unit coupled to an output of the multiplier unit, and a control logic to receive the input data values and to reduce switching activity and thus reduce power consumption of one or more components of the circuit based on a value of one or more of the input data values. Other embodiments are described and claimed. | 05-01-2014 |
20140122555 | REDUCING POWER CONSUMPTION IN A FUSED MULTIPLY-ADD (FMA) UNIT RESPONSIVE TO INPUT DATA VALUES - In an embodiment, a fused multiply-add (FMA) circuit is configured to receive a plurality of input data values to perform an FMA instruction on the input data values. The circuit includes a multiplier unit and an adder unit coupled to an output of the multiplier unit, and a control logic to receive the input data values and to reduce switching activity and thus reduce power consumption of one or more components of the circuit based on a value of one or more of the input data values. Other embodiments are described and claimed. | 05-01-2014 |
20140164467 | APPARATUS AND METHOD FOR VECTOR INSTRUCTIONS FOR LARGE INTEGER ARITHMETIC - An apparatus is described that includes a semiconductor chip having an instruction execution pipeline having one or more execution units with respective logic circuitry to: a) execute a first instruction that multiplies a first input operand and a second input operand and presents a lower portion of the result, where, the first and second input operands are respective elements of first and second input vectors; b) execute a second instruction that multiplies a first input operand and a second input operand and presents an upper portion of the result, where, the first and second input operands are respective elements of first and second input vectors; and, c) execute an add instruction where a carry term of the add instruction's adding is recorded in a mask register. | 06-12-2014 |
20140222883 | MATH CIRCUIT FOR ESTIMATING A TRANSCENDENTAL FUNCTION - A math circuit for computing an estimate of a transcendental function is described. A lookup table storage circuit has stored therein several groups of binary values, where each group of values represents a respective coefficient of a first polynomial that estimates the function to a high precision. A computing circuit uses a portion of a binary value, that is also taken from one of the groups of values, to evaluate a second polynomial that estimates the function to a low precision. Other embodiments are also described and claimed. | 08-07-2014 |
20140280427 | METHOD AND SYSTEM FOR DECOMPOSING SINGLE-QUBIT QUANTUM CIRCUITS INTO A DISCRETE BASIS - A target quantum circuit expressed in a first quantum gate basis may be transformed into a corresponding quantum circuit expressed in a second quantum gate basis, which may be a universal set of gates such as a V gate basis set. The target quantum circuit may be expressed as a linear combination of quantum gates. The linear combination of quantum gates may be mapped to a quaternion. The quaternion may be factorized, based at least in part on an amount of precision between the target quantum circuit and the corresponding quantum circuit expressed in the second quantum gate basis, into a sequence of quaternion factors. The sequence of quaternion factors may be mapped into a sequence of quantum gates of the second quantum gate basis, where the sequence of quantum gates is the corresponding quantum circuit. | 09-18-2014 |
20140289300 | PROCESSOR AND PROCESSING METHOD - In a processor that includes a plurality of multipliers and a plurality of adders to execute matrix product processing, each data of input vector data involved in the arithmetic processing is used in two multipliers, whereby arithmetic processing of elements in different rows and different columns in a matrix product operation is executed with a single instruction. This enables the sharing of input data, which reduces the number of times data are moved in the whole matrix product processing and reduces power consumption. | 09-25-2014 |
20140365548 | VECTOR MATRIX PRODUCT ACCELERATOR FOR MICROPROCESSOR INTEGRATION - In at least one example embodiment, a microprocessor circuit is provided that includes a microprocessor core coupled to a data memory via a data memory bus comprising a predetermined integer number of data wires (J); the single-ported data memory configured for storage of vector input elements of an N element vector in a predetermined vector element order and storage of matrix input elements of an M×N matrix comprising M columns of matrix input elements and N rows of matrix input elements; a vector matrix product accelerator comprising a datapath configured for multiplying the N element vector and the matrix to compute an M element result vector, the vector matrix product accelerator comprising: an input/output port interfacing the data memory bus to the vector matrix product accelerator; a plurality of vector input registers for storage of respective input vector elements received through the input/output port. | 12-11-2014 |
20140379774 | SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR PERFORMING MATHEMATICAL OPERATIONS - The system has first, second, third, and fourth subsystems. Each subsystem has first and second multipliers coupled, respectively, to first and second adders. Each multiplier has two inputs. The first adder is coupled to a first output, a first accumulator, and a bit shifter. The bit shifter is coupled to a third adder. The third adder is coupled to a multiplexer. The multiplexer is coupled to a second output and a second accumulator. The second adder is coupled to the third adder and the multiplexer. The first outputs of the first and second subsystems are coupled directly to a fourth adder, the second outputs of the first and second subsystems are coupled directly to a fifth adder, the first outputs of the third and fourth subsystems are coupled directly to a sixth adder, and the second outputs of the third and fourth subsystems are coupled directly to a seventh adder. | 12-25-2014 |
20150058391 | PROCESSOR WITH EFFICIENT ARITHMETIC UNITS - A processor includes a carry save array multiplier. The carry save array multiplier includes an array of cascaded partial product generators. The array of cascaded partial product generators is configured to generate an output value as a product of two operands presented at inputs of the multiplier. The array of cascaded partial product generators is also configured to generate an output value as a sum of two operands presented at inputs of the multiplier. | 02-26-2015 |
20150081753 | TECHNIQUE FOR PERFORMING ARBITRARY WIDTH INTEGER ARITHMETIC OPERATIONS USING FIXED WIDTH ELEMENTS - One embodiment of the present invention includes a method for performing arithmetic operations on arbitrary width integers using fixed width elements. The method includes receiving a plurality of input operands, segmenting each input operand into multiple sectors, performing a plurality of multiply-add operations based on the multiple sectors to generate a plurality of multiply-add operation results, and combining the multiply-add operation results to generate a final result. One advantage of the disclosed embodiments is that, by using a common fused floating point multiply-add unit to perform arithmetic operations on integers of arbitrary width, the method avoids the area and power penalty of having additional dedicated integer units. | 03-19-2015 |
20150095394 | MATH PROCESSING BY DETECTION OF ELEMENTARY VALUED OPERANDS - One embodiment of the present invention includes a method for simplifying arithmetic operations by detecting operands with elementary values such as zero or 1.0. Computer and graphics processing systems perform a great number of multiply-add operations. In a significant portion of these operations, the values of one or more of the operands are zero or 1.0. By detecting the occurrence of these elementary values, math operations can be greatly simplified, for example by eliminating multiply operations when one multiplicand is zero or 1.0 or eliminating add operations when one addend is zero. The simplified math operations resulting from detecting elementary valued operands provide significant savings in overhead power, dynamic processing power, and cycle time. | 04-02-2015 |
20150095395 | PROCESSING DEVICE AND METHOD FOR MULTIPLYING POLYNOMIALS - According to one embodiment, a processing device for multiplying a first polynomial with a second polynomial is described including a first memory storing a representation of the first polynomial, a controller configured to separate the first polynomial into parts, a second memory storing pre-determined results of the multiplications of the second polynomial with possible forms of the parts of the first polynomial, a third memory for storing the result of the multiplication, an address logic, configured to determine, for each part of the first polynomial, a start address of a memory block of the second memory based on the form of the part and the location of the part within the first polynomial and an adder configured to add, for each determined address of the memory block of the second memory, the content of the memory block of the second memory at least partially to the contents of the third memory, wherein the data element of the third memory to which the content of a data element of the memory block of the second memory is added is the same for a plurality of the parts of the first polynomial. | 04-02-2015 |
20160004504 | NON-ATOMIC SPLIT-PATH FUSED MULTIPLY-ACCUMULATE - A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C using first and second execution units. An input operand analyzer circuit determines whether values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with partial products of A and B. The first instruction execution unit multiplies A and B and jointly accumulates C to partial products of A and B when the values of A, B and/or C meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B. The second instruction execution unit separately accumulates C to the products of A and B when the values of A, B and/or C do not meet a sufficient condition to perform a joint accumulation of C with the partial products of A and B. | 01-07-2016 |
20160004505 | TEMPORALLY SPLIT FUSED MULTIPLY-ACCUMULATE OPERATION - A microprocessor splits a fused multiply-accumulate operation of the form A*B+C into first and second multiply-accumulate sub-operations to be performed by a multiplier and an adder. The first sub-operation at least multiplies A and B, and conditionally also accumulates C to the partial products of A and B to generate an unrounded nonredundant sum. The unrounded nonredundant sum is stored in memory shared by the multiplier and adder for an indefinite time period, enabling the multiplier and adder to perform other operations unrelated to the multiply-accumulate operation. The second sub-operation conditionally accumulates C to the unrounded nonredundant sum if C is not already incorporated into the value, and then generates a final rounded result. | 01-07-2016 |
20160004507 | SPLIT-PATH HEURISTIC FOR PERFORMING A FUSED FMA OPERATION - A microprocessor performs a fused multiply-accumulate operation of a form ±A*B±C. An evaluation is made to detect whether values of A, B, and/or C meet a sufficient condition for performing a joint accumulation of C with partial products of A and B. If so, a joint accumulation of C is done with partial products of A and B and result of the joint accumulation is rounded. If not, then a primary accumulation is done of the partial products of A and B. This generates an unrounded non-redundant result of the primary accumulation. The unrounded result is then truncated to generate an unrounded non-redundant intermediate result vector that excludes one or more least significant bits of the unrounded non-redundant result. A secondary accumulation is then performed, adding or subtracting C to the unrounded non-redundant intermediate result vector. Finally, the result of the secondary accumulation is rounded. | 01-07-2016 |
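
Several of the abstracts above (for example 20090248779, 20090287757, and 20160004507) distinguish fused from unfused multiply-add: a fused operation rounds A*B+C once, while an unfused operation rounds the product and then rounds the sum. The following is a minimal Python sketch of that numerical difference only, not of any circuit described in the listing. The function names are hypothetical, and the fused path is emulated with exact rational arithmetic rather than hardware; it assumes finite inputs and a result within double-precision range.

```python
from fractions import Fraction


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round after the multiply and again after the add (two roundings)."""
    return a * b + c


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact a*b + c, then a single rounding.

    Assumption: inputs are finite and the exact result fits in a double.
    Infinities, NaNs, overflow, and the sign of zero are not handled.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)  # Fraction -> float rounds once to the nearest double


# The exact product is 1 - 2**-60, which is not representable as a double
# and rounds to 1.0, so the unfused path cancels to zero.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = unfused_multiply_add(a, b, c)
fused = fused_multiply_add(a, b, c)
print(unfused, fused)
```

With these inputs the unfused path loses the entire result to cancellation, while the emulated fused path keeps the small residual -2**-60. This is the kind of cancellation case the leading-zero-estimation abstract refers to.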