Class / Patent application number | Description | Number of patent applications / Date published |
708501000 | Multiplication followed by addition | 38 |
20080256161 | Bridge Fused Multiply-Adder Circuit - A bridge fused multiply-adder is disclosed. The fused multiply-adder is for the single instruction execution of (A×B)+C. The bridge fused multiply-add unit adds this functionality to existing floating-point co-processor units by including a fused multiply-add hardware “bridge” between an existing floating-point adder and a floating-point multiplier unit. This fused multiply-add functionality is added to existing two-operand architecture designs without degrading the performance or parallel pipe execution of floating-point adder and floating-point multiplier instructions. | 10-16-2008 |
20080275931 | Executing Fixed Point Divide Operations Using a Floating Point Multiply-Add Pipeline - A system and method for executing fixed point divide operations using a floating point multiply-add pipeline are provided. With the system and method, the floating point execution unit in a processor is modified to include elements that may be used to perform fixed point divide operations. These additional elements include a leading zero counter, a leading one counter, an estimate table unit, and a state machine. The fixed point divide operands are converted to a floating point format and an estimate of the reciprocal of the divisor is generated using estimate tables. These values are used in multiple passes through the floating point unit for calculating estimates of the quotient and corresponding error values. The estimates of the quotient are based on previous estimates of the quotient in a prior pass through the floating point unit and a corresponding error value. The final quotient estimate is truncated. | 11-06-2008 |
20090077152 | Handling Denormal Floating Point Operands When Result Must be Normalized - A system for handling denormal floating point operands when the result must be normalized. A leading zero counter (lzc) on the operand B (opB) is used to limit alignment shifts when opB is denormal but is much greater than the product of operands A and C, i.e. AC. By limiting the additional shift of B during normalization, by the number of leading zeros in opB, no increase is needed in the output bus of the alignment shifter. Furthermore, the additional shift may be done either in the alignment shifter, or postponed to a later stage in the pipeline, where the result is normalized. | 03-19-2009 |
20090265409 | PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data. | 10-22-2009 |
20100057824 | METHOD AND SYSTEM FOR PROCESSING THE BOOTH ENCODING 33RD TERM - A computer system for computing a binary operation involving a first term multiplied by a second term resulting in a product, where the product is conditionally added to a third term in a central processing unit. The central processing unit includes a carry save adder configured to add a plurality of partial products obtained from the product of the first term and the second term to obtain a first partial result and a second partial result, a multiplexer configured to output one selected from the group consisting of the second term, the third term, and zero, and an alignment shifter configured to shift an output of the multiplexer to align the output of the multiplexer with the first partial result and the second partial result to obtain a shifted term. The shifted term, the first partial result and the second partial result are added together to obtain a result of the binary operation. | 03-04-2010 |
20100121898 | Floating-point fused dot-product unit - In an embodiment, a dot-product unit to perform single-precision floating-point product and addition operations is disclosed that includes a first multiplier tree unit adapted to multiply first and second significand operands to produce a first set of two partial products. The dot-product unit further includes a second multiplier tree unit adapted to multiply third and fourth significand operands to produce a second set of two partial products, a shared exponent compare unit adapted to compare exponents of the first, second, third and fourth operands to produce an alignment shift value, and an alignment unit adapted to shift the second set of two partial products based on the alignment shift value. The dot-product unit also includes an adder unit adapted to add or subtract the first set of two partial products and the second shifted set of two partial products to produce a dot-product value that is a single-precision floating-point value. | 05-13-2010 |
20110137970 | MODE-BASED MULTIPLY-ADD RECODING FOR DENORMAL OPERANDS - In a denormal support mode, the normalization circuit of a floating-point adder is used to normalize or denormalized the output of a floating-point multiplier. Each floating-point multiply instruction is speculatively converted to a multiply-add instruction, with the addend forced to zero. This preserves the value of the product, while normalizing or denormalizing the product using the floating-point adder's normalization circuit. When the operands to the multiply operation are available, they are inspected. If the operands will not generate an unnormal intermediate product or a denormal final product, the add operation is suppressed, such as by operand-forwarding. Additionally, each non-fused floating-point multiply-add instruction is replaced with a multiply-add instruction having a zero addend, and a floating-point add instruction having the addend of the original multiply-add instruction is inserted into the instruction stream. Upon inspection of the operands, if an unnormal intermediate result or a denormal final result will not occur, the addend may be restored to the multiply-add instruction and the add instruction converted to a NOP. | 06-09-2011 |
20120078992 | FUNCTIONAL UNIT FOR VECTOR INTEGER MULTIPLY ADD INSTRUCTION - A vector functional unit implemented on a semiconductor chip to perform vector operations of dimension N is described. The vector functional unit includes N functional units. Each of the N functional units have logic circuitry to perform: a first integer multiply add instruction that presents highest ordered bits but not lowest ordered bits of a first integer multiply add calculation, and, a second integer multiply add instruction that presents lowest ordered bits but not highest ordered bits of a second integer multiply add calculation. | 03-29-2012 |
20130091190 | PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data. | 04-11-2013 |
20130124592 | OPERAND-OPTIMIZED ASYNCHRONOUS FLOATING-POINT UNITS AND METHOD OF USE THEREOF - Asynchronous arithmetic units including an asynchronous IEEE 754 compliant floating-point adder and an asynchronous floating point multiplier component. Arithmetic units optimized for lower power consumption and methods for optimization are disclosed. | 05-16-2013 |
20130262547 | PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-add operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data storing the result of performing multiply-add operations on data elements in the first and second packed data. | 10-03-2013 |
20130282783 | Systems and Methods for a Floating-Point Multiplication and Accumulation Unit Using a Partial-Product Multiplier in Digital Signal Processors - An embodiment of an apparatus performs a floating-point multiply-add process on a first multiplicand, a second multiplicand, and an addend. A leading 0 bit is added to a mantissa of the first multiplicand to form an expanded first mantissa, and a partial-product multiplication is performed on the expanded first mantissa and a mantissa of the second multiplicand to produce partial-product sum and a partial-product carry mantissas. Leading bits of the partial-product sum and carry mantissas are changed to 0 bits if they are both 1 bits, and the partial-product sum and the partial-product carry are shifted right according to an exponent difference of a product of the first multiplicand and the second multiplicand. Otherwise both the partial-product sum and carry mantissas are arithmetically shifted right according to the exponent difference. The first and second multiplicands and the addend can be complex numbers. | 10-24-2013 |
20130282784 | ARITHMETIC PROCESSING DEVICE AND METHODS THEREOF - A device and methods are disclosed for communicating an unrounded result from one arithmetic calculation for use in a second, subsequent calculation. For example, an unrounded result of a first calculation can be forwarded to provide a multiplier, a multiplicand or an addend operand for the subsequent operation. The operand can be forwarded to the input of the same fused multiply addition module (FMAM) that supplied the result, or to another FMAM, and do so without regard to the precision of the forwarded operand, the precision of the subsequent operation, or the native precision of the FMAM. | 10-24-2013 |
20130332501 | Fused Multiply-Adder with Booth-Encoding - A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result. | 12-12-2013 |
20140006467 | DOUBLE ROUNDED COMBINED FLOATING-POINT MULTIPLY AND ADD | 01-02-2014 |
20140067894 | OPERATIONS FOR EFFICIENT FLOATING POINT COMPUTATIONS - Systems and methods for efficiently handling problematic corner cases in floating point operations without raising flags or exceptions. One or more floating point numbers that will generate a problematic corner case in floating point computations, such as division or square root computation, are detected. Fix-up operations are applied to modify the computation such that the problematic corner case is avoided. The modified computation then is performed, while suppressing error flags are suppressed during intermediate stages. | 03-06-2014 |
20140067895 | MICROARCHITECTURE FOR FLOATING POINT FUSED MULTIPLY-ADD WITH EXPONENT SCALING - Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages. | 03-06-2014 |
20140095568 | Fused Multiply-Adder with Booth-Encoding - A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result. | 04-03-2014 |
20140136587 | FLOATING POINT MULTIPLY-ADD UNIT WITH DENORMAL NUMBER SUPPORT - The present application provides a method and apparatus for supporting denormal numbers in a floating point multiply-add unit (FMAC). One embodiment of the FMAC is configurable to add a product of first and second operands to a third operand. This embodiment of the FMAC is configurable to determine a minimum exponent shift for a sum of the product and the third operand by subtracting a minimum normal exponent from a product exponent of the product. This embodiment of the FMAC is also configurable to cause bits representing the sum to be left shifted by the minimum exponent shift if a third exponent of the third operand is less than or equal to the product exponent and the minimum exponent shift is less than or equal to a predicted left shift for the sum. | 05-15-2014 |
20140164464 | VECTOR EXECUTION UNIT WITH PRENORMALIZATION OF DENORMAL VALUES - A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction. | 06-12-2014 |
20140164465 | VECTOR EXECUTION UNIT WITH PRENORMALIZATION OF DENORMAL VALUES - A method, circuit arrangement, and program product for executing instructions including denormal values for one or more operands in a vector execution unit. A denormal value operand may be prenormalized by a first processing lane of the vector execution unit upon detecting the denormal value. The prenormalized value and any other operands of the instruction may be communicated to a dot product adder of the vector execution unit. The dot product adder performs at least a portion of the floating point operation with the prenormalized value and any other operands of the instruction. | 06-12-2014 |
20140188966 | Floating-point multiply-add unit using cascade design - A floating-point fused multiply-add (FMA) unit embodied in an integrated circuit includes a multiplier circuit cascaded with an adder circuit to produce a result A*C+B. To decrease latency, the FMA includes accumulation bypass circuits forwarding an unrounded result of the adder to inputs of the close path and the far path circuits of the adder, and forwarding an exponent result in carry save format to an input of the exponent difference circuit. Also included in the FMA is a multiply-add bypass circuit forwarding the unrounded result to the inputs of the multiplier circuit. The adder circuit includes an exponent difference circuit implemented in parallel with the multiplier circuit; a close path circuit implemented after the exponent difference circuit; and a far path circuit implemented after the exponent difference circuit. | 07-03-2014 |
20140188967 | Leading Change Anticipator Logic - In one embodiment, a processor includes at least one floating point unit. The at least one floating point unit may include an adder, leading change anticipator (LCA) logic, and a shifter. The adder may be to add a first operand X and a second operand Y to obtain an output operand having a bit length n. The LCA logic may be to: for each bit position i from n−1 to 1, obtain a set of propagation values and a set of bit values based on the first operand X and the second operand Y; and generate a LCA mask based on the set of propagation values and the set of bit values. The shifter may be to normalize the output operand based on the LCA mask. Other embodiments are described and claimed. | 07-03-2014 |
20140188968 | VARIABLE PRECISION FLOATING POINT MULTIPLY-ADD CIRCUIT - Embodiments of the present invention may provide methods and circuits for energy efficient floating point multiply and/or add operations. A variable precision floating point circuit may determine the certainty of the result of a multiply-add floating point calculation in parallel with the floating-point calculation. The variable precision floating point circuit may use the certainty of the inputs in combination with information from the computation, such as, binary digits that cancel, normalization shifts, and rounding, to perform a calculation of the certainty of the result. A floating point multiplication circuit may determine whether a lowest portion of a multiplication result could affect the final result and may induce a replay of the multiplication operation when it is determined that the result could affect the final result. | 07-03-2014 |
20140195580 | FLOATING POINT ROUND-OFF AMOUNT DETERMINATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a floating point round-off amount determination instruction. The instruction indicates a source of one or more floating point data elements, indicates a number of fraction bits after a radix point, and indicates a destination storage location. A result including one or more result floating point data elements is stored in the destination storage location in response to the floating point round-off amount determination instruction. Each of the one or more result floating point data elements includes a difference between a corresponding floating point data element of the source in a corresponding position, and a rounded version of the corresponding floating point data element of the source that has been rounded to the indicated number of the fraction bits. Other methods, apparatus, systems, and instructions are disclosed. | 07-10-2014 |
20140244704 | FUSED MULTIPLY ADD PIPELINE - A method for operating a fused-multiply-add pipeline in a floating-point unit of a processor is disclosed. A multiplication is initially performed between a first operand and a second operand in a multiplier block to obtain a set of partial product results. The partial product results are sent to a carry-save adder block. A partial product reduction is performed on the partial product results to generate a carry-save result having a sum term and a carry term. The carry-save result is then formatted to generate a carry-out bit. The carry-save result is added to a third operand to generate a final result. | 08-28-2014 |
20140351308 | SYSTEM AND METHOD FOR DYNAMICALLY REDUCING POWER CONSUMPTION OF FLOATING-POINT LOGIC - A system and method are provided for dynamically reducing power consumption of floating-point logic. A disable control signal that is based on a characteristic of a floating-point format input operand is received and a portion of a logic circuit is disabled based on the disable control signal. The logic circuit processes the floating-point format input operand to generate an output. | 11-27-2014 |
20140351309 | FMA UNIT, IN PARTICULAR FOR UTILIZATION IN A MODEL COMPUTATION UNIT FOR PURELY HARDWARE-BASED COMPUTING OF FUNCTION MODELS - An FMA unit, for carrying out an arithmetic operation in a model computation unit of a control unit, is configured to process input of two factors and one summand in the form of floating point values, and provide a computation result of such processing as an output variable in the form of a floating point value. The FMA unit is designed to carry out a multiplication and a subsequent addition, the bit resolutions of the inputs for the factors being lower than the bit resolution of the input for the summand and the bit resolution of the output variable. | 11-27-2014 |
20140379773 | FUSED MULTIPLY ADD OPERATIONS USING BIT MASKS - Systems and methods of performing a fused multiply add (FMA) operations are provided. In one embodiment, the length of the adder used by the FMA operation is less than 3*N, where N is the number of bits in the mantissa term of a floating point number. A mask may be used to perform the addition portion of the FMA operation using the adder. A second mask may be used to denormalize the result of the addition portion of the FMA operation if an underflow occurs. | 12-25-2014 |
20150347089 | MICROARCHITECTURE FOR FLOATING POINT FUSED MULTIPLY-ADD WITH EXPONENT SCALING - Systems and methods for implementing a floating point fused multiply and accumulate with scaling (FMASc) operation. A floating point unit receives input multiplier, multiplicand, addend, and scaling factor operands. A multiplier block is configured to multiply mantissas of the multiplier and multiplicand to generate an intermediate product. Alignment logic is configured to pre-align the addend with the intermediate product based on the scaling factor and exponents of the addend, multiplier, and multiplicand, and accumulation logic is configured to add or subtract a mantissa of the pre-aligned addend with the intermediate product to obtain a result of the floating point unit. Normalization and rounding are performed on the result, avoiding rounding during intermediate stages. | 12-03-2015 |
20150370537 | High Efficiency Computer Floating Point Multiplier Unit - A high-power-efficiency multiplier combines a standard floating-point multiplier with a power-of-two multiplier that performs multiplications by shifting operations without the need for floating-point multiplication circuitry. By selectively steering some operands to this power-of-two multiplier, substantial power savings may be realized. In one embodiment, multiplicands may be modified to work with the power-of-two multiplier introducing low errors that may be accommodated in pixel calculations. | 12-24-2015 |
20160004506 | STANDARD FORMAT INTERMEDIATE RESULT - A microprocessor comprises an instruction pipeline, a shared memory, and first and second arithmetic processing units in the instruction pipeline, each capable of reading or receiving operands from and writing or providing results to the shared memory. The first arithmetic processing unit performs a first portion of a mathematical operation to produce an intermediate result vector that is not a complete, final result of the mathematical operation. The first arithmetic processing unit generates a plurality of non-architectural calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The second arithmetic processing unit performs a second portion of the mathematical operation, in accordance with the calculation control indicators, to produce a complete, final result of the mathematical operation. | 01-07-2016 |
20160004508 | SUBDIVISION OF A FUSED COMPOUND ARITHMETIC OPERATION - A microprocessor prepares a fused multiply-accumulate operation of a form ±A*B±C for execution by issuing first and second multiply-accumulate microinstructions to one or more instruction execution units to complete the fused multiply-accumulate operation. The first multiply-accumulate microinstruction causes an unrounded nonredundant result vector to be generated from a first accumulation of a selected one of (a) the partial products of A and B or (b) C with the partial products of A and B. The second multiply-accumulate microinstruction causes performance of a second accumulation of C with the unrounded nonredundant result vector, if the first accumulation did not include C. The second multiply-accumulate microinstruction also causes a final rounded result to be generated from the unrounded nonredundant result vector, wherein the final rounded result is a complete result of the fused multiply-accumulate operation. | 01-07-2016 |
20160004509 | CALCULATION CONTROL INDICATOR CACHE - An arithmetic operation is performed using a first instruction execution unit to generate an intermediate result vector and a plurality of calculation control indicators that indicate how subsequent calculations to generate a final result from the intermediate result vector should proceed. The intermediate result vector and the plurality of calculation control indicators are stored in memory external to the instruction execution unit, and later read by a second instruction execution unit to complete the arithmetic operation. | 01-07-2016 |
20160077802 | DOUBLE ROUNDED COMBINED FLOATING-POINT MULTIPLY AND ADD - Methods, apparatus, instructions and logic are disclosed providing double rounded combined floating-point multiply and add functionality as scalar or vector SIMD instructions or as fused micro-operations. Embodiments include detecting floating-point (FP) multiplication operations and subsequent FP operations specifying as source operands results of the FP multiplications. The FP multiplications and the subsequent FP operations are encoded as combined FP operations including rounding of the results of FP multiplication followed by the subsequent FP operations. The encoding of said combined FP operations may be stored and executed as part of an executable thread portion using fused-multiply-add hardware that includes overflow detection for the product of FP multipliers, first and second FP adders to add third operand addend mantissas and the products of the FP multipliers with different rounding inputs based on overflow, or no overflow, in the products of the FP multiplier. Final results are selected respectively using overflow detection. | 03-17-2016 |
20160085508 | OPTIMIZED STRUCTURE FOR HEXADECIMAL AND BINARY MULTIPLIER ARRAY - A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as to not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied. | 03-24-2016 |
20160085509 | OPTIMIZED STRUCTURE FOR HEXADECIMAL AND BINARY MULTIPLIER ARRAY - A method for hiding implicit bit corrections in a partial product adder array in a binary and hexadecimal floating-point multiplier such that no additional adder stages are needed for the implicit bit corrections. Two leading-one correction terms are generated for the fraction in the multiplier floating-point number and two leading-one correction terms are generated for the fraction in the multiplicand floating-point number. The floating-point numbers may be single-precision or double-precision. Each leading-one correction term for the single-precision case is appended to the left of an intermediate partial product sum in the adder array that is an input to an adder so as to not to extend the bits in the input further to the left than the bits in another input to the adder. Each leading-one correction term for the double-precision case replaces an adder input that is unused when base-2 floating-point numbers are multiplied. | 03-24-2016 |
20160188295 | UNIFIED MULTIPLY UNIT - Embodiments disclosed pertain to apparatuses, systems, and methods for performing multi-precision single instruction multiple data (SIMD) operations on integer, fixed point and floating point operands. Disclosed embodiments pertain to a circuit that is capable of performing concurrent multiply, fused multiply-add, rounding, saturation, and dot products on the above operand types. In addition, the circuit may facilitate 64-bit multiplication when Newton-Raphson, divide and square root operations are performed. | 06-30-2016 |