Patent application number | Description | Published |
20090198758 | METHOD FOR SIGN-EXTENSION IN A MULTI-PRECISION MULTIPLIER - A method for implementing sign extension within a multi-precision multiplier is described. The method includes receiving a first input within a multiplier core of the multiplier, receiving a second input within the multiplier core, and creating partial products in the multiplier core using the first and second inputs. The method also includes summing up the partial products in a partial product reduction tree in the multiplier core. The method also includes performing sign extension within the partial product reduction tree of the multiplier core by adding a value to a partial product of the partial product reduction tree. The method further includes computing an output from the partial product reduction tree, the output including a final product of the first and second inputs signed extended to a desired width. | 08-06-2009 |
20090198974 | METHODS FOR CONFLICT-FREE, COOPERATIVE EXECUTION OF COMPUTATIONAL PRIMITIVES ON MULTIPLE EXECUTION UNITS - A method for executing multiple computational primitives is provided in accordance with exemplary embodiments. A first computational unit and at least a second computational unit cooperate to execute multiple computational primitives. The first computational unit independently computes other computational primitives. By virtue of arbitration for shared source operand buses or shared result buses, availability of the first and second computational units needed to execute cooperatively the multiple computational primitives is assured by a process of reservation as used for a computational primitive executed on a dedicated computational unit. | 08-06-2009 |
20100017635 | ZERO INDICATION FORWARDING FOR FLOATING POINT UNIT POWER REDUCTION - A method, system and computer program product for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one operand of multiple operands is a zero operand, prior to the operand being forwarded to an execution component for completing a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers a gating of a clock signal. The gating of the clock signal disables one or more processing stages and/or devices, which perform the mathematical operation. Disabling the stages and/or devices enables computing the correct result of the mathematical operation on a reduced data path. When a device(s) is disabled, the device may be powered off until the device is again required by subsequent operations. | 01-21-2010 |
20100023573 | EFFICIENT FORCING OF CORNER CASES IN A FLOATING POINT ROUNDER - The forcing of the result or output of a rounder portion of a floating point processor occurs only in a fraction non-increment data path within the rounder and not in the fraction increment data path within the rounder. The fraction forcing is active on a corner case such as a disabled overflow exception. A disabled overflow exception may be detected by inspecting the normalized exponent. If a disabled overflow exception is detected, the round mode is selected to execute only in the non-increment data path thereby preventing the fraction increment data path from being selected. | 01-28-2010 |
20100058266 | 3-Stack Floorplan for Floating Point Unit - A 3-stack floorplan for a floating point unit includes: an aligner located in the center of the floating point unit; a frontend located directly above the aligner; a multiplier located directly below the frontend and next to the aligner; an adder located directly next to the multiplier and directly below the aligner; a normalizer located directly above the adder; and a rounder located directly above the normalizer. | 03-04-2010 |
20100063987 | SUPPORTING MULTIPLE FORMATS IN A FLOATING POINT PROCESSOR - In a binary floating point processor, the exponents of each of the various types of operands are recoded into an internal format, by biasing the exponents with the minimum exponent value of the result precision (“Emin”), i.e., the recoded value of the exponent is the represented value of the exponent minus Emin. Emin depends only on the result precision of the instruction that is currently being executed in the binary floating point processor. The exponent computations are then performed in this new format. The underflow check for all result precisions is a check against zero and overflow checks are performed against a positive number that depends on the result precision. The exponent values are in a 2's complement representation, so the underflow check simply becomes a check of the sign bit. | 03-11-2010 |
20100095099 | SYSTEM AND METHOD FOR STORING NUMBERS IN FIRST AND SECOND FORMATS IN A REGISTER FILE - A system and a method for storing numbers in a register file are provided. The system and the method store single precision numbers in double precision format in a register file that is shared between floating point computational units and computational units not supporting floating point numbers. | 04-15-2010 |
20110320514 | DECIMAL ADDER WITH END AROUND CARRY - Binary code decimal (BCD) arithmetic add/subtract operations on two BCD numbers independent of which BCD number is of a greater magnitude include, responsive to the BCD arithmetic add/subtract operation being a subtract operation, performing a BCD arithmetic subtraction operation on a first BCD number and a second BCD number, the first BCD number having a first magnitude and the second BCD number having a second magnitude. The first magnitude is greater than, equal to, or less than the second magnitude. The performing includes: in parallel to a carry generation, partial sums or partial differences of the first and second BCD numbers are computer such that a final result in signed magnitude form is selectable from the partial sums or differences based on carry information without any post processing steps. | 12-29-2011 |
20120284548 | Zero Indication Forwarding for Floating Point Unit Power Reduction - A method and system for reducing power consumption when processing mathematical operations. Power may be reduced in processor hardware devices that receive one or more operands from an execution unit that executes instructions. A circuit detects when at least one operand of multiple operands is a zero operand, prior to the operand being forwarded to an execution component for completing a mathematical operation. When at least one operand is a zero operand or at least one operand is “unordered”, a flag is set that triggers a gating of a clock signal. The gating of the clock signal disables one or more processing stages and/or devices, which perform the mathematical operation. Disabling the stages and/or devices enables computing the correct result of the mathematical operation on a reduced data path. When a device(s) is disabled, the device may be powered off until the device is again required by subsequent operations. | 11-08-2012 |
20130124588 | ENCODING DENSELY PACKED DECIMALS - According to one aspect of the present disclosure, a method and technique for encoding densely packed decimals is disclosed. The method includes: executing a floating point instruction configured to perform a floating point operation on decimal data in a binary coded decimal (BCD) format; determining whether a result of the operation includes a rounded mantissa overflow; and responsive to determining that the result of the operation includes a rounded mantissa overflow, compressing a result of the operation from the BCD-formatted decimal data to decimal data in a densely packed decimal (DPD) format by shifting select bit values of the BCD formatted decimal data by one digit to select bit positions in the DPD format. | 05-16-2013 |
20130159666 | REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS - Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, and execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between and output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction. | 06-20-2013 |
20130332501 | Fused Multiply-Adder with Booth-Encoding - A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result. | 12-12-2013 |
20130339417 | RESIDUE-BASED EXPONENT FLOW CHECKING - A technique for checking an exponent calculation for an execution unit that supports floating point operations includes generating, using a residue prediction circuit, a predicted exponent residue for a result exponent of a floating point operation. The technique also includes generating, using an exponent calculation circuit, the result exponent for the floating point operation and generating, using the residue prediction circuit, a result exponent residue for the result exponent. Finally, the technique includes comparing the predicted exponent residue to the result exponent residue to determine whether the result exponent generated by the exponent calculation circuit is correct and, if not, signaling an error. | 12-19-2013 |
20130339802 | DYNAMIC HARDWARE TRACE SUPPORTING MULTIPHASE OPERATIONS - A method and system for tracing in a data processing system. The method includes receiving a plurality of signals associated with an operation during execution of the operation. The method also includes, in response to an indication that the operation is a multiphase operation, during execution of the operation, selection logic, during a first phase of the multiphase operation, selecting and outputting as a trace signal a first signal of the plurality of signals, and during a second phase of the multiphase operation, selecting and outputting as the trace signal a second signal of the plurality of signals. | 12-19-2013 |
20140075153 | REDUCING ISSUE-TO-ISSUE LATENCY BY REVERSING PROCESSING ORDER IN HALF-PUMPED SIMD EXECUTION UNITS - Techniques for reducing issue-to-issue latency by reversing processing order in half-pumped single instruction multiple data (SIMD) execution units are described. In one embodiment a processor functional unit is provided comprising a frontend unit, and execution core unit, a backend unit, an execution order control signal unit, a first interconnect coupled between and output and an input of the execution core unit and a second interconnect coupled between an output of the backend unit and an input of the frontend unit. In operation, the execution order control signal unit generates a forwarding order control signal based on the parity of an applied clock signal on reception of a first vector instruction. This control signal is in turn used to selectively forward first and second portions of an execution result of the first vector instruction via the interconnects for use in the execution of a dependent second vector instruction. | 03-13-2014 |
20140095568 | Fused Multiply-Adder with Booth-Encoding - A fused multiply-adder is disclosed. The fused multiply-adder includes a Booth encoder, a fraction multiplier, a carry corrector, and an adder. The Booth encoder initially encodes a first operand. The fraction multiplier multiplies the Booth-encoded first operand by a second operand to produce partial products, and then reduces the partial products into a set of redundant sum and carry vectors. The carry corrector then generates a carry correction factor for correcting the carry vectors. The adder adds the redundant sum and carry vectors and the carry correction factor to a third operand to yield a final result. | 04-03-2014 |
20140095941 | DYNAMIC HARDWARE TRACE SUPPORTING MULTIPHASE OPERATIONS - A method and system for tracing in a data processing system. The method includes receiving a plurality of signals associated with an operation during execution of the operation. The method also includes, in response to an indication that the operation is a multiphase operation, during execution of the operation, selection logic, during a first phase of the multiphase operation, selecting and outputting as a trace signal a first signal of the plurality of signals, and during a second phase of the multiphase operation, selecting and outputting as the trace signal a second signal of the plurality of signals. | 04-03-2014 |
20140149481 | Decimal Multi-Precision Overflow and Tininess Detection - An approach is provided in which a processor includes an adder that concurrently generates one or more intermediate results and a boundary indicator based upon instructions retrieved from a memory area. The boundary indicator indicates whether a collective result generated from the intermediate results is within a boundary precision value. | 05-29-2014 |
20140164463 | EXPONENT FLOW CHECKING - A technique for checking an exponent calculation for an execution unit that supports floating point operations includes generating, using a residue prediction circuit, a predicted exponent residue for a result exponent of a floating point operation. The technique also includes generating, using an exponent calculation circuit, the result exponent for the floating point operation and generating, using the residue prediction circuit, a result exponent residue for the result exponent. Finally, the technique includes comparing the predicted exponent residue to the result exponent residue to determine whether the result exponent generated by the exponent calculation circuit is correct and, if not, signaling an error. | 06-12-2014 |
20150067298 | SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA - A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position. | 03-05-2015 |
20150067299 | SPLITABLE AND SCALABLE NORMALIZER FOR VECTOR DATA - A hardware circuit component configured to support vector operations in a scalar data path. The hardware circuit component configured to operate in a vector mode configuration and in a scalar mode configuration. The hardware circuit component configured to split the scalar mode configuration into a left half and a right half of the vector mode configuration. The hardware circuit component configured to perform one or more bit shifts over one or more stages of interconnected multiplexers in the vector mode configuration. The hardware circuit component configured to include duplicated coarse shift multiplexers at bit positions that receive data from both the left half and the right half of the vector mode configuration, resulting in one or more coarse shift multiplexers sharing the bit position. | 03-05-2015 |
Patent application number | Description | Published |
20080263336 | Processor Having Efficient Function Estimate Instructions - High-precision floating-point function estimates are split in two instructions each: a low precision table lookup instruction and a linear interpolation instruction. Estimates of different functions can be implemented using this scheme: A separate table-lookup instruction is provided for each different function, while only a single interpolation instruction is needed, since the single interpolation instruction can perform the interpolation step for any of the functions to be estimated. Thus, significantly less overhead is incurred than would be incurred with specialized hardware, while still maintaining a uniform FPU latency, which allows for much simpler control logic. | 10-23-2008 |
20090024684 | Method for Controlling Rounding Modes in Single Instruction Multiple Data (SIMD) Floating-Point Units - A method for controlling rounding modes in a single instruction multiple data (SIMD) floating-point unit is disclosed. The SIMD floating-point unit includes a floating-point status-and-control register (FPSCR) having a first rounding mode bit field and a second rounding mode bit field. The SIMD floating-point unit also includes means for generating a first slice and a second slice. During a floating-point operation, the SIMD floating-point unit concurrently performs a first rounding operation on the first slice and a second rounding operation on the second slice according to a bit in the first rounding mode bit field and a bit in the second rounding mode bit field within the FPSCR, respectively. | 01-22-2009 |
20100100578 | DISTRIBUTED RESIDUE-CHECKING OF A FLOATING POINT UNIT - A distributed residue checking apparatus for a floating point unit having a plurality of functional elements performing floating-point operations on a plurality of operands. The distributed residue checking apparatus includes a plurality of residue generators which generate residue values for the operands and the functional elements, and a plurality of residue checking units distributed throughout the floating point unit. Each residue checking unit receives a first residue value and a second residue value from respective residue generators and compares the first residue value to the second residue value to determine whether an error has occurred in a floating-point operation performed by a respective functional element. | 04-22-2010 |
20100146023 | SHIFTER WITH ALL-ONE AND ALL-ZERO DETECTION - A shifter that includes a plurality of shift stages positioned within the shifter, and receiving and shifting input data to generate a shifted result, and a detection circuit coupled at an input of a final shift stage of the plurality of shifters, in a final stage within the shifter. The detection circuit receives a partially shifted vector at the input of the final shift stage along with a predetermined shift amount, and performing an all-one or all-zero detection operation using a portion of the partially shifted vector and the predetermined shift amount, in parallel, to a shifting operation performed by the final shift stage to generate the shifted result. | 06-10-2010 |
20100174764 | REUSE OF ROUNDER FOR FIXED CONVERSION OF LOG INSTRUCTIONS - A method for converting a signed fixed point number into a floating point number that includes reading an input number corresponding to a signed fixed point number to be converted, determining whether the input number is less than zero, setting a sign bit based upon whether the input number is less than zero or greater than or equal to zero, computing a first intermediate result by exclusive-ORing the input number with the sign bit, computing leading zeros of the first intermediate result, padding the first intermediate result based upon the sign bit, computing a second intermediate result by shifting the padded first intermediate result to the left by the leading zeros, computing an exponent portion and a fraction portion, conditionally incrementing the fraction portion based on the sign bit, correcting the exponent portion and the fraction portion if the incremented fraction portion overflows, and returning the floating point number. | 07-08-2010 |
20110320512 | Decimal Floating Point Mechanism and Process of Multiplication without Resultant Leading Zero Detection - A decimal multiplication mechanism for fixed and floating point computation in a computer having a coefficient mechanism without resulting leading zero detection (LZD) and process which assumes that the final product will be M+N digits in length and performs all calculations based on this assumption. Least significant digits that would be truncated are no longer stored, but retained as sticky information which is used to finalize the result product. Once the computation of the product is complete, a final check based on the examination of key bits observed during partial product accumulation is used to determine if the final product is truly M+N digits in length, or M+N−1 digits. If the latter is true, then corrective final product shifting is employed to obtain the proper result. This eliminates the need for dedicated leading zero detection hardware used to determine the number of significant digits in the final product. The corrective final product shifting also incorporates adjustments to the coefficient of the product when the product's exponent is at its extremes and the final product must be brought to be within the precision and range of a given format. | 12-29-2011 |
20130173681 | Range Check Based Lookup Tables - Mechanisms for utilizing a reduced lookup table circuit to perform an operation in a data processing device are provided. A first input value is input for selecting a subset of values from the reduced lookup table circuit. The reduced lookup table circuit stores only boundary cell values from a fully filled lookup table corresponding to the reduced lookup table circuit. The subset of values comprises only a subset of boundary cell values corresponding to the first input value. A second value is input and a comparison, by the reduced lookup table circuit, of the second value to each of the boundary cell values in the subset of boundary cell values is performed. The reduced lookup table circuit outputs an output value based on results of the comparison of the second value to each of the boundary cell values in the subset of boundary cell values. | 07-04-2013 |
20130173683 | Range Check Based Lookup Tables - Mechanisms for utilizing a reduced lookup table circuit to perform an operation in a data processing device are provided. A first input value is input for selecting a subset of values from the reduced lookup table circuit. The reduced lookup table circuit stores only boundary cell values from a fully filled lookup table corresponding to the reduced lookup table circuit. The subset of values comprises only a subset of boundary cell values corresponding to the first input value. A second value is input and a comparison, by the reduced lookup table circuit, of the second value to each of the boundary cell values in the subset of boundary cell values is performed. The reduced lookup table circuit outputs an output value based on results of the comparison of the second value to each of the boundary cell values in the subset of boundary cell values. | 07-04-2013 |