Entries |
Document | Title | Date |
20080209182 | MULTI-MODE DATA PROCESSING DEVICE AND METHODS THEREOF - A data processing device and methods thereof are disclosed. The data processing device can operate in three different modes. In a first, N-bit mode, the data processing device performs memory accesses based on N-bit values and performs arithmetic operations using N-bit values. In a second, hybrid N-bit/M-bit mode, the data processing device performs memory accesses based on M-bit values, where M is less than N, and performs arithmetic operations using N-bit values. In a third, M-bit mode, the data processing device performs memory accesses based on M-bit values and performs arithmetic operations using M-bit values. The three modes provide compatibility with a wide range of applications. Further, operation in the M-bit mode can provide power savings when running applications compatible with that mode. | 08-28-2008 |
20080215859 | Computer with high-speed context switching - A computer which performs parallel processing of a plurality of programs in a time-division fashion includes hardware resources divided into a plurality of areas, an evacuation unit which records identification information identifying a first program, and evacuates information stored in an area of said plurality of areas if the area is necessary for execution of a second program and is being used for execution of the first program, and a restoration unit which restores the evacuated information to the area based on the identification information when the second program comes to a halt or to an end. | 09-04-2008 |
20080229080 | ARITHMETIC PROCESSING UNIT - An arithmetic processing unit includes a register file provided with multiple register windows, an arithmetic executor that executes an instruction using data retained in the register file as an operand, a current window pointer that retains address information specifying the register window that becomes the current window, and a controller. The controller controls updating of the address information retained by the current window pointer when a window switching instruction, which indicates switching of the current window, has been decoded. From the start of decoding of the window switching instruction until the start of its commit, the arithmetic executor reads, from the register file, data in a first register window specified by the address information before the update and data in a second register window specified by the updated address information. | 09-18-2008 |
20080229081 | RECONFIGURABLE CIRCUIT, RECONFIGURABLE CIRCUIT SYSTEM, AND RECONFIGURABLE CIRCUIT SETTING METHOD - Each cell comprises a first selector which accepts K pieces of data (K is a natural number of 2 or more) and outputs a single piece of data; a second selector which likewise accepts K pieces of data and outputs a single piece of data; an arithmetic and logic unit which accepts the selection outputs of the first and second selectors in N bits (N is a natural number of 2 or more) and performs a logic operation selected from a plurality of logic operations on the accepted N-bit data; a selection controller which supplies, to the first selector and the second selector, a data selection control signal indicating the data to be selected; and an ALU controller which supplies, to the arithmetic and logic unit, an ALU control signal that designates the logic operation to be executed. The first selector, the second selector, and the arithmetic and logic unit can be reconfigured based on the selection control signal and the ALU control signal. The first selector and the second selector rearrange the M[i] bits of the i-th data in a prescribed order based on the selection control signal, and output the rearranged data (i is a natural number that satisfies i≦K, and M[i] is an integer that satisfies Σ | 09-18-2008 |
20080244240 | SEMICONDUCTOR DEVICE - A semiconductor device includes a first arithmetic engine which executes a first arithmetic process in every cycle and outputs first data representing the result of the first arithmetic process and a first valid signal representing a first or second value in every cycle, and a second arithmetic engine which executes a second arithmetic process in every cycle and outputs second data representing the result of the second arithmetic process and a second valid signal representing the first or second value in every cycle. The device also includes an inter-arithmetic-engine buffer which is used to exchange the first data and the second data between the first and second arithmetic engines, enables write of the first or second data if the first or second valid signal indicates the first value, and inhibits write of the first or second data if the first or second valid signal indicates the second value. | 10-02-2008 |
20080263334 | DYNAMICALLY CONFIGURABLE AND RE-CONFIGURABLE DATA PATH - An apparatus includes a configuration memory coupled to one or more structural arithmetic elements, the configuration memory to store values that cause the structural arithmetic elements to perform various functions. The apparatus also includes a system controller to dynamically load the configuration memory with values, and to prompt the structural arithmetic elements to perform functions according to the values stored by the configuration memory. | 10-23-2008 |
20080270767 | INFORMATION PROCESSING APPARATUS AND PROGRAM EXECUTION CONTROL METHOD - According to one embodiment, an information processing apparatus includes a first processor which has a first instruction set, a second processor which has a second instruction set, a storage unit, and a control unit. The storage unit stores a program including a first program module, which is written using the second instruction set and causes the second processor to execute a first process including an arithmetic process, and a second program module, which is written using the first instruction set and causes the first processor to execute the same process as the first process. The control unit switches the mode for executing the program between a first mode, in which the first program module is assigned to the second processor, and a second mode, in which the second program module is assigned to the first processor. | 10-30-2008 |
20080270768 | Method and apparatus for SIMD complex Arithmetic - Methods and apparatus for calculating Single-Instruction-Multiple-Data (SIMD) complex arithmetic. A coprocessor instruction has a format identifying a multiply and subtract instruction to generate real components for complex multiplication of first operand complex data and corresponding second operand complex data, a cross multiply and add instruction to generate imaginary components for complex multiplication of the first operand complex data and the corresponding second operand complex data, an add-subtract instruction to add real components of the first operand to imaginary components of the second operand and to subtract real components of the second operand from imaginary components of the first operand, and a subtract-add instruction to subtract the real components of the second operand from the imaginary components of the first operand and to add the real components of the first operand to the imaginary components of the second operand. | 10-30-2008 |
20080282070 | SIMD ARITHMETIC DEVICE CAPABLE OF HIGH-SPEED COMPUTING - A general-purpose register file including a plurality of general-purpose registers stores parallel arithmetic data. A plurality of pattern registers store a plurality of items of pattern data indicating the rearrangement of data in bytes, in half words, in words, or in a combination of these units. A data select circuit selects one of the items of pattern data stored in the plurality of pattern registers according to specifying data included in an instruction. A rearranging circuit rearranges parallel arithmetic data according to the item of pattern data selected by the data select circuit. | 11-13-2008 |
20080288755 | CLOCK DRIVEN DYNAMIC DATAPATH CHAINING - A system includes a plurality of datapaths, each having structural arithmetic elements to perform various arithmetic operations based, at least in part, on configuration data. The system also includes a configuration memory coupled to the datapaths, the configuration memory to provide the configuration data to the datapaths, which causes the datapaths to collaborate when performing the arithmetic operations. | 11-20-2008 |
20080294878 | PROCESSOR SYSTEM AND EXCEPTION PROCESSING METHOD - When an error detecting unit in a processor system detects an error, it outputs an error signal to an interrupt control unit; the interrupt control unit outputs the value of an error address register and a control signal to a program counter control unit, and the value of the program counter is rewritten to the value of the error address register. In this way, branching by an error interrupt is realized. In this case, the value of the program counter at the time the error occurred is not saved when the error is detected, and no dedicated save register or control circuit is provided for returning to that address after error processing ends. | 11-27-2008 |
20080301414 | Efficient Complex Multiplication and Fast Fourier Transform (FFT) Implementation on the ManArray Architecture - Efficient computation of complex multiplication results and very efficient fast Fourier transforms (FFTs) are provided. A parallel array VLIW digital signal processor is employed along with specialized complex multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs are used allowing the complex multiplication pipeline hardware to be efficiently used. In addition, efficient techniques for supporting combined multiply accumulate operations are described. | 12-04-2008 |
20080307205 | COMPUTATIONALLY EFFICIENT MATHEMATICAL ENGINE - A method and apparatus perform many different types of algorithms using a calculation unit that can reuse the same multipliers for different algorithms. The calculation unit preferably includes a processor having a plurality of arithmetic logic unit circuits configured to process data in parallel to provide processed data outputs, and an adder tree configured to add the processed data outputs from the arithmetic logic unit circuits. A shift register that has more parallel data outputs than the processor has inputs is controlled to selectively output data from its parallel outputs to the data inputs of the processor. A communication device preferably includes the calculation unit to facilitate processing of wireless communication signals. | 12-11-2008 |
20080320285 | DISTRIBUTED DIGITAL SIGNAL PROCESSOR - A distributed digital signal processor (DSP) includes instruction memory, data memory, a multiply-accumulate module, an instruction MMW transceiver, a data MMW transceiver, and a multiply-accumulate transceiver. The multiply-accumulate module performs a function upon first and second data elements in accordance with a command of an instruction. The instruction MMW transceiver transmits a MMW instruction signal that includes at least a portion of the instruction. The data MMW transceiver transmits a MMW data signal in response to receiving the MMW instruction signal, wherein the MMW data signal includes the first and second data elements. The multiply-accumulate MMW transceiver recovers the first and second data elements from the MMW data signal and recovers a command corresponding to the function from the MMW instruction signal. | 12-25-2008 |
20090006821 | APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT FOR PROCESSING INFORMATION BY CONTROLLING ARITHMETIC MODE - An HW arithmetic unit executes a predetermined arithmetic operation. An arithmetic-mode determining unit determines, based on an attribute or a content of data relating to processing that has requested the arithmetic operation, either a synchronous mode that executes the processing after waiting for completion of the arithmetic operation by an arithmetic circuit or an asynchronous mode that executes the processing without waiting for completion of the arithmetic operation by the arithmetic circuit, as an execution mode of the arithmetic operation. An arithmetic-process control unit controls the arithmetic operation by the arithmetic circuit according to the determined execution mode. | 01-01-2009 |
20090006822 | Device and Method for Adding and Subtracting Two Variables and a Constant - A device and a method are disclosed. The method includes fetching an instruction; decoding the instruction, which includes an instruction type field, a first variable field, a second variable field, a result field and a constant field; selecting one of an addition operation, a subtraction operation and another type of operation in response to the content of the instruction type field; determining, in response to the value of the constant field, whether the result of the selected operation depends on the first and second variables only or on the first variable, the second variable and the constant; and executing the selected operation during a single instruction execution cycle to provide the result. | 01-01-2009 |
20090019268 | PROCESSOR - The processor includes: a plurality of functional blocks that each operate synchronously to perform a process according to a control signal; a connection unit that is connected between the functional blocks and can be changed to a bandwidth smaller than that of the inputs/outputs of the functional blocks; a first data converter that switches the bandwidth of the connection unit; a second data converter that switches the data transmission rate of the input/output data of the functional blocks; and a controller that controls the first data converter and the second data converter. | 01-15-2009 |
20090049282 | SYSTEM AND METHOD FOR MANAGING DATA - A method of performing data and pointer compression uses a buffer, formed between a processor and a level one cache, that stores plural tags and full-word values associated with the tags. When the buffer is presented with an address, the address is broken into a line number, which indexes a set of the full-word values, and a tag, which is used as a key to determine whether that set includes a value associated with the presented address. If the tag in the presented address matches a tag in the buffer, the full-word value associated with that tag is returned and stored in the destination register of the instruction that originated the address. If the tag does not match any tag in the buffer, a fault is generated and control branches to a pre-defined handler. | 02-19-2009 |
20090063827 | PARALLEL PROCESSOR AND ARITHMETIC METHOD OF THE SAME - A parallel processor includes a fetch unit configured to hold a processor instruction having a composite arithmetic instruction with repeat designation and a sync instruction; a decoder unit configured to decode the processor instruction; a plurality of pipeline arithmetic units configured to execute arithmetic operations in parallel on the basis of the composite arithmetic instruction, the pipeline connection between the pipeline arithmetic units being controlled in accordance with the sync instruction; and a sync control unit, provided between the fetch unit and the decoder unit, configured to control the execution start timing of the pipeline connection between the pipeline arithmetic units in accordance with the sync instruction. | 03-05-2009 |
20090077353 | PROGRAMMING LANGUAGE TYPE SYSTEM WITH AUTOMATIC CONVERSIONS - A programming language type system includes, in a memory, a set of numeric types including integer types, fixed-point types and floating-point types; a set of type propagation rules to automatically determine the result types of any combination of integer, fixed-point and floating-point types; constant annotations to explicitly specify the result type of a literal constant; context-sensitive constants whose type is determined from the context of the constant according to the set of type propagation rules; an assignment operator to explicitly specify the type of a value or computation; and operator annotations to explicitly specify the result type of a computation. | 03-19-2009 |
20090083524 | PROGRAMMABLE DATA PROCESSING CIRCUIT THAT SUPPORTS SIMD INSTRUCTION - A data processing circuit contains an instruction execution circuit ( | 03-26-2009 |
20090100251 | PARALLEL CONTEXT ADAPTIVE BINARY ARITHMETIC CODING - A method for performing parallel processing of at least two bins in an arithmetic coded bin stream includes: utilizing a current range to calculate a range for a first bin in the bin stream; simultaneously utilizing the current range to forward predict a plurality of possible ranges and low values for a second bin in the bin stream when the first bin is an MPS; when the range for the first bin is calculated, utilizing the calculated range to select a resultant range from the plurality of possible ranges and low values for the second bin. | 04-16-2009 |
20090113185 | Processor for executing multiply matrix and convolve extract instructions requiring wide operands - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers. | 04-30-2009 |
20090119491 | DATA PROCESSING DEVICE - A data processing device comprises a state manager for determining a logic number of configurational information to be used in a next state, the logic number representing information on a mutual relationship between items of configurational information included in an object code, based on a present operational state, a group of candidates for a state to transit to next, and an event signal issued from arithmetic units, a configuration number converter for outputting a real number corresponding to the logic number determined by the state manager, the configuration number converter having conversion information for converting the logic number into a real number representing a location where the corresponding configurational information is actually stored, and a configurational information storage for storing the configurational information and indicating configurational information corresponding to the real number output from the configuration number converter, to the arithmetic units and an interconnector. | 05-07-2009 |
20090144527 | STREAM PROCESSING APPARATUS, METHOD FOR STREAM PROCESSING AND DATA PROCESSING SYSTEM - The present invention provides a stream processing apparatus capable of improving the processing performance in the case of continuously processing a plurality of data streams. A control stream, different from a data stream, is prepared, and a program and a parameter are updated in advance in accordance with the control stream. Double buffer areas are prepared in a memory of the stream processing apparatus into which the program and the parameter are stored. The location of the data stream to be input is written in the control stream, and buffers for reading the data stream are multiplexed so as to read in advance the top portion of the data stream to be processed next. | 06-04-2009 |
20090150654 | FUSED MULTIPLY-ADD FUNCTIONAL UNIT - A functional unit is added to a graphics processor to provide direct support for double-precision arithmetic, in addition to the single-precision functional units used for rendering. The double-precision functional unit can execute a number of different operations, including fused multiply-add, on double-precision inputs using data paths and/or logic circuits that are at least double-precision width. The double-precision and single-precision functional units can be controlled by a shared instruction issue circuit, and the number of copies of the double-precision functional unit included in a core can be less than the number of copies of the single-precision functional units, thereby reducing the effect of adding support for double-precision on chip area. | 06-11-2009 |
20090182990 | Method and Apparatus for a Pipelined Multiple Operand Minimum and Maximum Function - Embodiments of the invention provide methods and apparatus for executing a multiple operand minimum or maximum instruction. Executing the instruction comprises transferring more than two operands to one or more processing lanes of a vector unit. A first compare operation may be performed in at least one processing lane of the vector unit to determine the greater or smaller of a first operand and a second operand. The greater (or smaller) operand may be transferred to a dot product unit where, in a second compare operation, it is compared with at least a third operand to determine the greatest (or smallest) of the more than two operands. | 07-16-2009 |
20090198973 | PROCESSING CIRCUIT - A processing circuit according to the present invention includes a plurality of logic circuits (designated as L | 08-06-2009 |
20090217008 | Program conversion device, and secret keeping program - Provided is a program conversion apparatus for generating a secret holding program, which prevents a malicious analyzer from easily analyzing the original program. | 08-27-2009 |
20090240925 | DEVICE, METHOD, AND COMPUTER PROGRAM PRODUCT THAT PROCESS MESSAGE - A first arithmetic unit performs a network process for transmission and reception of a message. A second arithmetic unit performs a network process and a specific process that is predetermined to be performed on the message in relation to the network process. An alternate process management table stores process information that associates identification information with an instruction sequence, the identification information identifying a type of the message and the instruction sequence sequentially performing a network process and a specific process. The first arithmetic unit includes an identification information detector that detects the identification information in the message, and a controller that retrieves, from the alternate process management table, the instruction sequence corresponding to the detected identification information and controls the second arithmetic unit to perform the retrieved instruction sequence. | 09-24-2009 |
20090240926 | ARITHMETIC OPERATING APPARATUS AND METHOD FOR PERFORMING ARITHMETIC OPERATION - A technique enables execution of various combinations of arithmetic operations, for example in SIMD floating-point multiply-add operations, with fewer instruction codes. An arithmetic operating apparatus includes a setting unit that sets, in one or more unused bits of a single instruction, extended instruction information instructing at least one of a register and the arithmetic operators to perform an extended process different from an ordinary process. | 09-24-2009 |
20090249039 | Providing Extended Precision in SIMD Vector Arithmetic Operations - The present invention provides extended precision in SIMD arithmetic operations in a processor having a register file and an accumulator. A first set of data elements and a second set of data elements are loaded into first and second vector registers, respectively. Each data element comprises N bits. Next, an arithmetic instruction is fetched from memory. The arithmetic instruction is decoded. Then, the first vector register and the second vector register are read from the register file. The present invention executes the arithmetic instruction on corresponding data elements in the first and second vector registers. The resulting element of the execution is then written into the accumulator. Then, the resulting element is transformed into an N-bit width element and written into a third register for further operation or storage in memory. The transformation of the resulting element can include, for example, rounding, clamping, and/or shifting the element. | 10-01-2009 |
20090282223 | DATA PROCESSING CIRCUIT - Provided is a data processing circuit. A control unit outputs an operation control signal and a memory control signal. A plurality of program memories each outputs a command in response to the memory control signal. A plurality of arithmetic sections each selectively performs any one of the commands from the plurality of program memories in response to the operation control signal. Operation modes of the data processing circuit can be flexibly changed according to operation environments. | 11-12-2009 |
20090282224 | CLIPPING A KNOWN RANGE OF INTEGER VALUES USING DESIRED CEILING AND FLOOR VALUES - An aspect of the present invention clips a sequence of data values within a known range (defined by a set of integer values) using a ceiling value and a floor value. In an embodiment, this is achieved by first storing, in each of a sequence of memory locations, a value corresponding to each integer value: the stored value equals the floor value if the memory location corresponds to an integer less than the floor value, equals the ceiling value if it corresponds to an integer greater than the ceiling value, and equals the corresponding integer otherwise. When a sequence of data values is thereafter received for clipping, the clipped value for each data value is obtained simply by retrieving the stored value from the corresponding location. | 11-12-2009 |
20090300335 | Execution Unit With Inline Pseudorandom Number Generator - A circuit arrangement and method couple a hardware-based pseudorandom number generator (PRNG) to an execution unit in such a manner that pseudorandom numbers generated by the PRNG may be selectively output to the execution unit for use as an operand during the execution of instructions by the execution unit. A PRNG may be coupled to an input of an operand multiplexer that outputs to an operand input of an execution unit so that operands provided by instructions supplied to the execution unit are selectively overridden with pseudorandom numbers generated by the PRNG. Furthermore, overridden operands provided by instructions supplied to the execution unit may be used as seed values for the PRNG. In many instances, an instruction executed by an execution unit may be able to perform an arithmetic operation using both an operand specified by the instruction and a pseudorandom number generated by the PRNG during the execution of the instruction, so that the generation of the pseudorandom number and the performance of the arithmetic operation occur during a single pass of an execution unit. | 12-03-2009 |
20090300336 | Microprocessor with highly configurable pipeline and executional unit internal hierarchal structures, optimizable for different types of computational functions - The invention resides in a flexible data pipeline structure for accommodating software computational instructions for varying application programs, having a programmable embedded processor with internal pipeline stages whose order and length can vary as often as every clock cycle based on the instruction sequence of an application program preloaded into the processor. The processor includes a data switch matrix that selectively and flexibly interconnects pluralities of mathematical execution units and memory units in response to those instructions, and the execution units are configurable to perform multi-bit arithmetic and logic operations at different precisions within a multi-level hierarchical architecture. | 12-03-2009 |
20090313458 | METHOD AND APPARATUS FOR VECTOR EXECUTION ON A SCALAR MACHINE - A processor that can execute instructions in either scalar mode or vector mode. In scalar mode, instructions are executed once per fetch. In vector mode, instructions are executed multiple times per fetch. In vector mode, the processor recognizes scalar variables and vector variables. Scalar variables may be assigned a fixed memory location. Vector variables use different physical locations at different iterations of the same instruction. The processor includes circuitry to automatically index addresses of vector variables for each iteration of the same instruction. This circuitry partitions a register into a vector region and a scalar region. Accesses to the vector region are automatically indexed based on the number of iterations of the instruction that have been performed. | 12-17-2009 |
20090327664 | Arithmetic processing apparatus - An arithmetic processing apparatus includes an operation circuit group that performs encryption and a redundant operation circuit group configured identically to the operation circuit group. While performing encryption, the apparatus performs normal encryption in the operation circuit group and, in the redundant operation circuit group, executes an encryption mask processing program using data randomly generated by a random data generating unit or the like. When not performing encryption, the apparatus performs normal arithmetic processing in the redundant operation circuit group. | 12-31-2009 |
20100023733 | Microprocessor Extended Instruction Set Precision Mode - A method and apparatus to gain additional functionality of a microprocessor by adding an extended instruction set mode. In this mode, the result of executing an instruction may be changed without changing the instruction itself. In the extended instruction set mode, there is an increase to the number of bits of precision when executing the plus instruction. An additional bit position is added to the program counter register. When this bit is set, the microprocessor is in extended instruction set mode. In addition, a new one bit latch is provided. The latch may be changed only when the microprocessor is in extended instruction set mode. The latch is defined as holding a true carry bit. A significant bit of a register holding a sum is saved in the carry latch at the end of the plus instruction. | 01-28-2010 |
20100042814 | EXTENDED INSTRUCTION SET ARCHITECTURES - An instruction set architecture includes a definition set of extended real values (e.g., computations or values that typically produce an IEEE NaN result) and a rules set of extended real value rules specifying values for one or more functions of one or more extended real values. Operations are performed on extended real values based at least partially on the extended real value rules. The instruction set architecture can be used, for example, to facilitate continued operations in a computer in case of errors relating to computations on or resulting in undefined values. | 02-18-2010 |
20100082949 | APPARATUS, COMPUTER PROGRAM PRODUCT AND ASSOCIATED METHODOLOGY FOR VIDEO ANALYTICS - A processor and associated methodology employ a SIMD architecture and instruction set to efficiently perform video analytics operations on images. The processor contains a group of SIMD instructions used by the method to implement video analytics filters that avoid bit expansion of the pixels being filtered. The filters keep the number of bits representing a pixel constant throughout the entire operation, conserving processor capacity and throughput when performing video analytics. | 04-01-2010 |
20100088493 | IMAGE PROCESSING DEVICE AND DATA PROCESSOR - The calculation function for image processing implemented by the hard-wired system and the memory access control of a buffer memory are restricted, and the range of the restriction can be varied by program control and other means. Data is input to the buffer memory from outside under the restriction of “in units of memory lines”, and the number and positions of the memory lines to which data is input are programmable by the control circuit. The arithmetic circuit is restricted to performing calculations in units of data from one or more memory lines supplied by the buffer memory, and the processing performed on each such unit of data can be programmably assigned by the control circuit. | 04-08-2010 |
20100106947 | PROCESSOR EXPLOITING TRIVIAL ARITHMETIC OPERATIONS - The present application relates to the field of processors and in particular to the carrying out of arithmetic operations. Many of the computations performed by processors consist of a large number of simple operations. As a result, a multiplication operation may take a significant number of clock cycles to complete. The present application provides a processor having a trivial operand register, which is used in the carrying out of arithmetic or storage operations for data values stored in a data store. | 04-29-2010 |
20100115245 | DETECTING AND RECOVERING FROM TIMING VIOLATIONS OF A PROCESSOR - A system for detecting and correcting invalid calculation results due to a timing violation. A processor compares the results of an instruction executed simultaneously by a first arithmetic pipeline and a second arithmetic pipeline of the processor. In the second arithmetic pipeline, the critical stage of the first arithmetic pipeline is divided into multiple stages. A first result calculated by the first arithmetic pipeline is used speculatively within the processor, while the second arithmetic pipeline calculates a second result. The processor compares the second result to the first result. When the results are identical, the first result is assigned as the final result with a complete status. When the results do not match, the processor replaces the first result with the second result; it may then cancel the speculatively executed instruction, issue the second result as the final result, and restart subsequent instructions using the second result. | 05-06-2010 |
20100185836 | ARITHMETIC PROGRAM CONVERSION APPARATUS, ARITHMETIC PROGRAM CONVERSION METHOD, AND PROGRAM - An arithmetic-program conversion apparatus includes: a program storage section storing an arithmetic program describing a circuit by a logical expression including a plurality of input and output variables, and operators; if the expression has three or more input variables, an intermediate-variable generation section generating an intermediate variable for converting the expression into a plurality of binomials including input and output variables; if the intermediate variable is generated, an expression conversion section converting the logical expression into a plurality of binomials including a binomial for obtaining the intermediate variable and a binomial for obtaining the output variable from the intermediate variable; if a plurality of binomials are generated, an expression update section updating the stored original expression; a bit-width determination section determining bit widths of the output, input, and intermediate variables of the expression; and a bit-width storage section storing the bit widths of the output, input, and intermediate variables. | 07-22-2010 |
20100274996 | MICRO-PROCESSOR - A micro-processor includes a clock generator configured to generate a fetch clock, a decoding clock, an execution clock, and a write-back clock that are sequentially enabled; a volatile memory device configured to output pre-stored program data in response to the fetch clock; a command decoder configured to decode the program data in response to the decoding clock and generate a decoding command; an arithmetic device configured to perform an arithmetic operation according to the decoding command in response to the execution clock; and a peripheral circuit device configured to operate according to the decoding command in response to the write-back clock. | 10-28-2010 |
20100299505 | INSTRUCTION FUSION CALCULATION DEVICE AND METHOD FOR INSTRUCTION FUSION CALCULATION - An instruction fusion calculation device of the present invention includes an instruction fusion detection circuit, an instruction fusion circuit, and a calculator. The instruction fusion detection circuit determines whether a preceding instruction and a subsequent instruction that have a flow dependence between them can be fused. The instruction fusion circuit fuses into one instruction the preceding and subsequent instructions that the instruction fusion detection circuit has determined can be fused. The calculator executes the fused instruction produced by the instruction fusion circuit, outputs the calculation result, and also outputs, as an intermediate result, at least one of the calculation results obtained by executing the preceding instruction and the subsequent instruction. | 11-25-2010 |
20100312999 | INTERNAL PROCESSOR BUFFER - One or more of the present techniques provide a compute engine buffer configured to maneuver data and increase the efficiency of a compute engine. One such compute engine buffer is connected to a compute engine which performs operations on operands retrieved from the buffer, and stores results of the operations to the buffer. Such a compute engine buffer includes a compute buffer having storage units which may be electrically connected or isolated, based on the size of the operands to be stored and the configuration of the compute engine. The compute engine buffer further includes a data buffer, which may be a simple buffer. Operands may be copied to the data buffer before being copied to the compute buffer, which may save additional clock cycles for the compute engine, further increasing the compute engine efficiency. | 12-09-2010 |
20100313000 | CONDITIONAL OPERATION IN AN INTERNAL PROCESSOR OF A MEMORY DEVICE - The present techniques provide an internal processor of a memory device configured to selectively execute instructions in parallel, for example. One such internal processor includes a plurality of arithmetic logic units (ALUs), each connected to conditional masking logic, and each configured to process conditional instructions. A condition instruction may be received by a sequencer of the memory device. Once the condition instruction is received, the sequencer may enable the conditional masking logic of the ALUs. The sequencer may toggle a signal to the conditional masking logic such that the masking logic masks certain instructions if a condition of the condition instruction has been met, and masks other instructions if the condition has not been met. In one embodiment, each ALU in the internal processor may selectively perform instructions in parallel. | 12-09-2010 |
20100318771 | COMBINED BYTE-PERMUTE AND BIT SHIFT UNIT - A processor includes a decode unit and a byte permute unit. The byte permute unit receives an instruction from the decode unit. The byte permute unit determines whether the instruction corresponds to a shuffle instruction or a shift instruction. For a shuffle instruction, the byte permute unit uses a byte shuffler to perform a shuffle operation indicated by the instruction. For a shift instruction that indicates a shift magnitude, the byte permute unit uses the byte shuffler to byte-level shift a source operand corresponding to the instruction by an integer number of bytes. The byte permute unit also generates a sequence of output bits by bit-shifting the byte-level shifted source operand by a number of bits such that the sum of the number of bits and the integer number of bytes is equal to the shift magnitude. | 12-16-2010 |
20100325396 | Multithread processor and register control method - The present invention relates to a multithread processor. This multithread processor comprises a plurality of register windows, one provided for each thread and capable of storing data used for instruction processing in an arithmetic unit; a work register capable of transferring data to and from the plurality of register windows and the arithmetic unit; and a multithread control unit that controls data transfer among the plurality of register windows, the work register and the arithmetic unit on the basis of an execution thread identifier identifying the thread to be executed in the arithmetic unit. This enables high-speed multithread processing. | 12-23-2010 |
20110078420 | METHOD FOR ADAPTING AND EXECUTING A COMPUTER PROGRAM AND COMPUTER ARCHITECTURE THEREFORE - A computer architecture ( | 03-31-2011 |
20110099356 | DEVICE FOR CORRECTING SET-POINT SIGNALS AND SYSTEM FOR THE GENERATION OF GRADIENTS COMPRISING SUCH A DEVICE - A device for real-time correction of set-point signals intended to receive at the input set-point signals and to deliver at its output set-point signals that are modified to compensate for defects, negative effects or the like subsequently encountered during the processing and/or the application of the set-point signals. This device ( | 04-28-2011 |
20110125988 | ARITHMETIC PROCESSING UNIT - In an arithmetic processing unit adopting register windows, reading from the register file is controlled in two stages, current window selection and register selection, and the register selected at each of a plurality of read ports of the register file is set for that port in advance so that execution can proceed out of order. Accordingly, data can be read into the arithmetic section without a temporary memory, and instructions following a window switching instruction can also be executed out of order. | 05-26-2011 |
20110131395 | METHOD AND PROCESSOR UNIT FOR IMPLEMENTING A CHARACTERISTIC-2-MULTIPLICATION - The method for implementing a characteristic-2-multiplication of at least two input bit strings each having a number N of bits by means of a processor unit suitable for carrying out an integer multiplication, having the following steps: | 06-02-2011 |
20110153993 | Add Instructions to Add Three Source Operands - A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium. | 06-23-2011 |
20110153994 | Multiplication Instruction for Which Execution Completes Without Writing a Carry Flag - A method in one aspect may include receiving a multiply instruction. The multiply instruction may indicate a first source operand and a second source operand. A product of the first and second source operands may be stored in one or more destination operands indicated by the multiply instruction. Execution of the multiply instruction may complete without writing a carry flag. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium. | 06-23-2011 |
20110153995 | ARITHMETIC APPARATUS INCLUDING MULTIPLICATION AND ACCUMULATION, AND DSP STRUCTURE AND FILTERING METHOD USING THE SAME - Disclosed are an arithmetic apparatus including MAC calculation, and a DSP structure and a filtering method using the same. The arithmetic apparatus includes: first and second registers storing one or more pieces of n-bit data (n is a natural number); a third register storing one or more pieces of 2n-bit data; a multiplier having a first input terminal connected to the first register and a second input terminal connected to the second and third registers, which multiplies the input value of the first input terminal by that of the second input terminal; and an arithmetic-logic unit (ALU) having a first input terminal connected to the output terminal of the multiplier and a second input terminal feedback-connected to its own output terminal, which adds the input values of the first and second input terminals and whose output terminal is connected to the third register. | 06-23-2011 |
20110161638 | Ising Systems: Helical Band Geometry For DTC and Integration of DTC Into A Universal Quantum Computational Protocol - Disclosed herein are efficient geometries for dynamical topology changing (DTC), together with protocols to incorporate DTC into quantum computation. Given an Ising system, twisted depletion to implement a logical gate T, anyonic state teleportation into and out of the topology altering structure, and certain geometries of the (1,−2)-bands, a classical computer can be enabled to implement a quantum algorithm. | 06-30-2011 |
20110208951 | INSTRUCTION PROCESSOR AND METHOD THEREFOR - A method of executing a program instruction is disclosed. An instruction operand stored at a register of a register file is accessed by an execution unit using multiple access requests. A first portion of the execution unit provides a first access request to a first access port of the register file to access a first portion of the instruction operand. A second portion of the execution unit provides a second access request to a second access port of the register file to access a second portion of the instruction operand. The register file can be configured into physically separate portions. | 08-25-2011 |
20110208952 | PROGRAMMABLE CONTROLLER FOR EXECUTING A PLURALITY OF INDEPENDENT SEQUENCE PROGRAMS IN PARALLEL - A programmable controller which executes a plurality of independent sequence programs in parallel is provided with an ASIC, including a plurality of arithmetic-logic units and a plurality of arbitration circuits, and as many MPUs as there are arbitration circuits. The total execution time of the programmable controller is shortened by changing the combinations of the MPUs (together with their corresponding arbitration circuits) and the arithmetic-logic units, i.e. the groups of arithmetic-logic units, based on the ratios of MPU-executed instructions to ASIC-executed instructions among the instructions that constitute the programs to be executed in parallel. | 08-25-2011 |
20110238956 | Collective Acceleration Unit Tree Structure - A mechanism is provided in a collective acceleration unit for performing a collective operation to distribute or collect data among a plurality of participant nodes. The mechanism receives an input collective packet for a collective operation from a neighbor node within a collective tree. The input collective packet comprises a tree identifier and an input data field, and the collective tree comprises a plurality of sub trees. The mechanism maps the tree identifier to an index within the collective acceleration unit. The index identifies a portion of resources within the collective acceleration unit and is associated with a set of neighbor nodes in a given sub tree within the collective tree. For each neighbor node, the collective acceleration unit stores destination information. The collective acceleration unit performs an operation on the input data field using the portion of resources to effect the collective operation. | 09-29-2011 |
20110238957 | SOFTWARE CONVERSION PROGRAM PRODUCT AND COMPUTER SYSTEM - According to one embodiment, a software conversion program product having a computer readable medium including programmed instructions, wherein the instructions, when executed by a computer system including a host processor and one or more accelerator processors, cause the computer system to perform: analyzing input software and obtaining a compute intensity, calculated by dividing the number of arithmetic operations in a loop by the size of the data accessed in the loop, and a data reference area size, which is the total size of the areas where data is referred to; determining which processor executes the loops on the basis of the obtained values and a previously prepared win-loss table defining the wins and losses in execution time between the host processor and the accelerator processor; and converting the input software so that the determined processor executes the loops. | 09-29-2011 |
20110238958 | DATA PROCESSING DEVICE - A data processing device has an instruction decoder, a control logic unit, and an ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit. | 09-29-2011 |
20110264896 | MICROPROCESSOR THAT FUSES MOV/ALU INSTRUCTIONS - A microprocessor receives first and second program-adjacent macroinstructions of the instruction set architecture of the microprocessor. The first macroinstruction instructs the microprocessor to move a first operand to a first architectural register from a second architectural register. The second macroinstruction instructs the microprocessor to perform an arithmetic/logic operation using the first operand in the second architectural register and a second operand in a third architectural register to generate a result and to load the result back into the first architectural register. An instruction translator simultaneously translates the first and second program-adjacent macroinstructions into a single micro-operation for execution by an execution unit. The single micro-operation instructs the execution unit to perform the arithmetic/logic operation using the first operand in the second architectural register and the second operand in the third architectural register to generate the result and to load the result back into the first architectural register. | 10-27-2011 |
20110302393 | CONTROL SYSTEMS AND DATA PROCESSING METHOD - When sequential processing such as ladder logic is executed by converting, in software, a program written in the instruction set of another processor into a program executable by the system's own processor, real-time performance suffers. In a control system, a storage unit stores a program for the own processor and a program for another processor. A processor reads data from the storage unit, executes the processing described by the program, and, depending on the data contents, instructs a conversion instruction unit to change the method used to acquire data from the storage unit. A changeover unit, connected to the storage unit either directly or via the conversion unit, switches the data acquisition method according to an instruction from the conversion instruction unit. The conversion unit converts data read from the storage unit into data executable by the processor, according to a conversion scheme. | 12-08-2011 |
20110314263 | INSTRUCTIONS FOR PERFORMING AN OPERATION ON TWO OPERANDS AND SUBSEQUENTLY STORING AN ORIGINAL VALUE OF OPERAND - An arithmetic/logical instruction having interlocked memory operands is executed. When executed, the instruction obtains a second operand from a location in memory and saves a temporary copy of the second operand; the execution then performs an arithmetic or logical operation based on the second operand and a third operand, stores the result in the memory location of the second operand, and subsequently stores the temporary copy in a first register. | 12-22-2011 |
20120017070 | COMPILE SYSTEM, COMPILE METHOD, AND STORAGE MEDIUM STORING COMPILE PROGRAM - To provide a compile system, a compile method, and a compile program capable of improving the execution speed of a program. A compile system according to the present invention includes a primary arithmetic unit | 01-19-2012 |
20120047354 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING METHOD - According to one embodiment, an information processing apparatus includes: a first verification section configured to perform true-false determination for a predetermined verification target using a verification item obtained by combining a specified one of plural verification libraries, which respectively define plural verification matters, with an affirmative operator or a negative operator; a second verification section configured to subject the true-false determination results of a plurality of verification items verified by the first verification section for the verification target to an arithmetic operation using a predetermined logical operator, to obtain one arithmetic operation result; and an output section configured to output the arithmetic operation result of the second verification section. | 02-23-2012 |
20120060019 | REDUCTION OPERATION DEVICE, A PROCESSOR, AND A COMPUTER SYSTEM - A reduction operation device detects a mismatch of operation type or data type in a reduction arithmetic operation of parallel processing. The reduction operation device receives a plurality of synchronization signals and data, sets the transmission destination of each of the input synchronization signals and data to correspond to the next stage of the reduction operation, and executes the reduction operation. A synchronization unit in the reduction operation device detects, after synchronization is established, any mismatch between the operation types or data types included in the reduction operation instructions, and controls the arithmetic operation of the arithmetic unit. | 03-08-2012 |
20120072703 | SPLIT PATH MULTIPLY ACCUMULATE UNIT - In one embodiment, a processor includes a multiply-accumulate (MAC) unit having a first path to handle execution of an instruction if a difference between at least a portion of first and second operands and a third operand is less than a threshold value, and a second path to handle the instruction execution if the difference is greater than the threshold value. Based on the difference, at least part of the third operand is to be provided to a multiplier of the MAC unit or to a compressor of the second path. Other embodiments are described and claimed. | 03-22-2012 |
20120079250 | FUNCTIONAL UNIT CAPABLE OF EXECUTING APPROXIMATIONS OF FUNCTIONS - A semiconductor chip is described having a functional unit that can execute a first instruction and execute a second instruction. The first instruction is an instruction that multiplies two operands. The second instruction is an instruction that approximates a function according to C0+C1X2+C2X2 | 03-29-2012 |
20120079251 | MULTIPLY ADD FUNCTIONAL UNIT CAPABLE OF EXECUTING SCALE, ROUND, GETEXP, ROUND, GETMANT, REDUCE, RANGE AND CLASS INSTRUCTIONS - A method is described that involves executing a first instruction with a functional unit. The first instruction is a multiply-add instruction. The method further includes executing a second instruction with the functional unit. The second instruction is a round instruction. | 03-29-2012 |
20120166773 | HASH PROCESSING USING A PROCESSOR - In certain embodiments, a digital signal processor (DSP) has multiple arithmetic logic units and a register module. The DSP is adapted to generate a message digest H from a message M in accordance with the SHA-1 standard, where M includes N blocks M | 06-28-2012 |
20120198211 | ARITHMETIC UNIT AND ARITHMETIC PROCESSING METHOD FOR OPERATING WITH HIGHER AND LOWER CLOCK FREQUENCIES - There is a need for a battery-less integrated circuit (IC) card capable of operating in accordance with a contact usage or a non-contact usage, preventing coprocessor throughput from degrading despite a decreased clock frequency for reduced power consumption under non-contact usage, and ensuring high-speed processing under non-contact usage. A dual interface card is a battery-less IC card capable of operating in accordance with a contact usage or a non-contact usage. The dual interface card operates at a high clock under contact usage and at a low clock under non-contact usage. A targeted operation comprises a plurality of different basic operations. The dual interface card comprises a basic arithmetic circuit group. Under the contact usage, the basic arithmetic circuit group performs one basic operation of the targeted operation in one cycle. Under the non-contact usage, the basic arithmetic circuit group sequentially performs at least two basic operations of the targeted operation in one cycle. | 08-02-2012 |
20120198212 | Microprocessor and Method for Enhanced Precision Sum-of-Products Calculation on a Microprocessor - A microprocessor, a method for enhanced precision sum-of-products calculation and a video decoding device are provided, in which at least one general-purpose-register is arranged to provide a number of destination bits to a multiply unit, and a control unit is adapted to provide at least a multiply-high instruction and a multiply-high-and-accumulate instruction to the multiply unit. The multiply unit is arranged to receive at least first and second source operands having an associated number of source bits, a sum of source bits exceeding the number of destination bits, connected to a register-extension cache comprising at least one cache entry arranged to store a number of precision-enhancement bits, and adapted to store a destination portion of a result operand in the general-purpose-register and a precision enhancement portion in the cache entry. The result operand is generated by a multiply-high operation or by a multiply-high-and-accumulate operation, depending on the received instructions. | 08-02-2012 |
20120204012 | CONFIGURABLE PIPELINE BASED ON ERROR DETECTION MODE IN A DATA PROCESSING SYSTEM - A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected. | 08-09-2012 |
20120210101 | COMPETITION TESTING DEVICE - A competition testing apparatus for testing access competition of an arithmetic unit includes a memory that stores a program, a first processor that executes the program by accessing the memory, a second processor that executes the program by accessing the memory, and an arbitration unit that arbitrates access by the first processor and the second processor and reports the result of the arbitration when the first processor and the second processor access the same address space in the memory, wherein the memory stores an odd number of test programs, and the apparatus further comprises a controller that controls the first processor to process the plurality of test programs stored in the memory in a predetermined order and controls the second processor to process them in the reverse of that order, and a recording unit that records the result of arbitration performed by the arbitration unit. | 08-16-2012 |
20120260073 | EMULATION OF EXECUTION MODE BANKED REGISTERS - A microprocessor includes processor modes comprising a user mode and a plurality of exception modes. An execution unit performs arithmetic operations on operands specified by program instructions. A first set of storage elements holds a first subset of the operands and provides them to the execution unit coupled thereto. A second set of storage elements associated with each of the modes holds a second subset of the operands and is incapable of directly providing the second operand subset to the execution unit. To enter a new mode from a current mode, logic saves the first operand subset held in the first set of storage elements to the second set of storage elements associated with the current mode and restores to the first set of storage elements the second operand subset held in the second set of storage elements associated with the new mode. | 10-11-2012 |
20120311305 | INFORMATION PROCESSING DEVICE - Provided is an information processing device including an instruction cache, a data cache, first and second arithmetic unit groups including a plurality of arithmetic units capable of parallel operation, a first arithmetic-control circuit that generates one or more operation instructions for the first arithmetic unit group, and a second arithmetic-control circuit that generates one or more operation instructions for the second arithmetic unit group based on an instruction code of a fixed instruction register. The first arithmetic unit group sets the instruction code to the fixed instruction register according to an operation instruction generated based on a first specific instruction code by the first arithmetic-control circuit, and provides data to the second arithmetic unit group according to an operation instruction generated based on a second specific instruction code by the first arithmetic-control circuit. The second arithmetic unit group repeats operations based on the operation instructions by the second arithmetic-control circuit. | 12-06-2012 |
20130007421 | Methods and Apparatus for Efficient Complex Long Multiplication and Covariance Matrix Implementation - Efficient computation of complex long multiplication results and an efficient calculation of a covariance matrix are described. A parallel array VLIW digital signal processor is employed along with specialized complex long multiplication instructions and communication operations between the processing elements which are overlapped with computation to provide very high performance operation. Successive iterations of a loop of tightly packed VLIWs may be used allowing the complex multiplication pipeline hardware to be efficiently used. | 01-03-2013 |
20130024667 | ARITHMETIC AND CONTROL UNIT, ARITHMETHIC AND CONTROL METHOD, PROGRAM AND PARALLEL PROCESSOR - An attribute group storage unit acquires and holds attribute groups set to respective data blocks. A scenario determination unit determines respective transfer systems of the respective blocks between a memory of the lowest hierarchy and a memory of another hierarchy based on those attribute groups and a configuration of an arithmetic unit which is the parallel processor, and controls the transfer of the respective data blocks according to the determined transfer systems, and the parallel arithmetic operation corresponding to the transfer. Each of the attribute groups is necessary to determine the transfer systems, and includes one or more attributes not depending on the configuration of the parallel processor. The attribute groups of the write blocks are set assuming that each of the write blocks has already been located in the memory of another hierarchy, and is transferred to the memory of the lowest hierarchy. | 01-24-2013 |
20130024668 | ARCHITECTURE AND IMPLEMENTATION METHOD OF PROGRAMMABLE ARITHMETIC CONTROLLER FOR CRYPTOGRAPHIC APPLICATIONS - An architecture includes a controller. The controller is configured to receive a microprogram. The microprogram is configured for performing at least one of hierarchical or a sequence of polynomial computations. The architecture also includes an arithmetic logic unit (ALU) communicably coupled to the controller. The ALU is controlled by the controller. Additionally, the microprogram is compiled prior to execution by the controller, the microprogram is compiled into a plurality of binary tables, and the microprogram is programmed in a command language in which each command includes a first portion for indicating at least one of a command or data transferred to the ALU, and a second portion for including a control command to the controller. The architecture and implementation of the programmable controller may be for cryptographic applications, including those related to public key cryptography. | 01-24-2013 |
20130086366 | Register File with Embedded Shift and Parallel Write Capability - An apparatus includes a register file including a logical circuit. The register file is configured to perform one or more logical operations in conjunction with the logical circuit. The logical operation is performed in response to the register file receiving a register file control instruction. The register file control instruction is independent from an arithmetic logic unit (ALU) control instruction and a multiply-and-accumulate unit (MACU) control instruction. | 04-04-2013 |
20130151820 | METHOD AND APPARATUS FOR ROTATING AND SHIFTING DATA DURING AN EXECUTION PIPELINE CYCLE OF A PROCESSOR - A method and apparatus are described for processing data during an execution pipeline cycle of a processor. Valid bits of the data are generated according to a designated data size. Each of the valid bits is inserted into at least one of a plurality of bit positions. The valid bits are rotated in a predetermined direction (i.e., left or right rotation) by a designated number of bit positions. Valid bits are removed from a portion of the plurality of bit positions after being rotated. Zeros or most significant bits (MSBs) of the data may be inserted in the bit positions from which the valid bits were removed. The number of bit positions to rotate the valid bits by may be designated by a first bit subset and a second bit subset. The first bit subset may indicate a number of bytes, and the second bit subset may indicate a number of bits. | 06-13-2013 |
20130166889 | METHOD AND APPARATUS FOR GENERATING FLAGS FOR A PROCESSOR - A method and apparatus are described for generating flags in response to processing data during an execution pipeline cycle of a processor. The processor may include a multiplexer configured to generate valid bits for received data according to a designated data size, and a logic unit configured to control the generation of flags based on a shift or rotate operation command, the designated data size and information indicating how many bytes and bits to rotate or shift the data by. A carry flag may be used to extend the amount of bits supported by shift and rotate operations. A sign flag may be used to indicate whether a result is a positive or negative number. An overflow flag may be used to indicate that a data overflow exists, whereby there is not a sufficient number of bits to store the data. | 06-27-2013 |
20130166890 | APPARATUS COMPRISING A PLURALITY OF ARITHMETIC LOGIC UNITS - An arrangement of at least two arithmetic logic units carries out an operation defined by a decoded instruction including at least one operand and more than one operation code. The operation codes and at least one operand are received and corresponding executions are performed by the arithmetic logic units in a single clock cycle. The result of the execution from one arithmetic logic unit is used as an operand by a further arithmetic logic unit. The decoding of the instruction is performed in an immediately preceding single clock cycle. | 06-27-2013 |
20130173890 | METHOD OF, AND APPARATUS FOR, STREAM SCHEDULING IN PARALLEL PIPELINED HARDWARE - A method of generating a hardware design for a stream processor. The method includes defining a graph representing a processing operation designating processes to be implemented in hardware as part of the stream processor. The graph represents the processing operation in the time domain as a function of clock cycles and includes at least one data path. At least one stream offset object is provided located at a particular point in the data path. The stream offset object is operable to access, for a particular clock cycle and for the particular point in the data path, data values from a clock cycle different from the particular clock cycle. | 07-04-2013 |
20130205123 | Data Processing Device and Method - The present invention relates to a processor having a trace cache and a plurality of ALUs arranged in a matrix, comprising an analyser unit located between the trace cache and the ALUs, wherein the analyser unit analyses the code in the trace cache, detects loops, transforms the code, and issues to the ALUs sections of the code combined to blocks for joint execution for a plurality of clock cycles. | 08-08-2013 |
20130212362 | IMAGE PROCESSING DEVICE AND DATA PROCESSOR - A restriction is given to the calculation function for image processing achieved by the hard-wired system and the memory access control of a buffer memory, and a range of the restriction is made variable by a program control and others. Data is inputted to the buffer memory from the outside with a restriction of “in units of memory line”, and the number of memory lines and positions of the same to which data is inputted can be programmable by the control circuit. The arithmetic circuit is subjected to the restriction of performing the calculation in units of data of one or plural memory lines supplied from the buffer memory, and a calculation processing content in units of calculation processing for the units of data can be programmably assigned by the control circuit. | 08-15-2013 |
20130227252 | Add Instructions to Add Three Source Operands - A method in one aspect may include receiving an add instruction. The add instruction may indicate a first source operand, a second source operand, and a third source operand. A sum of the first, second, and third source operands may be stored as a result of the add instruction. The sum may be stored partly in a destination operand indicated by the add instruction and partly in a plurality of flags. Other methods are also disclosed, as are apparatus, systems, and instructions on machine-readable medium. | 08-29-2013 |
20130254516 | ARITHMETIC PROCESSING UNIT - An arithmetic processing unit that performs processing of a stream-type includes an arithmetic unit configured to operate an input operand to obtain a result of operation; and a data input and output unit configured to read the input operand out of a memory when an instruction which is issued in a case where a stream length of the input operand is shorter than a stream length of an output operand corresponding to the input operand and includes data indicating a recursive rule used when the input operand is read out, to supply the read input operand, and to store the result of the operation obtained by the arithmetic unit in the memory as the output operand, wherein the arithmetic unit | 09-26-2013 |
20130262836 | PROCESSOR FOR PERFORMING MULTIPLY-ADD OPERATIONS ON PACKED DATA - A method and apparatus for including in a processor instructions for performing multiply-subtract operations on packed data. In one embodiment, a processor is coupled to a memory. The memory has stored therein a first packed data and a second packed data. The processor performs operations on data elements in said first packed data and said second packed data to generate a third packed data in response to receiving an instruction. At least two of the data elements in this third packed data store the result of performing multiply-subtract operations on data elements in the first and second packed data. | 10-03-2013 |
20130275726 | ARITHMETIC PROCESSING APPARATUS AND BRANCH PREDICTION METHOD - A branch target address table is provided for each branch instruction having a plurality of branch targets. Each branch target address table stores a history of a plurality of branch target addresses determined in the past by executing a corresponding branch instruction. A branch target prediction unit predicts a predicted branch target address with respect to a branch instruction with reference to the history of branch target addresses stored in the branch target address table corresponding to the branch instruction. The predicted branch target address obtained as a result of the prediction is stored, for example, in a predicted branch target address storage unit in association with the branch instruction, and is referenced by an instruction fetch control unit at the time of prefetching a branch target instruction. | 10-17-2013 |
20130275727 | Processors, Methods, Systems, and Instructions to Generate Sequences of Integers in which Integers in Consecutive Positions Differ by a Constant Integer Stride and Where a Smallest Integer is Offset from Zero by an Integer Offset - A method of an aspect includes receiving an instruction. The instruction indicates an integer stride, indicates an integer offset, and indicates a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four integers in numerical order with a smallest one of the at least four integers differing from zero by the integer offset and with all integers of the sequence in consecutive positions differing by the integer stride. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130275728 | PACKED DATA OPERATION MASK REGISTER ARITHMETIC COMBINATION PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - A method of an aspect includes receiving a packed data operation mask register arithmetic combination instruction. The packed data operation mask register arithmetic combination instruction indicates a first packed data operation mask register, indicates a second packed data operation mask register, and indicates a destination storage location. An arithmetic combination of at least a portion of bits of the first packed data operation mask register and at least a corresponding portion of bits of the second packed data operation mask register is stored in the destination storage location in response to the packed data operation mask register arithmetic combination instruction. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130275729 | Packed Data Rearrangement Control Indexes Precursors Generation Processors, Methods, Systems, and Instructions - A method of an aspect includes receiving an instruction indicating a destination storage location. A result is stored in the destination storage location in response to the instruction. The result includes a sequence of at least four non-negative integers. In an aspect, values of the at least four non-negative integers are not calculated using a result of a preceding instruction. Other methods, apparatus, systems, and instructions are disclosed. | 10-17-2013 |
20130283016 | SIGNAL PROCESSING CIRCUIT - Provided is a signal processing circuit occupying a small circuit area. A common arithmetic operation element is shared between a plurality of arithmetic operation sequence control units. An arbitration circuit selects, when the plurality of arithmetic operation sequence control units simultaneously generate requests for arithmetic operations to use the common arithmetic operation element, the predetermined sequence control unit based on priority information about the plurality of arithmetic operation sequence control units, causes the common arithmetic operation element to execute the arithmetic operation requested from the selected arithmetic operation sequence control unit, and returns the result of the arithmetic operation to the selected arithmetic operation sequence control unit. | 10-24-2013 |
20130290684 | DATA PACKET ARITHMETIC LOGIC DEVICES AND MEHTODS - New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed. | 10-31-2013 |
20130297916 | SEMICONDUCTOR DEVICE - A related art semiconductor device suffers from the problem that its processing capacity is degraded by switching an occupied state for each partition. A semiconductor device according to the present invention includes an execution unit that executes an arithmetic instruction, and a scheduler that includes multiple first setting registers, each defining a correspondence relationship between hardware threads and partitions, and that generates a thread select signal on the basis of a partition schedule and a thread schedule. When the execution unit executes a first occupation start instruction, a first occupation control signal is output; according to that signal, the scheduler outputs a thread select signal designating a specific hardware thread of the partition indicated by the first occupation control signal, without depending on the thread schedule. | 11-07-2013 |
20130318329 | CO-PROCESSOR FOR COMPLEX ARITHMETIC PROCESSING, AND PROCESSOR SYSTEM - In order to enable to quickly and efficiently execute, by one system, various modulation/demodulation/synchronous processes in a plurality of radio communication methods, a co-processor ( | 11-28-2013 |
20130339677 | MULTIPLY-AND-ACCUMULATE OPERATION IN AN IMPLANTABLE MICROCONTROLLER - The invention provides microprocessor extensions for cooperating with a sequential arithmetic-logic unit (ALU) to execute a multiply-and-accumulate operation (MAc). The ALU performs a continuous sequence of accumulation instructions synchronously with a clock signal (CLK1). Buffers (BUF1, BUF2) store input data which are fed to a combinatorial multiplier (MULT) by first buses (L | 12-19-2013 |
20130346730 | ARITHMETIC PROCESSING APPARATUS, AND CACHE MEMORY CONTROL DEVICE AND CACHE MEMORY CONTROL METHOD - An arithmetic processing apparatus includes a plurality of processors, each of the processors having an arithmetic unit and a cache memory. The processor includes an instruction port that holds a plurality of instructions accessing data of the cache memory, a first determination unit that validates a first flag when an invalidation request for data in the cache memory is received and a cache index of a target address and a way ID of the received request match a cache index of a designated address and a way ID of the load instruction, a second determination unit that validates a second flag when target data is transmitted due to a cache miss, and an instruction re-execution determination unit that instructs re-execution of an instruction subsequent to the load instruction when both the first flag and the second flag are validated at the time of completion of an instruction in the instruction port. | 12-26-2013 |
20140006753 | MATRIX MULTIPLY ACCUMULATE INSTRUCTION | 01-02-2014 |
20140006754 | SYSTEM AND METHOD FOR PERFORMING PREDICATED SELECTION OF AN OUTPUT REGISTER | 01-02-2014 |
20140013086 | ADDITION INSTRUCTIONS WITH INDEPENDENT CARRY CHAINS - A number of addition instructions are provided that have no data dependency between each other. A first addition instruction stores its carry output in a first flag of a flags register without modifying a second flag in the flags register. A second addition instruction stores its carry output in the second flag of the flags register without modifying the first flag in the flags register. | 01-09-2014 |
20140019725 | METHOD FOR FAST LARGE-INTEGER ARITHMETIC ON IA PROCESSORS - Methods, systems, and apparatuses are disclosed for implementing fast large-integer arithmetic within an integrated circuit, such as on IA (Intel Architecture) processors, in which such means include receiving a 512-bit value for squaring, the 512-bit value having eight sub-elements each of 64-bits and performing a 512-bit squaring algorithm by: (i) multiplying every one of the eight sub-elements by itself to yield a square of each of the eight sub-elements, the eight squared sub-elements collectively identified as T1, (ii) multiplying every one of the eight sub-elements by the other remaining seven of the eight sub-elements to yield an asymmetric intermediate result having seven diagonals therein, wherein each of the seven diagonals is of a different length, (iii) reorganizing the asymmetric intermediate result having the seven diagonals therein into a symmetric intermediate result having four diagonals each of 7×1 sub-elements of the 64-bits in length arranged across a plurality of columns, (iv) adding all sub-elements within their respective columns, the added sub-elements collectively identified as T2, and (v) yielding a final 512-bit squared result of the 512-bit value by adding the value of T2 twice with the value of T1 once. Other related embodiments are disclosed. | 01-16-2014 |
20140019726 | PARALLEL ARITHMETIC DEVICE, DATA PROCESSING SYSTEM WITH PARALLEL ARITHMETIC DEVICE, AND DATA PROCESSING PROGRAM - A parallel arithmetic device includes a status management section, a plurality of processor elements, and a plurality of switch elements for determining the relation of coupling of each of the processor elements. Each of the processor elements includes an instruction memory for memorizing a plurality of operation instructions corresponding respectively to a plurality of contexts so that an operation instruction corresponding to the context selected by the status management section is read out, and a plurality of arithmetic units for performing arithmetic processes in parallel on a plurality of sets of input data in a manner compliant with the operation instruction read out from the instruction memory. | 01-16-2014 |
20140019727 | MODIFIED BALANCED THROUGHPUT DATA-PATH ARCHITECTURE FOR SPECIAL CORRELATION APPLICATIONS - An apparatus and method for a modified, balanced throughput data-path architecture are given for efficiently implementing the digital signal processing algorithms of filtering, convolution and correlation in computer hardware, in which both data and coefficient buffers can be implemented as sliding windows. This architecture uses a multiplexer and a data path branch from the Address Generator unit to the multiply-accumulate execution unit. By selecting between the data path of Address Generator to execution unit and the data path of register to execution unit, the unbalanced throughput and multiply-accumulate bubble cycles caused by misaligned addressing on coefficients can be overcome. The modified balanced throughput data-path architecture can achieve a high multiply-accumulate operation rate per cycle in implementing digital signal processing algorithms. | 01-16-2014 |
20140025934 | ARITHMETIC PROCESSING APPARATUS AND METHOD FOR HIGH SPEED PROCESSING OF APPLICATION - An arithmetic processing apparatus and method for high speed processing of an application are provided. The arithmetic processing apparatus may include a program control unit to store operation processing information necessary for application operation in a communication channel by executing an application code, and an operation processing unit to process the application operation using the operation processing information stored in the communication channel. | 01-23-2014 |
20140047220 | Residual Addition for Video Software Techniques - According to some embodiments, a technique provides for the execution of an instruction that includes receiving residual data of a first image and decoded pixels of a second image, zero-extending a plurality of unsigned data operands of the decoded pixels producing a plurality of unpacked data operands, adding a plurality of signed data operands of the residual data to the plurality of unpacked data operands producing a plurality of signed results; and saturating the plurality of signed results producing a plurality of unsigned results. | 02-13-2014 |
20140068231 | CENTRAL PROCESSING UNIT AND ARITHMETIC UNIT - There is a need to provide a central processing unit capable of improving the resistance to power analysis attack without changing programs, lowering clock frequencies, and greatly redesigning a central processing unit of the related art. In a central processing unit, an arithmetic unit is capable of performing arithmetic operation using data irrelevant to data stored in a register group. A control unit allows the arithmetic unit to perform arithmetic processing corresponding to an incorporated instruction. At this time, the control unit allows the arithmetic unit to perform arithmetic processing using the irrelevant data during a first one-clock cycle. | 03-06-2014 |
20140075162 | DIGITAL PROCESSOR HAVING INSTRUCTION SET WITH COMPLEX EXPONENTIAL NON-LINEAR FUNCTION - A digital processor is provided having an instruction set with a complex exponential function. The digital processor evaluates a complex exponential function for an input value, x, by obtaining a complex exponential software instruction having the input value, x, as an input; and in response to the complex exponential software instruction: invoking at least one complex exponential functional unit that implements complex exponential software instructions to apply the complex exponential function to the input value, x; and generating an output corresponding to the complex exponential of the input value, x. A complex exponential function for an input value, x, can be evaluated by wrapping the input value to maintain a given range; computing a coarse approximation angle using a look-up table; scaling the coarse approximation angle to obtain an angle from 0 to θ; and computing a fine corrective value using a polynomial approximation. | 03-13-2014 |
20140082332 | SEMICONDUCTOR INTEGRATED CIRCUIT AND COMPILER - A semiconductor integrated circuit includes: a floating point arithmetic unit that includes circuit resources over which power saving control is performed, and executes a floating point arithmetic operation; a power-control instruction control unit that receives a pre-access instruction corresponding to a floating point arithmetic operation instruction, and invalidates stepwise the power saving control over the circuit resources included in the floating point arithmetic unit to operate a part of the circuit resources in the floating point arithmetic unit; and a control unit that causes the floating point arithmetic unit to execute the floating point arithmetic operation, wherein before execution of the floating point arithmetic operation in the floating point arithmetic unit, power consumption is previously increased by the pre-access instruction. | 03-20-2014 |
20140089643 | INFORMATION PROCESSING APPARATUS AND INSTRUCTION OFFLOADING METHOD - In general, according to one embodiment, an information processing apparatus includes an issuer and a communicator. The issuer issues an offload instruction corresponding to a first process executed together with a first identifier capable of uniquely specifying a resource of a first arithmetic operation device. The communicator transmits the offload instruction to a second arithmetic operation device and receives a result of execution of the offload instruction from the second arithmetic operation device. In the second arithmetic operation device, the first identifier contained in the offload instruction is converted into a second identifier capable of uniquely specifying a resource of the second arithmetic operation device, and processing specified by the offload instruction is executed. | 03-27-2014 |
20140115303 | DATA PROCESSING DEVICE - A data processing device has an instruction decoder, a control logic unit, and ALU. The instruction decoder decodes instruction codes of an arithmetic instruction. The control logic unit detects the effective data width of operation data to be processed according to the decode result from the instruction decoder and determines the number of cycles for the instruction execution corresponding to the effective data width. The ALU executes the instruction with the number of cycles of the instruction execution determined by the control logic unit. | 04-24-2014 |
20140129807 | APPROACH FOR EFFICIENT ARITHMETIC OPERATIONS - A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations. | 05-08-2014 |
20140143524 | INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING APPARATUS CONTROL METHOD, AND A COMPUTER-READABLE STORAGE MEDIUM STORING A CONTROL PROGRAM FOR CONTROLLING AN INFORMATION PROCESSING APPARATUS - An information processing apparatus includes a first arithmetic processing apparatus, a second arithmetic processing apparatus, and a control unit that controls the first arithmetic apparatus and the second arithmetic apparatus, wherein the control unit causes each of the first arithmetic processing apparatus and the second arithmetic processing apparatus to execute a first data processing common to the first and the second arithmetic processing apparatuses, and the control unit causes the second arithmetic processing apparatus to stop the first data processing when the first data processing executed by the first arithmetic processing apparatus is completed earlier than the first data processing executed by the second arithmetic processing apparatus. | 05-22-2014 |
20140149719 | ARITHMETIC PROCESSING APPARATUS, CONTROL METHOD OF ARITHMETIC PROCESSING APPARATUS, AND A COMPUTER-READABLE STORAGE MEDIUM STORING A CONTROL PROGRAM FOR CONTROLLING AN ARITHMETIC PROCESSING APPARATUS - An arithmetic processing apparatus includes a plurality of arithmetic cores configured to execute threads in parallel, and a control unit configured to cause the arithmetic core to execute a reduction operation for data of the threads having the same storage area to which data is written per a predetermined number of threads in order to add data obtained by the reduction operation to data within a corresponding storage area by an atomic process. | 05-29-2014 |
20140189319 | Opportunistic Utilization of Redundant ALU - A processor includes at least one processing core that includes an operation dispatch for dispatching operations from an instruction pipeline, a plurality of arithmetic logic units for executing the operations, a plurality of multiplexers, each of which connects the operation dispatch to a respective arithmetic logic unit, and a controller configured to selectively enable at least one multiplexer to connect the operation dispatch to at least one arithmetic logic unit based on a reliability mode associated with the operation. | 07-03-2014 |
20140201502 | SYSTEMS, APPARATUSES, AND METHODS FOR PERFORMING A BUTTERFLY HORIZONTAL AND CROSS ADD OR SUBSTRACT IN RESPONSE TO A SINGLE INSTRUCTION - Embodiments of systems, apparatuses, and methods for performing in a computer processor vector packed butterfly horizontal cross add or subtract of packed data elements in response to a single vector packed butterfly horizontal cross add or subtract instruction that includes a destination vector register operand, a source vector register operand, an immediate, and an opcode are described. | 07-17-2014 |
20140201503 | PROCESSOR MICRO-ARCHITECTURE FOR COMPUTE, SAVE OR RESTORE MULTIPLE REGISTERS, DEVICES, SYSTEMS, METHODS AND PROCESSES OF MANUFACTURE - An electronic circuit ( | 07-17-2014 |
20140201504 | FUNCTIONAL UNIT CAPABLE OF EXECUTING APPROXIMATIONS OF FUNCTIONS - A semiconductor chip is described having a functional unit that can execute a first instruction and execute a second instruction. The first instruction is an instruction that multiplies two operands. The second instruction is an instruction that approximates a function according to C | 07-17-2014 |
20140223146 | Blank Bit and Processor Instructions Employing the Blank Bit - Reading a value into a register, checking to see if the value is a NULL, and then jumping out of a loop if the value is a NULL is a common task that processors perform. To speed performance of such a task, a novel “blank bit” is added to the flag register of a processor. When a first instruction (arithmetic, logic or load) is executed, the instruction operands are checked to see if any is a NULL character value. Information on the result of the check is stored in the blank bit. Execution of a second instruction uses the information stored in the blank bit to determine whether or not a second operation (for example, a jump) will be performed. By using the first and second instructions in a loop, the number of instructions executed to check for NULLs at the end of strings and arrays is reduced. | 08-07-2014 |
20140237216 | MICROPROCESSOR - A microprocessor according to an aspect of the present invention includes an arithmetic operation unit. The arithmetic operation unit includes: a plurality of arithmetic operation devices arranged in a multi-stage arrangement; a delay device provided to each stage of the arithmetic operation devices excluding a final stage, and configured to delay an arithmetic operation result of the arithmetic operation devices for one cycle; and a selector provided to each stage of the arithmetic operation devices excluding the final stage, and configured to select either the arithmetic operation result of the arithmetic operation devices or the arithmetic operation result delayed for one cycle in the delay device and output the selected result to the arithmetic operation device in a next stage. The microprocessor is configured to collectively process a plurality of arithmetic operations from the arithmetic operation unit by controlling a selecting condition in the selector. | 08-21-2014 |
20140244982 | PERFORMING STENCIL COMPUTATIONS - A method and apparatus for performing stencil computations efficiently are disclosed. In one embodiment, a processor receives an offset, and in response, retrieves a value from a memory via a single instruction, where the retrieving comprises: identifying, based on the offset, one of a plurality of registers of the processor; loading an address stored in the identified register; and retrieving from the memory the value at the address. | 08-28-2014 |
20140258689 | PROCESSOR FOR LARGE GRAPH ALGORITHM COMPUTATIONS AND MATRIX OPERATIONS - A node processor and method for performing matrix operations includes storing, in memory, non-zero matrix elements of a first sparse matrix, non-zero matrix elements of a second sparse matrix, and matrix elements of a sparse results matrix mapped to the node processor. A matrix communications module exchanges non-zero matrix elements of one or more of the first sparse matrix, second sparse matrix, and sparse results matrix with other node processors. An arithmetic logic unit generates partial results based on the non-zero matrix elements of the first sparse matrix and on the non-zero matrix elements of the second sparse matrix stored in memory. The arithmetic logic unit further generates a final value for each matrix element of the sparse results matrix mapped to the node processor based on the partial results generated by the arithmetic logic unit and on partial results received from the other node processors. | 09-11-2014 |
20140297995 | FAULT-TOLERANT SYSTEM AND FAULT-TOLERANT OPERATING METHOD - A fault-tolerant system including a calculation unit and an output synthesizer is provided. The calculation unit receives a first environmental parameter and input data, and includes first and second calculation circuits. The first calculation circuit is arranged to perform a calculation on the input data in response to the first environmental parameter to generate a first calculation result. The second calculation circuit, which differs from the first, is arranged to perform the same calculation on the input data in response to the first environmental parameter to generate a second calculation result. The output synthesizer selects a first set of bits from the first calculation result and a second set of bits from the second calculation result according to a control signal, and synthesizes the first and second sets of bits in sequence to generate an adjusted calculation result. | 10-02-2014 |
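The output synthesizer in 20140297995 can be read as a bit-field merge. This sketch assumes the control signal is a split position. Bits above the split come from the first result and bits below it come from the second. The abstract does not say how the control signal encodes the selection, so `split` and `width` are hypothetical parameters.

```python
def synthesize(first_result: int, second_result: int, split: int, width: int = 16) -> int:
    """Upper (width - split) bits from first_result, lower split bits from second_result."""
    if not 0 <= split <= width:
        raise ValueError("split must lie within the word width")
    low_mask = (1 << split) - 1
    high_mask = ((1 << width) - 1) & ~low_mask
    # Concatenate the selected bit sets in sequence: high field, then low field.
    return (first_result & high_mask) | (second_result & low_mask)
```

For example, `synthesize(0xABCD, 0x1234, 8)` keeps `0xAB` from the first circuit and `0x34` from the second, giving `0xAB34`.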
20140325190 | METHOD FOR IMPROVING EXECUTION PERFORMANCE OF MULTIPLY-ADD INSTRUCTION DURING COMPILING - The present invention relates to a method for improving the execution performance of multiply-add instructions during compiling, comprising the following steps: compiling source code with a compiler to obtain an internal representation; optimizing it; generating machine code for a target processor and allocating physical registers to the pseudo-registers in the machine code; and improving the register allocation results for multiply-accumulate instructions. The method has the following advantage: it allows the compiler to optimize procedures by obtaining the optimal gain from the use of MAC (multiply-accumulate) instructions. | 10-30-2014 |
20140351563 | ADVANCED PROCESSOR ARCHITECTURE - The present invention relates to a processor core having an execution unit comprising an arrangement of Arithmetic-Logic-Units. The operation mode of the execution unit is switchable between (a) an asynchronous operation of the Arithmetic-Logic-Units and of the interconnection between them, such that a signal from the register file crosses the execution unit and is received by the register file within one clock cycle, and (b) a pipelined operation of at least one of the Arithmetic-Logic-Units and of the interconnection between them, such that a signal requires more than one clock cycle to travel from the register file through the execution unit and back to the register file. | 11-27-2014 |
20150033000 | Parallel Processing Array of Arithmetic Unit having a Barrier Instruction - A parallel processing array processor has a plurality of arithmetic units and a unit that manages barrier instructions whereby processing of program sequences may be coordinated. The array processor further comprises a hierarchy of assigned units whereby multiple program sequences may be processed in parallel. | 01-29-2015 |
20150039865 | Control Device for Vehicle - If exclusive control is used when carrying out update processing or reference processing on a data buffer in a memory shared among plural arithmetic units, waiting time increases and it is difficult to guarantee real-time behavior. | 02-05-2015 |
20150052334 | ARITHMETIC PROCESSING DEVICE AND CONTROL METHOD OF ARITHMETIC PROCESSING DEVICE - An arithmetic processing device includes: a first instruction execution unit that includes plural staging latches and executes a first instruction using both a pipeline operation, in which data moves in a single clock between a first group of staging latches that includes the final-stage latch, and a multi-cycle operation, in which data takes plural clocks to move between a second group of staging latches positioned upstream of the first group; a second instruction execution unit configured to execute a second instruction; and an instruction control unit configured to receive the first and second instructions and issue them to the first and second instruction execution units, respectively, such that execution of the first and second instructions partly overlaps. | 02-19-2015 |
20150067304 | INFORMATION PROCESSING APPARATUS AND METHOD OF CONTROLLING INFORMATION PROCESSING APPARATUS - An information processing apparatus includes a plurality of arithmetic processing devices; a common timer unit configured to measure time in common among the arithmetic processing devices; a plurality of individual timer units configured to measure the program execution time of each arithmetic processing device; a comparing unit configured to compare each device's program execution time, as measured by the individual timer units, with the time measured by the common timer unit; and a control unit configured to control the processing of the arithmetic processing devices based on the result of that comparison. | 03-05-2015 |
20150089204 | DYNAMICALLY RECONFIGURABLE MICROPROCESSOR - A microprocessor includes a plurality of dynamically reconfigurable functional units, a fingerprint, and a fingerprint unit. As the plurality of dynamically reconfigurable functional units execute instructions according to a first configuration setting, the fingerprint unit accumulates information about the instructions according to a mathematical operation to generate a result. The microprocessor also includes a reconfiguration unit that reconfigures the plurality of dynamically reconfigurable functional units to execute instructions according to a second configuration setting in response to an indication that the result matches the fingerprint. | 03-26-2015 |
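The fingerprint unit in 20150089204 folds executed instructions into a running value and compares it with a stored fingerprint. This sketch uses CRC-32 (`zlib.crc32` with its running-value argument) as the "mathematical operation", which is an assumption. Opcodes are modelled as byte strings, and the configuration names `"first"` and `"second"` are placeholders.

```python
import zlib
from typing import Iterable


class FingerprintUnit:
    """Accumulates a running CRC-32 over executed opcodes (CRC-32 is an assumed choice)."""

    def __init__(self, fingerprint: int) -> None:
        self.fingerprint = fingerprint
        self.result = 0

    def accumulate(self, opcode: bytes) -> bool:
        # zlib.crc32(data, value) continues a CRC from a previous value.
        self.result = zlib.crc32(opcode, self.result)
        return self.result == self.fingerprint


def run(opcodes: Iterable[bytes], unit: FingerprintUnit) -> str:
    config = "first"
    for op in opcodes:
        # (Execution of op under the current configuration is not modelled.)
        if config == "first" and unit.accumulate(op):
            config = "second"  # reconfiguration unit switches the functional units
    return config
```

Because the CRC continues across calls, the fingerprint for the sequence `ld, add, st` equals `zlib.crc32(b"ldaddst")`. The switch to the second configuration happens as soon as that prefix has executed.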
20150095621 | ARITHMETIC PROCESSING UNIT, AND METHOD OF CONTROLLING ARITHMETIC PROCESSING UNIT - An arithmetic processing unit including a memory controller configured to issue variable-length access requests, each having one of a plurality of lengths, to a memory, the memory controller comprising: a plurality of buffers configured to hold the access requests separately for each request length; and an arbitrator configured to select one of the access requests stored in the plurality of buffers in accordance with the number of remaining resources of the memory. | 04-02-2015 |
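One way to picture the arbitrator in 20150095621 is as keeping one queue per request length and choosing among them by how many memory resources remain. The abstract does not state the selection policy. This sketch assumes "serve the longest request that still fits", and `arbitrate` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, Optional


def arbitrate(buffers: Dict[int, Deque[str]], remaining: int) -> Optional[str]:
    """Pop one request from the longest-length buffer whose length fits the remaining resources.

    Assumed policy; returns None when no buffered request fits.
    """
    for length in sorted(buffers, reverse=True):
        if length <= remaining and buffers[length]:
            return buffers[length].popleft()
    return None
```

Given buffers `{32: ["C"], 64: ["A"], 128: ["B"]}` and 100 free resource units, the 128-byte request must wait. The 64-byte request is served first, then the 32-byte one.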
20150095622 | APPARATUS AND METHOD FOR CONTROLLING EXECUTION OF PROCESSES IN A PARALLEL COMPUTING SYSTEM - An apparatus includes an arbiter and a plurality of arithmetic processors, each including an arithmetic circuit and a measuring circuit. The arithmetic circuit executes an arithmetic process, and the measuring circuit measures a progress level indicating a progress of the arithmetic process executed by the arithmetic circuit. Upon receiving access requests to an external device from first arithmetic processors included in the plurality of arithmetic processors, the arbiter arbitrates the access requests, based on a result of comparing the progress levels measured by the measuring circuits of the first arithmetic processors. | 04-02-2015 |
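The arbiter in 20150095622 orders competing external-access requests by comparing the measured progress levels. The abstract does not say which direction wins. This sketch assumes the processor that is furthest behind is served first, so that parallel processes tend to finish together. The progress values are illustrative.

```python
from typing import Dict, List


def arbitration_order(progress: Dict[str, float]) -> List[str]:
    """Order requesting processors by measured progress level, lowest first (assumed policy).

    sorted() is stable, so processors with equal progress keep their request order.
    """
    return sorted(progress, key=progress.__getitem__)
```

For progress levels `{"p0": 0.8, "p1": 0.3, "p2": 0.5}`, the grant order is `p1`, `p2`, `p0`.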
20150106596 | Data Processing System Having Integrated Pipelined Array Data Processor - A data processing system having a data processing core, an integrated pipelined array data processor, and a buffer for storing a list of algorithms to be processed by the pipelined array data processor. | 04-16-2015 |
20150121042 | ARITHMETIC DEVICE - According to an embodiment, an arithmetic device includes an arithmetic processing unit, an address generating unit, and a control unit. The arithmetic processing unit performs a plurality of arithmetic processes used in an encryption method. The address generating unit generates addresses of a memory device based on the upper bits of the address of a first piece of data and on an offset, which is a value that corresponds to a counter value and is based on the address of the first piece of data. The control unit controls the arithmetic processing unit so that the arithmetic processing is performed in the sequence determined by the encryption method, and specifies an update of the counter value at the timing of modifying the type of data and at the timing of modifying data. | 04-30-2015 |
20150134936 | SINGLE INSTRUCTION MULTIPLE DATA ADD PROCESSORS, METHODS, SYSTEMS, AND INSTRUCTIONS - New instruction definitions for a packet add (PADD) operation and for a single instruction multiple add (SMAD) operation are disclosed. In addition, a new dedicated PADD logic device that performs the PADD operation in about one to two processor clock cycles is disclosed. Also, a new dedicated SMAD logic device that performs a single instruction multiple data add (SMAD) operation in about one to two clock cycles is disclosed. | 05-14-2015 |
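The abstract of 20150134936 does not define the PADD semantics. This sketch assumes PADD adds corresponding 8-bit lanes packed in a 32-bit word, each lane wrapping modulo 256. It shows the standard "SIMD within a register" trick that keeps carries from crossing lane boundaries, which is one reason such an add can be done in a single pass. It is not presented as the patented logic.

```python
LANE_LOW7 = 0x7F7F7F7F  # low 7 bits of each 8-bit lane
LANE_HIGH = 0x80808080  # top bit of each 8-bit lane


def padd8(a: int, b: int) -> int:
    """Lane-wise (mod 256) add of four 8-bit lanes packed into 32-bit words."""
    # Adding only the low 7 bits keeps every lane sum <= 0xFE, so no carry
    # spills into the neighbouring lane.
    partial = (a & LANE_LOW7) + (b & LANE_LOW7)
    # Each lane's top bit is the XOR of both top bits and the carry into bit 7.
    return (partial ^ ((a ^ b) & LANE_HIGH)) & 0xFFFFFFFF


def padd8_reference(a: int, b: int) -> int:
    """Straightforward per-lane loop, for comparison."""
    return sum(((((a >> s) & 0xFF) + ((b >> s) & 0xFF)) % 256) << s for s in (0, 8, 16, 24))
```

For example, adding `0x01FF7F10` and `0x01010102` lane by lane gives `0x02008012`. The `0xFF + 0x01` lane wraps to `0x00` without disturbing its neighbours.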
20150149746 | ARITHMETIC PROCESSING DEVICE, INFORMATION PROCESSING DEVICE, AND A METHOD OF CONTROLLING THE INFORMATION PROCESSING DEVICE - An arithmetic processing device improves transmission efficiency between a processor and a memory. The arithmetic processing device has: an arithmetic processing unit which issues an instruction accompanied by data to be sent to the memory; a judgment unit which judges whether the redundancy degree of the data accompanying the instruction exceeds a predetermined value; a compression unit which, when the redundancy degree exceeds the predetermined value, decides whether to compress the data based on a waiting time and a compression time, and compresses the data when it decides to do so; and an instruction arbitration unit which transfers the instruction to the memory together with the compressed data when the compression unit performs compression, and together with the uncompressed data otherwise. | 05-28-2015 |
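The decision chain in 20150149746 has two checks. The first asks whether the data is redundant enough. The second asks whether compression finishes within the time the request would wait anyway. This sketch uses `zlib` as a stand-in compressor and a made-up redundancy metric (one minus the ratio of distinct byte values to length). Both, and the default threshold of 0.5, are assumptions.

```python
import zlib
from typing import Tuple


def redundancy_degree(data: bytes) -> float:
    """Hypothetical metric: 1 - (distinct byte values / length)."""
    if not data:
        return 0.0
    return 1.0 - len(set(data)) / len(data)


def prepare_payload(data: bytes, waiting_time: float, compression_time: float,
                    threshold: float = 0.5) -> Tuple[bytes, bool]:
    """Return (payload, compressed_flag) for the data accompanying a store instruction."""
    if redundancy_degree(data) > threshold and compression_time < waiting_time:
        # Compression fits inside the time the request would spend waiting.
        return zlib.compress(data), True
    return data, False
```

A buffer of 256 zero bytes is compressed when the expected wait exceeds the compression time. A buffer holding every byte value once is sent uncompressed.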
20150339220 | METHODS AND APPARATUS TO USE AN ACCESS TRIGGERED COMPUTER ARCHITECTURE - A method for using an access triggered architecture for a computer implemented application is provided. The method receives a set of data at a designated functional block associated with a system memory location; performs an operation at the designated functional block, using the set of data, to generate a result, wherein the operation is performed each time information is received at the designated functional block; and returns the generated result to the system memory location. | 11-26-2015 |
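The access-triggered flow in 20150339220 works like this: writing data to a memory location associated with a functional block runs that block's operation, and the result is returned to the same location. The sketch models this with a dictionary of callables keyed by address. The class and method names are hypothetical.

```python
from typing import Callable, Dict, Sequence

Block = Callable[[Sequence[int]], int]


class AccessTriggeredMemory:
    """Memory in which writing to a designated address triggers that address's functional block."""

    def __init__(self) -> None:
        self.cells: Dict[int, int] = {}
        self.blocks: Dict[int, Block] = {}

    def attach(self, address: int, block: Block) -> None:
        self.blocks[address] = block

    def write(self, address: int, data: Sequence[int]) -> None:
        block = self.blocks.get(address)
        if block is None:
            raise KeyError(f"no functional block at {address:#x}")
        # The operation runs on every write; its result goes back to the same location.
        self.cells[address] = block(data)

    def read(self, address: int) -> int:
        return self.cells[address]
```

Attaching Python's built-in `sum` at address `0x10` and writing `[1, 2, 3]` there leaves `6` at that location. The next write re-runs the block on the new data.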
20160077833 | AUTOMATED DECOMPOSITION FOR MIXED INTEGER LINEAR PROGRAMS WITH EMBEDDED NETWORKS REQUIRING MINIMAL SYNTAX - Embodiments include techniques to receive computer-executable query instructions to solve a MILP problem, the query instructions including a first expression conveying an objective function and side constraint that define a master problem of the MILP problem, a second expression conveying a mapping of graph data to a graph, and a third expression conveying a selection of a graph-based algorithm to solve a subproblem of the MILP problem; a subproblem component to replace the third expression with a fourth expression during decomposition of the MILP problem, the fourth expression including instructions to implement the graph-based algorithm to solve the subproblem; and an execution control component to perform iterations of solving the MILP problem that include executing the first expression to derive a solution to the master problem; and executing the fourth expression to derive a solution to the subproblem based on the mapping and the master problem solution. | 03-17-2016 |
20160077837 | SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR IMPLEMENTING LARGE INTEGER OPERATIONS ON A GRAPHICS PROCESSING UNIT - A system, method, and computer program product for generating executable code for performing large integer operations on a parallel processing unit is disclosed. The method includes the steps of compiling a source code linked to a large integer library to generate an executable file and executing the executable file to perform a large integer operation using a parallel processing unit. The large integer library includes functions for processing large integers that are optimized for the parallel processing unit. | 03-17-2016 |
20160092213 | COMPUTER SYSTEM INCLUDING RECONFIGURABLE ARITHMETIC DEVICE WITH NETWORK OF PROCESSOR ELEMENTS - A reconfigurable arithmetic device includes a plurality of processor elements configured to perform first arithmetic processes corresponding to a first type of instruction and second arithmetic processes corresponding to a second type of instruction, a random-access memory (RAM), and a control unit. The first type of instruction is written into the RAM at a first address, data for the first type of instruction is written into the RAM at a second address, and data for the second type of instruction is written into the RAM at a third address. When the first type of instruction is written at the first address, the control unit decodes the first type of instruction and configures the processor elements to perform the first arithmetic processes. When data for the second type of instruction is written at the third address, the control unit configures the processor elements to perform the second arithmetic processes. | 03-31-2016 |
20160092219 | ACCELERATING CONSTANT VALUE GENERATION USING A COMPUTED CONSTANTS TABLE, AND RELATED CIRCUITS, METHODS, AND COMPUTER-READABLE MEDIA - Accelerating constant value generation using a computed constants table, and related circuits, methods, and computer-readable media are disclosed. In one aspect, an instruction processing circuit provides a computed constants table containing one or more entries, each comprising an address and a constant value. The instruction processing circuit is configured to detect a constant-generating instruction sequence in an instruction stream, and to determine whether the address of that sequence is present in an entry of the computed constants table. If it is, the instruction processing circuit provides the constant value stored in the entry for the execution of at least one instruction that depends on the constant-generating instruction sequence. In this manner, the generation of constant values by a constant-generating instruction sequence may be accelerated, allowing dependent instructions to use the constant values with zero-cycle latency. | 03-31-2016 |
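The lookup in 20160092219 keys each entry by the address of the constant-generating sequence. This sketch uses an ARM-style `movw`/`movt` pair as the example sequence. `movw` writes the low 16 bits and clears the rest, and `movt` writes the high 16 bits. The instruction tuple format and class name are assumptions.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, int]  # hypothetical (mnemonic, 16-bit immediate)


def evaluate_sequence(seq: List[Instr]) -> int:
    """Compute the 32-bit constant built by a movw/movt-style pair."""
    value = 0
    for mnemonic, imm in seq:
        if mnemonic == "movw":
            value = imm & 0xFFFF  # low half written, high half cleared
        elif mnemonic == "movt":
            value = (value & 0xFFFF) | ((imm & 0xFFFF) << 16)  # high half written
        else:
            raise ValueError(f"not a constant-generating instruction: {mnemonic}")
    return value


class ComputedConstantsTable:
    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}  # sequence address -> constant value

    def constant_for(self, address: int, seq: List[Instr]) -> Tuple[int, bool]:
        """Return (constant, hit). On a hit, dependents can use the value without waiting on seq."""
        if address in self.entries:
            return self.entries[address], True
        value = evaluate_sequence(seq)
        self.entries[address] = value
        return value, False
```

The first time the pair at address `0x400` is seen, it is evaluated to `0x12345678` and recorded. Later occurrences hit in the table.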
20160103680 | ARITHMETIC CIRCUIT AND CONTROL METHOD FOR ARITHMETIC CIRCUIT - An arithmetic circuit comprises first to N-th element circuits, N being an integer equal to or larger than three, each including input circuits which input first operand data and second operand data, and an element data selector which selects operand data of any of the element circuits on the basis of a request element signal; and a data bus which supplies the operand data from the input circuits to the element data selectors. When a control signal is in a first state, the element data selectors select, on the basis of the request element signal included in the second operand data, the first operand data of any of the element circuits and output that first operand data. | 04-14-2016 |
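When the control signal is in the first state, the element selection in 20160103680 behaves like a permute: element circuit *i* outputs the first operand of the element named by its request element signal. The sketch models that, assuming out-of-range requests are errors. The abstract does not describe the behavior in other control states, so the pass-through in that case is also an assumption.

```python
from typing import List, Sequence


def element_select(first_operands: Sequence[int], request_elements: Sequence[int],
                   first_state: bool = True) -> List[int]:
    """Each element circuit i outputs first_operands[request_elements[i]]."""
    n = len(first_operands)
    if n < 3:
        raise ValueError("the abstract describes N >= 3 element circuits")
    if len(request_elements) != n:
        raise ValueError("one request element signal per element circuit")
    if not first_state:
        return list(first_operands)  # assumed pass-through outside the first state
    out = []
    for idx in request_elements:
        if not 0 <= idx < n:
            raise IndexError(f"request element {idx} out of range")
        out.append(first_operands[idx])  # operand fetched over the shared data bus
    return out
```

With first operands `[10, 20, 30, 40]` and request signals `[3, 0, 0, 2]`, the element circuits output `[40, 10, 10, 30]`.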
20160110193 | INFORMATION PROCESSING APPARATUS AND CONTROL METHOD OF INFORMATION PROCESSING APPARATUS - The information processing apparatus includes an arithmetic processing device configured to output an access request; a storage device configured to store data; a storage control device configured to accept access requests for the storage device from the arithmetic processing device, transfer each accepted request to the storage device, and acquire the storage device's response; and a diagnosis control device configured to send its own access request for the storage device to the storage control device, in place of an access request from the arithmetic processing device, and acquire the response from the storage device via the storage control device. | 04-21-2016 |
20160124713 | FAST, ENERGY-EFFICIENT EXPONENTIAL COMPUTATIONS IN SIMD ARCHITECTURES - In one embodiment, a computer-implemented method includes receiving as input a value of a variable x and receiving as input a degree n of a polynomial function being used to evaluate an exponential function e | 05-05-2016 |
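The abstract of 20160124713 is truncated, but it describes evaluating an exponential with a polynomial of input degree n. The sketch shows a common way to do this: reduce x to 2^k · e^r with |r| ≤ ln(2)/2, evaluate the degree-n Taylor polynomial for e^r by Horner's rule, and rescale with `math.ldexp`. This is a generic technique, not necessarily the patented method. The branch-free inner loop is what makes such schemes map well onto SIMD lanes.

```python
import math
from typing import List


def exp_poly(x: float, n: int) -> float:
    """Approximate e**x with range reduction and a degree-n Taylor polynomial."""
    ln2 = math.log(2.0)
    k = round(x / ln2)          # integer power of two
    r = x - k * ln2             # reduced argument, |r| <= ln2 / 2
    # Horner evaluation of sum_{i=0..n} r**i / i!
    p = 1.0 / math.factorial(n)
    for i in range(n - 1, -1, -1):
        p = p * r + 1.0 / math.factorial(i)
    return math.ldexp(p, k)     # p * 2**k


def exp_lanes(xs: List[float], n: int) -> List[float]:
    # Every lane runs the same instruction sequence, as in a SIMD register.
    return [exp_poly(x, n) for x in xs]
```

With n = 12, the truncation error on the reduced interval is far below double-precision rounding. Smaller n trades accuracy for fewer multiply-adds per lane.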
20160147530 | STRUCTURE FOR MICROPROCESSOR ARITHMETIC LOGIC UNITS - Examples of techniques for designing processors are described herein. In one example, a design structure can be tangibly embodied in a machine-readable medium for designing, manufacturing, or testing an integrated circuit. The design structure can include logic to determine whether a received instruction is an updating fixed point instruction or a non-updating fixed point instruction. The design structure can include a first arithmetic logic unit (ALU) to execute the received instruction if it is determined to be an updating fixed point instruction and to store an update value in a general register. The design structure can include a second ALU to execute the received instruction if it is determined to be a non-updating fixed point instruction. | 05-26-2016 |
20160147531 | DESIGN STRUCTURE FOR MICROPROCESSOR ARITHMETIC LOGIC UNITS - A method in a computer-aided design system for generating a functional design model of a processor is described herein. The method comprises generating a functional representation of logic to determine whether an instruction is an updating instruction or a non-updating instruction. The method further comprises generating a functional representation of a first arithmetic logic unit (ALU) coupled to a general register in the processor, the first ALU to execute the instruction if the instruction is an updating instruction and store an update value in the general register, and generating a functional representation of a second ALU in the processor to execute the instruction if the instruction is a non-updating instruction. | 05-26-2016 |
20160154646 | DATA PROCESSING DEVICE | 06-02-2016 |
20160162291 | PARALLEL ARITHMETIC DEVICE, DATA PROCESSING SYSTEM WITH PARALLEL ARITHMETIC DEVICE, AND DATA PROCESSING PROGRAM - A parallel arithmetic device includes a plurality of data wirings disposed in a first direction and a second direction; a plurality of flag wirings corresponding to the data wirings; a plurality of wiring coupling switches, each disposed at a respective intersection of the data wirings; and a plurality of processor elements surrounded by the data wirings. A processor element from among the plurality of processor elements is configured to: perform an arithmetic process on data supplied from a first processor element based on a first flag supplied from the first processor element, the data being supplied on a data wiring and the first flag on a flag wiring; output a computation result to a second processor element on a data wiring; and output a second flag, based on the computation result, to the second processor element on a flag wiring. | 06-09-2016 |
20160170765 | Computer Processor Providing Exception Handling with Reduced State Storage | 06-16-2016 |
20160173126 | TECHNIQUES TO ACCELERATE LOSSLESS COMPRESSION | 06-16-2016 |
20160179514 | INSTRUCTION AND LOGIC FOR SHIFT-SUM MULTIPLIER | 06-23-2016 |
20160179515 | APPARATUS AND METHOD FOR PERFORMING A CHECK TO OPTIMIZE INSTRUCTION FLOW | 06-23-2016 |
20160179516 | INFORMATION PROCESSING DEVICE AND CONTROL METHOD | 06-23-2016 |
20160179523 | APPARATUS AND METHOD FOR VECTOR BROADCAST AND XORAND LOGICAL INSTRUCTION | 06-23-2016 |
20160188327 | APPARATUS AND METHOD FOR FUSED MULTIPLY-MULTIPLY INSTRUCTIONS - In one embodiment of the invention, a processor device includes a storage location configured to store a set of source packed-data operands, each operand having a plurality of packed-data elements that are positive or negative according to an immediate bit value within one of the operands. The processor also includes a decoder to decode an instruction requiring a plurality of source operands as input, and an execution unit to receive the decoded instruction and generate a result that is the product of the source operands. In one embodiment, the result is stored back into one of the source operands or into an operand that is independent of the source operands. | 06-30-2016 |
20190146788 | MEMORY DEVICE PERFORMING PARALLEL ARITHMETIC PROCESSING AND MEMORY MODULE INCLUDING THE SAME | 05-16-2019 |
20220137962 | LOGARITHMIC NUMBER SYSTEM - A processor comprising a register file comprising a bias register for holding a bias and a plurality of operand registers, each for holding a respective number which, together with the bias, represents a respective value in a logarithmic number system; and an execution unit configured, in response to receiving a logarithmic addition opcode, to: retrieve first and second numbers from first and second sources respectively; subtract the first number from the second number to determine a difference; if the difference is less than or equal to a predetermined number, retrieve from a look-up table a third number mapped to the difference and add the third number to the first number to determine a result; if the difference is greater than the predetermined number, determine the result to be the greater of the first and second numbers; and store the result. | 05-05-2022 |
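The logarithmic addition in 20220137962 follows the identity log2(2^a + 2^b) = a + log2(1 + 2^(b - a)). The look-up table therefore maps a difference d to log2(1 + 2^d). The sketch makes several assumptions that are not in the abstract: a fixed-point format with 4 fractional bits, a threshold of 8.0, and a table clamped to 0 for very negative d. Because both operands share the bias, it cancels in the addition and only matters for encoding and decoding.

```python
import math
from typing import Dict

FRAC_BITS = 4                 # assumed fixed-point format: log2 value * 16
SCALE = 1 << FRAC_BITS
THRESHOLD = 8 * SCALE         # hypothetical "predetermined number"
LOW = -8 * SCALE              # below this, log2(1 + 2**d) rounds to 0

# Look-up table: difference d -> round(SCALE * log2(1 + 2**(d / SCALE)))
LUT: Dict[int, int] = {
    d: round(SCALE * math.log2(1 + 2 ** (d / SCALE))) for d in range(LOW, THRESHOLD + 1)
}


def encode(x: float, bias: int = 0) -> int:
    return round(SCALE * math.log2(x)) - bias


def decode(n: int, bias: int = 0) -> float:
    return 2 ** ((n + bias) / SCALE)


def lns_add(first: int, second: int) -> int:
    diff = second - first
    if diff <= THRESHOLD:
        third = LUT.get(diff, 0)  # differences below LOW contribute ~0
        return first + third
    return max(first, second)     # second operand dominates
```

For example, `encode(2)` is 16 and `encode(6)` is 41. The difference is 25, and the table entry for 25 is 32, so the result 48 decodes to exactly 8.0.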