Entries |
Document | Title | Date |
20080307194 | Parallel, Low-Latency Method for High-Performance Deterministic Element Extraction From Distributed Arrays - The present invention provides a system and method for extracting elements from distributed arrays on a parallel processing system. The system includes a module that populates a local array with elements from input, a module that submits a largest element value in the local array and a processor ID for a local processor, and a module that determines a globally largest element value from the largest element values submitted by each one of the plurality of processors. The system further includes a module that broadcasts a winning globally largest element value and winning processor ID to the plurality of processors, and a module that increments an element pointer to the next value in the local array if the winning processor ID equals the processor ID for the local processor. | 12-11-2008 |
20090063811 | System for Data Processing Using a Multi-Tiered Full-Graph Interconnect Architecture - A system is provided for implementing a multi-tiered full-graph interconnect architecture. In order to implement a multi-tiered full-graph interconnect architecture, a plurality of processors are coupled to one another to create a plurality of processor books. The plurality of processor books are coupled together to create a plurality of supernodes. Then, the plurality of supernodes are coupled together to create the multi-tiered full-graph interconnect architecture. Data is then transmitted from one processor to another within the multi-tiered full-graph interconnect architecture based on an addressing scheme that specifies at least a supernode and a processor book associated with a target processor to which the data is to be transmitted. | 03-05-2009 |
20090070549 | Interconnect architecture in three dimensional network on a chip - The connection architecture of a network on a chip (NoC) is described in which (a) nodes in octahedron sections are connected in an arc Benes network, (b) a hierarchy of node clusters are connected using a globally asynchronous locally asynchronous (GALA) configuration, (c) a double wishbone 2D torus ring is applied to connection between network layers and (d) data is routed using buffer modulation. | 03-12-2009 |
20090119479 | INTEGRATED CIRCUIT ARRANGEMENT FOR CARRYING OUT BLOCK AND LINE BASED PROCESSING OF IMAGE DATA - An integrated circuit arrangement has a processor array ( | 05-07-2009 |
20090259824 | RECONFIGURABLE INTEGRATED CIRCUIT - A reconfigurable integrated circuit is provided wherein the available hardware resources can be optimised for a particular application. Dynamically reconfiguring (in both real-time and non real-time) the available resources and sharing a plurality of processing elements with a plurality of controller elements achieve this. In a preferred embodiment the integrated circuit includes a plurality of processing blocks, which interface to a reconfigurable interconnection means. A processing block has two forms, namely a shared resource block and a dedicated resource block. Each processing block consists of one or a plurality of controller elements and a plurality of processing elements. The controller element and processing element generally comprise diverse rigid coarse and fine grained circuits and are interconnected through dedicated and reconfigurable interconnect. The processing blocks can be configured as a hierarchy of blocks and or fractal architecture. | 10-15-2009 |
20100115235 | Eliminating Synchronous Grace Period Detection For Non-Preemptible Read-Copy Update On Uniprocessor Systems - A technique for optimizing grace period detection in a uniprocessor environment. An update operation is performed on a data element that is shared with non-preemptible readers of the data element. A call is issued to a synchronous grace period detection method. The synchronous grace period detection method performs synchronous grace period detection and returns from the call if the data processing system implements a multi-processor environment at the time of the call. The synchronous grace period detection determines the end of a grace period in which the readers have passed through a quiescent state and cannot be maintaining references to the pre-update view of the shared data. The synchronous grace period detection method returns from the call without performing grace period detection if the data processing system implements a uniprocessor environment at the time of the call. | 05-06-2010 |
20100318765 | ACTIVE MEMORY COMMAND ENGINE AND METHOD - A command engine for an active memory receives high level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU command include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored. | 12-16-2010 |
20100325386 | PARALLEL OPERATION DEVICE ALLOWING EFFICIENT PARALLEL OPERATIONAL PROCESSING - In arithmetic/logic units (ALU) provided corresponding to entries, an MIMD instruction decoder generating a group of control signals in accordance with a Multiple Instruction-Multiple Data (MIMD) instruction and an MIMD register storing data designating the MIMD instruction are provided, and an inter-ALU communication circuit is provided. The amount and direction of movement of the inter-ALU communication circuit are set by data bits stored in a movement data register. It is possible to execute data movement and arithmetic/logic operation with the amount of movement and operation instruction set individually for each ALU unit. Therefore, in a Single Instruction-Multiple Data type processing device, Multiple Instruction-Multiple Data operation can be executed at high speed in a flexible manner. | 12-23-2010 |
20150363356 | Data Speculation for Array Processors - A method is disclosed of utilizing a plurality of Arithmetic Logic Units (ALUs) of an array processor. It is determined that a first quantity of the ALUs are scheduled to execute a function during a given processing cycle, with each ALU being scheduled to use a respective one of a plurality of selected input vectors as an input. It is also determined that a second quantity of the ALUs are not scheduled for use during the given processing cycle. A plurality of predicted future input vectors that differ from the plurality of selected input vectors are determined. The second quantity of ALUs are scheduled to execute the function during the given processing cycle using respective ones of the plurality of predicted future input vectors as inputs. After completion of the processing cycle, function outputs received from the first and second quantity of ALUs are cached. | 12-17-2015 |
20160170766 | PROGRAMMABLE LOAD REPLAY PRECLUDING MECHANISM | 06-16-2016 |