Entries |
Document | Title | Date |
20080209166 | Method of Renaming Registers in Register File and Microprocessor Thereof - A microprocessor for processing instructions comprises multiple clusters for receiving the instructions, each of the clusters having a plurality of functional units for executing the instructions, multiple register sub-files each having multiple registers for storing data for executing the instructions, wherein each of the clusters is associated with a corresponding one of the register sub-files so that an instruction dispatched to a cluster is executed by accessing registers in a register sub-file associated with the cluster to which the instruction is dispatched, a register-renaming unit for renaming target registers in an instruction with registers in a register sub-file associated with a cluster to which the instruction is dispatched, and issue-queue units each of which is associated with a corresponding one of the clusters, wherein an issue-queue unit holds an instruction renamed by the register-renaming unit until the renamed instruction is issued to be executed in a cluster associated with the issue-queue unit. | 08-28-2008 |
20080215852 | System and Device Architecture For Single-Chip Multi-Core Processor Having On-Board Display Aggregator and I/O Device Selector Control - System, device, device architecture, and method for operating a multi-core processor providing application level file isolation and providing display frame buffer aggregator or selector to provide a user with the experience of multiple simultaneous application execution within a single processor while actually providing separate concurrent but isolated processing sessions. | 09-04-2008 |
20080222389 | INTERPROCESSOR MESSAGE TRANSMISSION VIA COHERENCY-BASED INTERCONNECT - A method includes communicating a first message between processors of a multiprocessor system via a coherency interconnect, whereby the first message includes coherency information. The method further includes communicating a second message between processors of the multiprocessor system via the coherency interconnect, whereby the second message includes interprocessor message information. A system includes a coherency interconnect and a processor. The processor includes an interface configured to receive messages from the coherency interconnect, each message including one of coherency information or interprocessor message information. The processor further includes a coherency management module configured to process coherency information obtained from at least one of the messages and an interrupt controller configured to generate an interrupt based on interprocessor message information obtained from at least one of the messages. | 09-11-2008 |
20080244225 | Integrated Circuit and Method For Transaction Retraction - An integrated circuit having a plurality of processing modules (I, T) is provided. At least one first processing module (I) issues at least one transaction towards at least one second processing module (T). Said integrated circuit further comprises at least one first transaction retraction unit (TRU | 10-02-2008 |
20080256331 | ARITHMETIC DEVICE CAPABLE OF OBTAINING HIGH-ACCURACY CALCULATION RESULTS - A plurality of general-purpose registers each has a first bit width. A computing unit has a first and a second input end, at least the first input end having a second bit width wider than the first bit width, and performs an arithmetical operation on data supplied from the general-purpose registers to the first and second input ends. An overflow register having a bit width narrower than the first bit width holds data on figures overflowed as a result of calculation by the computing unit as overflow data and supplies the held overflow data as higher-order bits to at least one input end of the computing unit. | 10-16-2008 |
20080270751 | SYSTEM AND METHOD FOR PROCESSING DATA IN A PIPELINE OF COMPUTERS - A series of computers to process data including a first and a last computer. Each of the computers except the first is preceded by a prior computer and each except the last is followed by a subsequent computer. A logic reads new data via a first data path and a logic writes old data via a second data path. A logic processes the new data to produce the old data and, except for the last computer, a storage element stores the old data. The logic to write operates after the logic to read and the logic to write operates before the logic to process. | 10-30-2008 |
20080288749 | READ-COPY UPDATE GRACE PERIOD DETECTION WITHOUT ATOMIC INSTRUCTIONS THAT GRACEFULLY HANDLES LARGE NUMBERS OF PROCESSORS - A method, system and computer program product for avoiding unnecessary grace period token processing while detecting a grace period without atomic instructions in a read-copy update subsystem or other processing environment that requires deferring removal of a shared data element until pre-existing references to the data element are removed. Detection of the grace period includes establishing a token to be circulated between processing entities sharing access to the data element. A grace period elapses whenever the token makes a round trip through the processing entities. A distributed indicator associated with each processing entity indicates whether there is a need to perform removal processing on any shared data element. The distributed indicator is processed at each processing entity before the latter engages in token processing. Token processing is performed only when warranted by the distributed indicator. In this way, unnecessary token processing can be avoided when the distributed indicator does not warrant such processing. | 11-20-2008 |
20080294872 | DEFRAGMENTING BLOCKS IN A CLUSTERED OR DISTRIBUTED COMPUTING SYSTEM - Embodiments of the invention provide techniques for defragmenting blocks of resources allocated to perform computing jobs on a distributed or clustered system so that more contiguous physical resources may be made available to users submitting new job requests. Typically, the defragmentation process is performed when a job is submitted that requires access to a computing block that is larger than any currently available block in the parallel computing system. | 11-27-2008 |
20080301405 | SYSTEM AND METHOD FOR AUTOMATICALLY SEGMENTING AND POPULATING A DISTRIBUTED COMPUTING PROBLEM - The initial partitioning of a distributed computing problem can be critical, and is often a source of tedium for the user. A method is provided that automatically segments the problem into fixed sized collections of original program cells (OPCs) based on the complexity of the problem specified, and the combination of computing agents of various caliber available for the overall job. The OPCs that are on the edge of a collection can communicate with OPCs on the edges of neighboring collections, and are indexed separately from OPCs that are within the ‘core’ or inner non-edge portion of a collection. Consequently, core OPCs can iterate independently of whether any communication occurs between collections and groups of collections (VPPs). All OPCs on an edge have common dependencies on remote information (i.e., their neighbors are all on the same edge of a neighboring collection). | 12-04-2008 |
20080307198 | SIGNAL-PROCESSING APPARATUS AND ELECTRONIC APPARATUS USING SAME - A signal-processing apparatus includes an instruction-parallel processor, a first data-parallel processor, a second data-parallel processor, and a motion detection unit, a de-blocking filtering unit and a variable-length coding/decoding unit which are dedicated hardware. With this structure, during signal processing of an image compression and decompression algorithm needing a large amount of processing, the load is distributed between software and hardware, so that the signal-processing apparatus can realize high processing capability and flexibility. | 12-11-2008 |
20080320275 | Concurrent exception handling - Various technologies and techniques are disclosed for providing concurrent exception handling. Exceptions that occur in concurrent workers are caught. The caught exceptions are then forwarded from the concurrent workers to a coordination worker. The caught exceptions are finally aggregated into an aggregation structure, such as an aggregate exception object. This aggregation structure is rethrown and the individual caught exceptions may then be handled at a proper time. | 12-25-2008 |
20080320276 | Digital Computing Device with Parallel Processing - A digital processing device comprising a plurality of parallel processing units each coupled in parallel with one another. Each of the plurality of parallel processing units comprises at least one data memory storage unit; at least one input register coupled to the at least one data memory storage unit; and an arithmetic unit coupled to the at least one input register and configured to have synchronous command processing. A program execution control unit is coupled to each of the plurality of processing units and configured such that no processing clocks are required for synchronization of data transfer from the plurality of parallel processing units. At least one data bus is coupled to the at least one input register in each of the plurality of parallel processing units. | 12-25-2008 |
20080320277 | Thread Optimized Multiprocessor Architecture - In one aspect, the invention comprises a system comprising: (a) a plurality of parallel processors on a single chip; and (b) computer memory located on the chip and accessible by each of the processors; wherein each of the processors is operable to process a de minimis instruction set, and wherein each of the processors comprises local caches dedicated to each of at least three specific registers in the processor. In another aspect, the invention comprises a system comprising: (a) a plurality of parallel processors on a single chip; and (b) computer memory located on the chip and accessible by each of the processors, wherein each of the processors is operable to process an instruction set optimized for thread-level parallel processing and wherein each processor accesses the internal data bus of the computer memory on the chip and the internal data bus is the width of one row of the memory. | 12-25-2008 |
20090055625 | Parallel Processing Systems And Method - Methods and systems for parallel computation of an algorithm using a plurality of nodes configured as a Howard Cascade. A home node of a Howard Cascade receives a request from a host system to compute an algorithm identified in the request. The request is distributed to processing nodes of the Howard Cascade in a time sequence order in a manner to minimize the time to so expand the Howard Cascade. The participating nodes then perform the designated portion of the algorithm in parallel. Partial results from each node are agglomerated upstream to higher nodes of the structure and then returned to the host system. The nodes each include a library of stored algorithms accompanied by data template information defining partitioning of the data used in the algorithm among the number of participating nodes. | 02-26-2009 |
20090063812 | PROCESSOR, DATA TRANSFER UNIT, MULTICORE PROCESSOR SYSTEM - A processor includes a CPU capable of performing predetermined arithmetic processing, a memory accessible by the CPU, and a data transfer unit capable of controlling data transfer with the memory by substituting for the CPU. The data transfer unit is provided with a command chain unit for continuously performing data transfer by execution of a preset command chain, and a retry controller for executing a retry processing in case a transfer error occurs during data transfer by the command chain unit. Then, the data transfer unit reports a command relating to the transfer error to the CPU after completion of the execution of the command chain, thereby lessening the number of interruptions for error processing, and attaining enhancement in performance of a system. | 03-05-2009 |
20090083515 | Soft-reconfigurable massively parallel architecture and programming system - In an embodiment, the present invention discloses a flexible and reconfigurable architecture with efficient memory data management, together with efficient data transfer and relieving data transfer congestion in an integrated circuit. In an embodiment, the output of a first functional component is stored to an input memory of a next functional component. Thus when the first functional component completes its processing, its output is ready to be accessed as input to the next functional component. In an embodiment, the memory device further comprises a partition mechanism for simultaneously accepting output writing from the first functional component and accepting input reading from the second functional component. In another embodiment, the present integrated circuit comprises at least two functional components and at least two memory devices, together with a controller for switching the connections between the functional components and the memory devices. The controller can comprise a multiplexer or a switching matrix. | 03-26-2009 |
20090083516 | MULTIMEDIA PROCESSING IN PARALLEL MULTI-CORE COMPUTATION ARCHITECTURES - In a media server for processing data packets, media server functions are implemented by a plurality of modules categorized by real-time response requirements. | 03-26-2009 |
20090094437 | Method And Device For Controlling Multicore Processor - The present invention provides a method and a device for controlling a multicore processor by selecting and operating the appropriate number of cores corresponding to an operation state of the processor. In a multicore processor having a plurality of cores each independently performing a calculation process on one processor, an operating rate of a thread or task of each core within a predetermined time is calculated by summing the operating times or the number of operating times within a predetermined time, and an overall operating rate of all the cores is found by summing the calculated operating rates. The number of operating cores corresponding to the overall operating rate is determined by a previously set table. The number of cores operating has a hysteresis characteristic in which the number of operating cores is different between increasing and decreasing times of the overall operating rate. Operating cores corresponding to the number of the determined cores are selected by the previously set table. When an exceptional process is detected, all the cores operate. After a predetermined time, when it is determined that the exceptional process is eliminated, the process returns to the original processing. | 04-09-2009 |
20090106529 | FLATTENED BUTTERFLY PROCESSOR INTERCONNECT NETWORK - A multiprocessor computer system comprises a folded butterfly processor interconnect network, the folded butterfly interconnect network comprising a traditional butterfly interconnect network derived from a butterfly network by flattening routers in each row into a single router for each row, and eliminating channels entirely local to the single row. | 04-23-2009 |
20090113171 | TPM DEVICE FOR MULTI-PROCESSOR SYSTEMS - In one embodiment, a computer system comprises at least a first computing cell and a second computing cell, each computing cell comprising at least one processor, at least one programmable trusted platform management device coupled to the processor via a hardware path which goes through at least one trusted platform management device controller which manages operations of the at least one programmable trusted platform device, and a routing device to couple the first and second computing cells. | 04-30-2009 |
20090113172 | NETWORK TOPOLOGY FOR A SCALABLE MULTIPROCESSOR SYSTEM - A system and method for interconnecting a plurality of processing element nodes within a scalable multiprocessor system is provided. Each processing element node includes at least one processor and memory. A scalable interconnect network includes physical communication links interconnecting the processing element nodes in a cluster. A first set of routers in the scalable interconnect network route messages between the plurality of processing element nodes. One or more metarouters in the scalable interconnect network route messages between the first set of routers so that each one of the routers in a first cluster is connected to all other clusters through one or more metarouters. | 04-30-2009 |
20090150650 | Kernel Processor Grouping - Techniques for grouping individual processors into assignment entities are discussed. Statically grouping processors may permit threads to be assigned on a group basis. In this manner, the burden of scheduling threads for processing may be minimized, while the processor within the assignment entity may be selected based on the physical locality of the individual processors within the group. The groupings may permit a system to scale to meet the processing demands of various applications. | 06-11-2009 |
20090150651 | Semiconductor chip - Disclosed herein is a semiconductor chip including: a plurality of processing devices that can communicate with each other; wherein each of the processing devices includes an arithmetic unit, an individual memory connected to the arithmetic unit on a one-to-one basis, and a control unit configured to independently control turning on and off of operation of the arithmetic unit and the individual memory. | 06-11-2009 |
20090204789 | DISTRIBUTING PARALLEL ALGORITHMS OF A PARALLEL APPLICATION AMONG COMPUTE NODES OF AN OPERATIONAL GROUP IN A PARALLEL COMPUTER - Methods, apparatus, and products for distributing parallel algorithms of a parallel application among compute nodes of an operational group in a parallel computer are disclosed that include establishing a hardware profile, the hardware profile describing thermal characteristics of each compute node in the operational group; establishing a hardware independent application profile, the application profile describing thermal characteristics of each parallel algorithm of the parallel application; and mapping, in dependence upon the hardware profile and application profile, each parallel algorithm of the parallel application to a compute node in the operational group. | 08-13-2009 |
20090216996 | Parallel Processing - A system and methods comprising a plurality of leaf nodes in communication with one or more branch nodes, each node comprising a processor. Each leaf node is arranged to obtain data indicative of a restriction A| | 08-27-2009 |
20090240915 | Broadcasting Collective Operation Contributions Throughout A Parallel Computer - Methods, systems, and products are disclosed for broadcasting collective operation contributions throughout a parallel computer. The parallel computer includes a plurality of compute nodes connected together through a data communications network. Each compute node has a plurality of processors for use in collective parallel operations on the parallel computer. Broadcasting collective operation contributions throughout a parallel computer according to embodiments of the present invention includes: transmitting, by each processor on each compute node, that processor's collective operation contribution to the other processors on that compute node using intra-node communications; and transmitting on a designated network link, by each processor on each compute node according to a serial processor transmission sequence, that processor's collective operation contribution to the other processors on the other compute nodes using inter-node communications. | 09-24-2009 |
20090249029 | METHOD FOR AD-HOC PARALLEL PROCESSING IN A DISTRIBUTED ENVIRONMENT - An overall processing time to rasterize, at the first device, the electronic document to be rendered is computed. Also, a rendering time to render, at the first device, the electronic document to be rendered is computed. When the overall processing time to rasterize at the first device is greater than the rendering time to render at the first device, the electronic document to be rendered is parsed into a first document and sub-documents. A productivity capacity of each node is determined, the productivity capacity being a measure of the processing power of the node and the communication cost of exchanging information between the first device and the node. A sub-document is rasterized at a node when a productivity capacity of the node reduces the processing time to rasterize the electronic document to be rendered to be less than the computed overall processing time. The rasterized first document and each rasterized sub-document are aggregated to create a rasterized electronic document to be rendered at the first device. | 10-01-2009 |
20090259825 | MULTI-CORE PROCESSING SYSTEM - A system has a first plurality of cores in a first coherency group. Each core transfers data in packets. The cores are directly coupled serially to form a serial path. The data packets are transferred along the serial path. The serial path is coupled at one end to a packet switch. The packet switch is coupled to a memory. The first plurality of cores and the packet switch are on an integrated circuit. The memory may or may not be on the integrated circuit. In another aspect a second plurality of cores in a second coherency group is coupled to the packet switch. The cores of the first and second pluralities may be reconfigured to form or become part of coherency groups different from the first and second coherency groups. | 10-15-2009 |
20090292900 | MULTIPROCESSOR NODE CONTROL TREE - Control messages are sent from a control processor to a plurality of attached processors via a control tree structure comprising the plurality of attached processors and branching from the control processor, such that two or more of the plurality of attached processor nodes are operable to send messages to other attached processor nodes in parallel. | 11-26-2009 |
20090300326 | SYSTEM, METHOD AND COMPUTER PROGRAM FOR TRANSFORMING AN EXISTING COMPLEX DATA STRUCTURE TO ANOTHER COMPLEX DATA STRUCTURE - A method (system and computer program product) performs facet classification synthesis to relate concepts represented by concept definitions defined in accordance with a faceted data set comprising facets, facet attributes, and facet attributes hierarchies. Dimensional concept relationships are expressed between the concept definitions. Two concept definitions are determined to be related in a particular dimensional concept relationship by examining whether at least one of explicit relationships and implicit relationships exist in the faceted data set between the respective facet attributes of the two concept definitions. | 12-03-2009 |
20100011189 | INFORMATION PROCESSING DEVICE AND METHOD FOR DESIGNING AN INFORMATION PROCESSING DEVICE - An information processing device includes a plurality of processor cores each including a plurality of transistors, and at least one substrate bias circuit that supplies each of the plurality of transistors with a substrate bias voltage that is determined based on the number of the processor cores. | 01-14-2010 |
20100037035 | Generating An Executable Version Of An Application Using A Distributed Compiler Operating On A Plurality Of Compute Nodes - Methods, apparatus, and products are disclosed for generating an executable version of an application using a distributed compiler operating on a plurality of compute nodes that include: receiving, by each compute node, a portion of source code for an application; compiling, in parallel by each compute node, the portion of the source code received by that compute node into a portion of object code for the application; performing, in parallel by each compute node, inter-procedural analysis on the portion of the object code of the application for that compute node, including sharing results of the inter-procedural analysis among the compute nodes; optimizing, in parallel by each compute node, the portion of the object code of the application for that compute node using the shared results of the inter-procedural analysis; and generating the executable version of the application in dependence upon the optimized portions of the object code of the application. | 02-11-2010 |
20100049941 | System And Method For Parallel Processing Using A Type I Howard Cascade - A method for performing a scatter-type data distribution among a cluster of computational devices. A number of nodes (equal to a value Cg, the number of tree generator channels) are initially generated, each connected to an initial generator, to create respective initial root nodes of an initial tree structure. Data is transmitted from the initial generator to each of the initial root nodes. Cg root nodes, each connected to a respective new generator, are generated to create respective roots of Cg newly generated tree structures. Each of the tree structures is expanded by generating Ct (the number of communication channels per node in each tree structure) new nodes connected to each node generated in each previous step. Data is then transmitted to each of the new nodes from an immediately preceding one of the nodes, and from each new generator to an associated root node. | 02-25-2010 |
20100077177 | Multiple Processor Core Vector Morph Coupling Mechanism - One embodiment of the invention provides a processor. The processor generally includes a first and second processor core, each having a plurality of pipelined execution units for executing an issue group of multiple instructions and scheduling logic configured to issue a first issue group of instructions to the first processor core for execution and a second issue group of instructions to the second processor core for execution when the processor is in a first mode of operation and configured to issue one or more vector instructions for concurrent execution on the first and second processor cores when the processor is in a second mode of operation. | 03-25-2010 |
20100082940 | INFORMATION PROCESSOR - An information processor controls accesses to a cache memory from application software programs differing in range of addresses, accesses to which are authorized. The cache memory blocks an access to an unauthorized address. In the information processor, an ID is assigned to each application software program, and the tag field of the cache memory is extended. Further, in performing “Cache Fill” (i.e. reading main memory data into the cache memory), the ID is recorded. At the time of making a cache hit judgment, the access control is performed by comparing the extended tag field with ID of an application software program group of an access requester. | 04-01-2010 |
20100106941 | MULTI-CORE STREAM PROCESSING SYSTEM, AND SCHEDULING METHOD FOR THE SAME - In a multi-core stream processing system and scheduling method of the same, a scheduler is coupled to a number (N) of stream processing units and a number (N+1) of stream fetching units, where N≧2. When the scheduler receives a stream element from a P | 04-29-2010 |
20100174886 | Multi-Core Processing Utilizing Prioritized Interrupts for Optimization - This invention relates to multi-core, multi-processing, factory multi-core and DSP multi-core. The nature of the invention is related to more optimal uses of a multi-core system to maximize utilization of the processor cores and minimize power use. The novel and inventive steps are focused on use of interrupts and prioritized interrupts, along with optional in-built methods, to allow systems to run more efficiently and with less effort on the part of the programmer. | 07-08-2010 |
20100228949 | PROCESSORS - A processing apparatus comprises a plurality of processors ( | 09-09-2010 |
20100241825 | Opportunistic Transmission Of Software State Information Within A Link Based Computing System - A method is described that involves determining that software state information of program code is to be made visible to a monitoring system. The method also involves initiating the writing of the software state information into a register. The method also involves waiting for the software state information to be placed onto a link within a link based computing system. | 09-23-2010 |
20110047350 | PARTITION LEVEL POWER MANAGEMENT USING FULLY ASYNCHRONOUS CORES WITH SOFTWARE THAT HAS LIMITED ASYNCHRONOUS SUPPORT - A partition that is executed by multiple processing nodes. Each node includes multiple cores and each of the cores has a frequency that can be set. A first frequency range is provided to the cores. Each core, when executing the identified partition, sets its frequency within the first frequency range. Frequency metrics are gathered from the cores running the partition by the nodes. The gathered frequency metrics are received and analyzed by a hypervisor that determines a second frequency range to use for the partition, with the second frequency range being different from the first frequency range. The second frequency range is provided to the cores at the nodes executing the identified partition. When the cores execute the identified partition, they use frequencies within the second frequency range. | 02-24-2011 |
20110047351 | ROUTING IMAGE DATA ACROSS ON-CHIP NETWORKS - A network of switches may be adapted to route image data to one or more processor cores based on tags associated with data samples, where each tag includes at least one reference-space coordinate value. When image data is received by the network, the image data may be spatially transformed to a reference space, e.g., the physical space that is represented by the image data, to generate the data samples and each data sample may be tagged with a corresponding reference space coordinate value and routed through the network to one or more of the processors according to the tag. | 02-24-2011 |
20110107061 | Performance of first and second macros while data is moving through hardware pipeline - A hardware pipeline has a number of rows including a first row, a last row, and an intermediate row between the first row and the last row. Each row stores a number of bytes of data as the data moves through the pipeline on a row-by-row basis from the first row towards the last row. A mechanism performs a first macro on the data beginning at the first row. The mechanism performs a second macro different than the first macro on the data beginning at the intermediate row where the first macro has been completely performed when the data has reached the intermediate row. The first and second macros each include a number of modifications of the data as the data moves through the pipeline to effect a complete transformation of the data. The complete transformation of the first macro is different than the complete transformation of the second macro. | 05-05-2011 |
20110208947 | System and Method for Simplifying Transmission in Parallel Computing System - Simplifying transmission in a distributed parallel computing system. The method includes: identifying at least one item in a data input to the parallel computing unit; creating a correspondence relation between the at least one item and indices thereof according to a simplification coding algorithm, where the average size of the indices is less than the average size of the at least one item; replacing the at least one item with the corresponding indices according to the correspondence relation; generating simplified intermediate results by the parallel computing unit based on the indices; and transmitting the simplified intermediate results. The invention also provides a system corresponding to the above method. | 08-25-2011 |
20110225392 | Methods and Apparatus for Providing Bit-Reversal and Multicast Functions Utilizing DMA Controller - Techniques for providing improved data distribution to and collection from multiple memories are described. Such memories are often associated with and local to processing elements (PEs) within an array processor. Improved data transfer control within a data processing system provides support for radix 2, 4 and 8 fast Fourier transform (FFT) algorithms through data reordering or bit-reversed addressing across multiple PEs, carried out concurrently with FFT computation on a digital signal processor (DSP) array by a DMA unit. Parallel data distribution and collection through forms of multicast and packet-gather operations are also supported. | 09-15-2011 |
20110271077 | PROCESSOR AND DATA COLLECTION METHOD - A processor has a plurality of PEs (processing elements) that operate in parallel based on operation commands and an information collection unit that collects the data of the plurality of PEs, wherein each of the plurality of PEs holds data and a condition flag, supplies the data and the condition flag to the information collection unit upon receiving an operation command, and upon receiving an update request for updating the condition flag, updates the condition flag in accordance with the update request that was received; and the information collection unit, upon receiving the data and the condition flags, selects one PE based on a predetermined order of priority from among the PEs for which the received condition flags are active and both supplies the data of the selected PE as collection result data and supplies an update request for updating the condition flag of the PE that was selected. | 11-03-2011 |
20110320766 | APPARATUS, METHOD, AND SYSTEM FOR IMPROVING POWER, PERFORMANCE EFFICIENCY BY COUPLING A FIRST CORE TYPE WITH A SECOND CORE TYPE - An apparatus and method are described herein for coupling a processor core of a first type with a co-designed core of a second type. Execution of program code on the first core is monitored and hot sections of the program code are identified. Those hot sections are optimized for execution on the co-designed core, such that upon subsequently encountering those hot sections, the optimized hot sections are executed on the co-designed core. When the co-designed core is executing optimized hot code, the first processor core may be in a low-power state to save power or executing other code in parallel. Furthermore, multiple threads of cold code may be pipelined on the first core, while multiple threads of hot code are pipelined on the co-designed core to achieve maximum performance. | 12-29-2011 |
20120079234 | PERFORMING COMPUTATIONS IN A DISTRIBUTED INFRASTRUCTURE - The present invention extends to methods, systems, and computer program products for performing computations in a distributed infrastructure. Embodiments of the invention include a general purpose distributed computation infrastructure that can be used to perform efficient (in-memory), scalable, failure-resilient, atomic, flow-controlled, long-running stateless and stateful distributed computations. Guarantees provided by a distributed computation infrastructure can build upon existent guarantees of an underlying distributed fabric in order to hide the complexities of fault-tolerance, enable large scale highly available processing, allow for efficient resource utilization, and facilitate generic development of stateful and stateless computations. A distributed computation infrastructure can also provide a substrate on which existent distributed computation models can be enhanced to become failure-resilient. | 03-29-2012 |
20120216015 | System and method to concurrently execute a plurality of object oriented platform independent programs by utilizing memory accessible by both a processor and a co-processor - The invention achieves efficient execution of programs belonging to an object oriented platform independent language technology like Java, .NET in a multitasking environment by utilizing a processor, a co-processor (executing machine independent instructions) and memory that is accessed by both said processor and said co-processor. The co-processor is agnostic of format of the executables of the object oriented platform independent programs and operates on a composite data structure to execute a program. The composite data structure is a logical representation of an object oriented platform independent computer program and includes instructions, object pointers, metadata, etc. Said composite data structure is independent of any object oriented platform independent technology like Java, .NET, etc. The co-processor relies on a native program to reduce executable file(s) of an object oriented platform independent program to the said composite data structure. The invention allows the co-processor to perform scheduling, context switching and aids garbage collection apart from executing the programs of languages like Java, .NET efficiently. The invention aims at providing a co-processor as an alternative to using complex software like Just In Time (JIT) compilers to achieve high performance execution of object oriented platform independent language programs. | 08-23-2012 |
20130275717 | Multi-Tier Data Processing - Various embodiments of the present invention provide systems and methods for a multi-tier data processing system. For example, a data processing system is disclosed that includes an input operable to receive data to be processed, a first data processor operable to process at least some of the data, a second data processor operable to process a portion of the data not processed by the first data processor, wherein the first data processor has a higher throughput than the second data processor, and an output operable to yield processed data from the first data processor and the second data processor. | 10-17-2013 |
20130311751 | SYSTEM AND DATA LOADING METHOD - A system includes plural processors; memory that stores a program currently under execution by the processors; and a pre-loader that pre-loads into a fragment area of the memory, a target program that is to be executed and is a program other than the program currently under execution by the processors. | 11-21-2013 |
20130332702 | CONTROL FLOW IN A HETEROGENEOUS COMPUTER SYSTEM - Methods, apparatuses, and computer readable media are disclosed for control flow on a heterogeneous computer system. The method may include a first processor of a first type, for example a CPU, requesting a first kernel be executed on a second processor of a second type, for example a GPU, to process first work items. The method may include the GPU executing the first kernel to process the first work items. The first kernel may generate second work items. The GPU may execute a second kernel to process the generated second work items. The GPU may dispatch producer kernels when space is available in a work buffer. The GPU may dispatch consumer kernels to process work items in the work buffer when the work buffer has available work items. The GPU may be configured to determine a number of processing elements to execute the first kernel and the second kernel. | 12-12-2013 |
20130339662 | VERIFICATION OF DISTRIBUTED SYMMETRIC MULTI-PROCESSING SYSTEMS - A method, apparatus and product useful for verifying Distributed Symmetric Multi-Processing systems (DSMPs). The method comprising: determining one or more sub-systems of a DSMP, wherein each sub-system is a Symmetric Multi-Processing System (SMP) which comprises a shared memory and a set of processing entities that have the same access permissions to the shared memory; and verifying the DSMP using a verification tool designed to verify an SMP, wherein said verifying is performed by verifying each sub-system. | 12-19-2013 |
20140013079 | TRANSACTION PROCESSING USING MULTIPLE PROTOCOL ENGINES - A multi-processor computer system is described in which transaction processing is distributed among multiple protocol engines. The system includes a plurality of local nodes and an interconnection controller interconnected by a local point-to-point architecture. The interconnection controller comprises a plurality of protocol engines for processing transactions. Transactions are distributed among the protocol engines using destination information associated with the transactions. | 01-09-2014 |
20140195779 | SOFTWARE BASED APPLICATION SPECIFIC INTEGRATED CIRCUIT - A processing device is provided. A cluster includes a plurality of groups of processing elements. A multi-word device is connected to the processing elements within the groups. Each processing element in a particular group is in communication with all other processing elements within the particular group, and only one of the processing elements within other groups in the cluster. Each processing element is limited to operations in which input bits can be processed and an output obtained without reference to other bits. The multi-word device is configured to cooperate with at least two other processing elements to perform processing that requires reference to other bits to obtain a result. | 07-10-2014 |
20140281376 | Creating An Isolated Execution Environment In A Co-Designed Processor - In an embodiment, a processor includes a binary translation (BT) container having code to generate a binary translation of a first code segment and to store the binary translation in a translation cache, a host entity logic to manage the BT container and to identify the first code segment, and protection logic to isolate the BT container from a software stack. In this way, the BT container is configured to be transparent to the software stack. Other embodiments are described and claimed. | 09-18-2014 |
20150046676 | Method and Devices for Data Path and Compute Hardware Optimization - Methods and devices for distributing processing capacity in a multi-processor system include monitoring a data input for a feature activity with a first processor, such as a high efficiency processor. When feature activity is detected, a feature event may be predicted and a processing capacity requirement may be estimated. The sufficiency of available processing capacity of the first processor to meet the estimated future processing capacity requirement and process the predicted feature event may be determined. Processing capacity of a second processor, such as a high performance processor, may be distributed along with the data input when the available processing capacity of the first processor is insufficient to meet the processing capacity requirement and process the predicted feature event. | 02-12-2015 |
20160071021 | SYSTEMS AND METHODS FOR IMPROVING THE PERFORMANCE OF A QUANTUM PROCESSOR VIA REDUCED READOUTS - Techniques for improving the performance of a quantum processor are described. The techniques include reading out a fraction of the qubits in a quantum processor and utilizing one or more post-processing operations to reconstruct qubits of the quantum processor that are not read. The reconstructed qubits may be determined using a perfect sampler to provide results that are strictly better than reading all of the qubits directly from the quantum processor. The composite sample that includes read qubits and reconstructed qubits may be obtained faster than if all qubits of the quantum processor are read directly. | 03-10-2016 |