Entries |
Document | Title | Date |
20080250227 | General Purpose Multiprocessor Programming Apparatus And Method - The present invention provides methods and apparatus for highly efficient parallel operations using a reduction unit. In a particular aspect, there is provided an apparatus and method for parallel computing. In each of the apparatus and method, there are performed independent operations by a plurality of processing units to obtain a sequence of results from each of the processing units, the step of performing independent operations including accessing data from a common memory by each of the plurality of processing units. There are also operations performed upon each of the results obtained from each of the processing units using a reduction unit to obtain a globally coherent and strictly consistent state signal, the globally coherent and strictly consistent state signal being fed back to each of the plurality of processing units in order to synchronize operations therebetween. | 10-09-2008 |
20080263321 | Universal Register Rename Mechanism for Targets of Different Instruction Types in a Microprocessor - A unified register rename mechanism for targets of different instruction types is provided in a microprocessor. The universal rename mechanism renames destinations of different instruction types using a single rename structure. Thus, an instruction that is updating a floating point register (FPR) can be renamed along with an instruction that is updating a general purpose register (GPR) or vector multimedia extensions (VMX) instructions register (VR) using the same rename structure because the number of architected states for GPR is the same as the number of architected states for FPR and VR. Each destination tag (DTAG) is assigned to one destination. A floating point instruction may be assigned to a DTAG, and then a fixed point instruction may be assigned to the next DTAG and so forth. With a universal rename mechanism, significant silicon and power can be saved by having only one rename structure for all instruction types. | 10-23-2008 |
20080313428 | Microprocessor - The invention relates to a microprocessor having a plurality of components which are selected from registers ( | 12-18-2008 |
20090031107 | ON-CHIP MEMORY PROVIDING FOR MICROCODE PATCH OVERLAY AND CONSTANT UPDATE FUNCTIONS - A patch mechanism in a microprocessor is provided. The patch mechanism includes an expansion RAM and a patch loader. The expansion RAM stores a plurality of patches, where a first one or more of the plurality of patches are to be executed by the microprocessor in place of a corresponding one or more micro instructions which are stored in a microcode ROM, and where a second one or more of the plurality of patches are employed to patch a corresponding one or more machine states in the microprocessor. The patch loader is coupled to the expansion RAM, and is configured to retrieve the plurality of patches from a source external to the microprocessor, and is configured to load the plurality of patches into the expansion RAM. | 01-29-2009 |
20090031108 | CONFIGURABLE FUSE MECHANISM FOR IMPLEMENTING MICROCODE PATCHES - A patch apparatus includes fuse banks, one or more configuration fuse banks, and an array controller. The fuse banks are configured to store associated patch records that are employed to patch microcode or machine state circuits in the microprocessor or to store associated control data entities that are employed to program control circuits in the microprocessor. The configuration fuse banks are encoded to indicate whether each of the plurality of fuse banks is programmed with one of the associated patch records or with one of the associated control data entities. The array controller reads the fuse banks, and provides the associated patch records to a patch loader or the associated control data entities to control circuits in the microprocessor. The patch loader provides patches corresponding to the associated patch records, as prescribed, to designated target patch mechanisms in the microprocessor. The patch loader provides the patches to the designated target patch mechanisms following transition of a microprocessor reset signal and prior to execution of instructions stored in a BIOS ROM. | 01-29-2009 |
20090106531 | FIELD PROGRAMMABLE GATE ARRAY AND MICROCONTROLLER SYSTEM-ON-A-CHIP - A system-on-a-chip integrated circuit has a field programmable gate array core having logic clusters, static random access memory modules, and routing resources, a field programmable gate array virtual component interface translator having inputs and outputs, wherein the inputs are connected to the field programmable gate array core, a microcontroller, a microcontroller virtual component interface translator having inputs and outputs, wherein the inputs are connected to the microcontroller, a system bus connected to the outputs of the field programmable gate array virtual component interface translator and also to the outputs of said microcontroller virtual component interface translator, and direct connections between the microcontroller and the routing resources of the field programmable gate array core. | 04-23-2009 |
20090113173 | COMPUTER SYSTEM AND METHOD THAT ELIMINATES THE NEED FOR AN OPERATING SYSTEM - A hardware/firmware layer comprising a Device Manager, an Information Manager, a Memory Manager, and a Process Manager contained in one or more semiconductor chips is disclosed. The hardware/firmware layer eliminates the need for an operating system. Each of the Managers comprises a microcontroller associated with a firmware embedded in ROM or Flash memory that contains instruction sets that cause the microcontroller to provide a designated task of device management, information management, memory management and process management. In another aspect of the invention, devices connected to the computer system are “smart devices,” each device having a device microcontroller and embedded device drivers in a ROM or Flash memory. The hardware/firmware of the present invention does not need to search for available devices, provide diagnostic tests or obtain device drivers to communicate with the devices. Instead, the device microcontroller uses the embedded device driver to perform configuration and self-diagnostic tests as well as device operations. If the device is operational, the device microcontroller sends an identification signal to the hardware/firmware layer of the present invention to indicate availability of the device. | 04-30-2009 |
20090282217 | Horizontal Scaling of Stream Processing - A computer implemented method, data processing system, and computer program product for dynamically scheduling algorithms in a pipeline which operate on a stream of data. The illustrative embodiments determine a computational cost of each algorithm in a plurality of algorithms in a pipeline. The plurality of algorithms in the pipeline processes an incoming data stream in a first sequential algorithm order. The illustrative embodiments reorder the plurality of algorithms in the pipeline to form a second sequential algorithm order based on the computational cost of each algorithm. The plurality of algorithms may then be executed in the second sequential algorithm order. When the illustrative embodiments assign a spare processing unit to an algorithm at an end of the pipeline, the computational cost of each algorithm in the plurality of algorithms in the pipeline is redetermined. | 11-12-2009 |
20100169608 | FLEXIBLE COUNTER UPDATE AND RETRIEVAL - A network device includes one or more processing units and an external memory. Each of the one or more processing units includes a centralized counter configured to perform accounting for the respective processing unit. The external memory is associated with at least one of the one or more processing units and is configured to store a group of count values for the at least one processing unit. | 07-01-2010 |
20100299496 | Thread Partitioning in a Multi-Core Environment - A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main thread binaries completes execution, a second partition of the main thread binaries moves to the second core, and executes using the warmed up cache in the second core. | 11-25-2010 |
20110004741 | Spilling Method in Register Files for Microprocessor - A spilling method in register files for a processor is proposed. The processor is of Parallel Architecture Core (PAC) structure, and accordingly includes a first cluster, a second cluster and a memory. Each of the first and second clusters includes a first function unit (e.g., M-Unit), a second function unit (e.g., I-Unit), a first local register file, a second local register file and a global register file. The first and second local register files are used by the first and second function units, respectively. For a specified live range, the method includes the steps of calculating communication costs of the first local register file, the second local register file and the global register file in each of the first and second clusters, and communication cost of the memory for storing the live range when spilling occurs; calculating use ratios of the first local register file, the second local register file and the global register file in each of the first and second clusters, and use ratio of the memory for the live range; and selecting one of the first local register file, the second local register file and the global register file in each of the first and second clusters and the memory for storing the live range based on the communication costs and the use ratios. | 01-06-2011 |
20110022821 | System and Methods to Improve Efficiency of VLIW Processors - Exemplary embodiments provide microprocessors and methods to implement instruction packing techniques in a multiple-issue microprocessor. Exemplary instruction packing techniques implement instruction grouping vertically along packed groups of consecutive instructions, and horizontally along instruction slots of a multiple-issue microprocessor. In an exemplary embodiment, an instruction packing technique is implemented in a very long instruction word (VLIW) architecture designed to take advantage of instruction level parallelism (ILP). | 01-27-2011 |
20110125984 | MICROPROCESSOR - The invention relates to a microprocessor having a plurality of components which are selected from registers ( | 05-26-2011 |
20110125985 | Providing A Dedicated Communication Path Separate From A Second Path To Enable Communication Between Compliant Sequencers Using An Assertion Signal - In one embodiment, the present invention includes a method for communicating an assertion signal from a first instruction sequencer to a plurality of accelerators coupled to the first instruction sequencer, detecting the assertion signal in the accelerators and communicating a request for a lock, and registering an accelerator that achieves the lock by communication of a registration message for the accelerator to the first instruction sequencer. Other embodiments are described and claimed. | 05-26-2011 |
20110161628 | DATA PROCESSING APPARATUS AND METHOD OF CONTROLLING RECONFIGURABLE CIRCUIT LAYER - According to one embodiment, a data processing apparatus includes plural reconfigurable circuit layers, a first memory, a selecting unit, and a configuring unit. In each of the plural reconfigurable circuit layers, a processing circuit can be reconfigured. The first memory stores circuit information representing processing circuits that should be configured. The selecting unit selects, if it is unnecessary to use all the plural reconfigurable circuit layers in order to configure the processing circuits represented by the circuit information, a part of the reconfigurable circuit layers having high priority orders set in advance and otherwise selects all the plural reconfigurable circuit layers. The configuring unit configures, using the selected reconfigurable circuit layers, the processing circuits represented by the circuit information stored in the first memory. | 06-30-2011 |
20110202746 | PROCESSING ARCHITECTURE - The invention is directed towards a processing apparatus for a portable communication device. The apparatus includes: a central processing unit, first and second digital signal processing units, a first dual port memory unit adapted to store data shared between the central processing unit and the first digital signal processing unit, and a second dual port memory unit adapted to store data shared between the central processing unit and the second digital signal processing unit. The first dual port memory unit is adapted to store data shared between the first and second digital signal processing units without using the central processing unit. | 08-18-2011 |
20110271079 | MULTIPLE-CORE PROCESSOR SUPPORTING MULTIPLE INSTRUCTION SET ARCHITECTURES - A multiple-core processor supporting multiple instruction set architectures provides a power-efficient and flexible platform for virtual machine environments requiring support for multiple instruction set architectures (ISAs). The processor includes multiple cores having disparate native ISAs and that may be selectively enabled for operation, so that power is conserved when support for a particular ISA is not required of the processor. The multiple cores may share a common first level cache and be mutually-exclusively selected for operation, or multiple level-one caches may be provided, one associated with each of the cores and the cores operated as needed, including simultaneous execution of disparate ISAs. A hypervisor controls operation of the cores and locates a core and enables it if necessary when a request to instantiate a virtual machine having a specified ISA is received. | 11-03-2011 |
20110314258 | METHOD AND APPARATUS FOR OPERATING A PROGRAMMABLE LOGIC CONTROLLER (PLC) WITH DECENTRALIZED, AUTONOMOUS SEQUENCE CONTROL - A method for operating a programmable logic controller (PLC), and a programmable logic controller (PLC) for a processing plant with a central data processing unit and a sequence control that reads in, processes input data from inputs, and outputs the processed output data to outputs. The data processing unit performs only superordinate administrative functions for the administration of downstream input and output modules and is embodied as an ADMIN data processing unit. The sequence control is embodied as a partial application autonomously executing in the input and output modules. | 12-22-2011 |
20120079237 | Saving Values Corresponding to Parameters Passed Between Microcode Callers and Microcode Subroutines from Microcode Alias Locations to a Destination Storage Location - An apparatus of one aspect includes a microcode storage, a microcode subroutine stored in the microcode storage, and a microcode caller of the microcode subroutine stored in the microcode storage. The microcode caller has a save microinstruction that indicates a destination storage location. The apparatus also includes microcode alias locations. Each of the microcode alias locations is operable to store a value. The value in the microcode alias location corresponds to a parameter passed between the microcode caller and the microcode subroutine. The apparatus includes save logic coupled with the microcode alias locations to receive the values from the microcode alias locations. The save logic is operable, responsive to the save microinstruction, to save the values from the microcode alias locations in the destination storage location indicated by the save microinstruction. | 03-29-2012 |
20120079238 | DATA PROCESSING DEVICE - A microcomputer provided on a rectangular semiconductor board has memory interface circuits. The memory interface circuits are separately disposed in such positions as to extend along the peripheries of the semiconductor board on both sides from one corner as a reference position. In this case, limitations to size reduction imposed on the semiconductor board can be reduced compared with a semiconductor board having memory interface circuits only on one side. Respective partial circuits on each of the separated memory interface circuits have equal data units associated with data and data strobe signals. Thus, the microcomputer has simplified line design on a mother board and on a module board. | 03-29-2012 |
20120173846 | METHOD TO REDUCE THE ENERGY COST OF NETWORK-ON-CHIP SYSTEMS - In a network-on-chip (NoC) system, multiple data messages may be transferred among modules of the system. Power consumption due to the transfer of the messages may affect a cost and overall performance of the system. A described technique provides a way to reduce a volume of data transferred in the NoC system by exploiting redundancy of data messages. Thus, if a data message to be sent from a source in the NoC includes so-called “zero” bytes that are bytes including only bits set to “0,” such zero bytes may not be transmitted in the NoC. Information on whether each byte of the data message is a zero byte may be recorded in a storage such as a data structure. This information, together with non-zero bytes of the data message, may form a compressed version of the data message. The information may then be used to uncompress the compressed data message at a destination. | 07-05-2012 |
20120173847 | PARALLEL PROCESSOR AND METHOD FOR THREAD PROCESSING THEREOF - A parallel processor and a method for concurrently processing threads in the parallel processor are disclosed. The parallel processor comprises: a plurality of thread processing engines for processing threads distributed to the thread processing engines, the plurality of thread processing engines being connected in parallel; and a thread management unit for obtaining and judging the statuses of the plurality of thread processing engines, and distributing the threads in a waiting queue among the plurality of thread processing engines. | 07-05-2012 |
20120204003 | Invoking Multi-Library Applications on a Multiple Processor System - A mechanism is provided for invoking multi-library applications on a multiple processor system, wherein the multiple processor system comprises a Power Processing Element (PPE) and a plurality of Synergistic Processing Elements (SPEs). Applications including multi-libraries run in the memory of the PPE. The mechanism comprises maintaining the status of each SPE in the applications running on the PPE, where there are SPE agents for capturing the instructions from the PPE in the SPEs that have been started. In response to a request for invoking a library, the PPE determines whether the number of available SPEs for invoking the library is adequate based on the current status of SPEs. If the number of available SPEs is adequate, the PPE sends a run instruction to selected SPEs. After finishing the invocation of all libraries, the PPE sends termination instructions to all started SPEs. | 08-09-2012 |
20120265965 | PROCESSING BYPASS DIRECTORY TRACKING SYSTEM AND METHOD - A processing bypass directory system and method are disclosed. In one embodiment, a bypass directory tracking process includes setting bits in a bypass directory when a corresponding architectural register is written. The bits are selectively cleared in the bypass directory each cycle. The configuration of the bits is utilized to determine which stage of a bypass path processing information is at. | 10-18-2012 |
20120297165 | Electronic Device and Method for Data Processing Using Virtual Register Mode - The invention relates to an electronic device for data processing, which includes an execution unit with a temporary register, a register file, a first feedback path from the data output of the execution unit to the register file, a second feedback path from the data output of the execution unit to the temporary register, a switch configured to connect the first feedback path and/or the second feedback path, and a logic stage coupled to control the switch. The logic stage is configured to control the switch to connect the second feedback path if the data output of an execution unit is used as an operand in the subsequent operation of an execution unit. | 11-22-2012 |
20120297166 | STACK PROCESSOR USING A FERROELECTRIC RANDOM ACCESS MEMORY (F-RAM) HAVING AN INSTRUCTION SET OPTIMIZED TO MINIMIZE MEMORY FETCH OPERATIONS - A stack processor using a non-volatile, ferroelectric random access memory (F-RAM) for both code and data space. The stack processor is operative in response to as many as 64 possible instructions based upon a 16 bit word. Each of the instructions in the 16 bit word comprises 3 five bit instructions and a 16 | 11-22-2012 |
20130086357 | STAGGERED READ OPERATIONS FOR MULTIPLE OPERAND INSTRUCTIONS - A central processing unit includes a register file having a plurality of read ports, a first execution unit having a first plurality of input ports, and logic operable to selectively couple different arrangements of the read ports to the input ports. A method for reading operands from a register file having a plurality of read ports by a first execution unit having a first plurality of input ports includes scheduling an instruction for execution by the first execution unit and selectively coupling a particular arrangement of the read ports to the input ports based on a type of the instruction. | 04-04-2013 |
20130219149 | OPERAND SPECIAL CASE HANDLING FOR MULTI-LANE PROCESSING - A single instruction multiple data processing pipeline | 08-22-2013 |
20130262818 | DNA COMPUTING - This invention deals generally with DNA-based microprocessors. In an exemplary embodiment of the invention, a DNA lattice or grid with photoreceptors forms a microprocessor and is configured to perform the functions of a series of logic gates. An input signal is supplied to the DNA lattice by shining a light signal on the lattice. The lattice performs the functions of the series of logic gates that are placed on the lattice. The lattice, in turn, supplies an augmented light output signal, which is decoded to reflect the processing by the DNA-based microprocessor. | 10-03-2013 |
20130305012 | IMPLEMENTATION OF COUNTERS USING TRACE HARDWARE - A multi-core computing system includes a plurality of processor cores, a counter, and a register block including a plurality of event registers coupled to the plurality of processor cores. Each of the plurality of processor cores is configured to write event records to the event registers, and the register block is configured to generate a serialized event stream including event records written to the event registers. The system further includes an event stream processor configured to receive the serialized event stream, to analyze the serialized event stream to identify a counter update event record in the serialized event stream, and to update the counter in response to the counter update event record. | 11-14-2013 |
20130305013 | MICROPROCESSOR THAT MAKES 64-BIT GENERAL PURPOSE REGISTERS AVAILABLE IN MSR ADDRESS SPACE WHILE OPERATING IN NON-64-BIT MODE - A microprocessor includes hardware registers that instantiate the IA-32 Architecture EDX and EAX GPRs and hardware registers that instantiate the Intel 64 Architecture R8-R15 GPRs. The microprocessor associates with each of the R8-R15 GPRs a respective unique MSR address. In response to an IA-32 Architecture RDMSR instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor reads the contents of the hardware register that instantiates the specified one of the R8-R15 GPRs into the hardware registers that instantiate the EDX:EAX registers. In response to an IA-32 Architecture WRMSR instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor writes into the hardware register that instantiates the specified one of the R8-R15 GPRs the contents of the hardware registers that instantiate the EDX:EAX registers. The microprocessor does so even when operating in non-64-modes. | 11-14-2013 |
20130305014 | MICROPROCESSOR THAT ENABLES ARM ISA PROGRAM TO ACCESS 64-BIT GENERAL PURPOSE REGISTERS WRITTEN BY X86 ISA PROGRAM - A microprocessor includes hardware registers that instantiate the Intel 64 Architecture R8-R15 GPRs. The microprocessor associates with each of the R8-R15 GPRs a respective unique MSR address. The microprocessor also includes hardware registers that instantiate the ARM Architecture GPRs. In response to an ARM MRRC instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor reads the contents of the hardware register that instantiates the specified one of the R8-R15 GPRs into the hardware registers that instantiate two of the ARM GPRs registers. In response to an ARM MCRR instruction that specifies the respective unique MSR address of one of the R8-R15 GPRs, the microprocessor writes into the hardware register that instantiates the specified one of the R8-R15 GPRs the contents of the hardware registers that instantiate two of the ARM Architecture GPRs registers. The hardware registers may be shared by the two Architectures. | 11-14-2013 |
20140019716 | PLATEABLE DIFFUSION BARRIER TECHNIQUES - Techniques are disclosed for forming a directly plateable diffusion barrier within an interconnect structure to prevent diffusion of interconnect fill metal into surrounding dielectric material and lower metal layers. The barrier can be used in back-end interconnect metallization processes and, in an embodiment, renders a seed layer unnecessary. In accordance with various example embodiments, the barrier can be implemented, for instance, as: (1) a single layer of ruthenium silicide (RuSi | 01-16-2014 |
20140025926 | Scalable Room Temperature Quantum Information Processor - A quantum information processor (QIP) may include a plurality of quantum registers, each quantum register containing at least one nuclear spin and at least one localized electronic spin. At least some of the quantum registers may be coherently coupled to each other by a dark spin chain that includes a series of optically unaddressable spins. Each quantum register may be optically addressable, so that quantum information can be initialized and read out optically from each register, and moved from one register to another through the dark spin chain, through an adiabatic sequential swap or through free-fermion state transfer. A scalable architecture for the QIP may include an array of super-plaquettes, each super-plaquette including a lattice of individually optically addressable plaquettes coupled to each other through dark spin chains, and separately controllable by confined microwave fields so as to permit parallel operations. | 01-23-2014 |
20140040595 | SPACE EFFICIENT CHECKPOINT FACILITY AND TECHNIQUE FOR PROCESSOR WITH INTEGRALLY INDEXED REGISTER MAPPING AND FREE-LIST ARRAYS - A processor may efficiently implement register renaming and checkpoint repair even in instruction set architectures with large numbers of wide (bit-width) registers by (i) renaming all destination operand register targets, (ii) implementing free list and architectural-to-physical mapping table as a combined array storage with unitary (or common) read, write and checkpoint pointer indexing and (iii) storing checkpoints as snapshots of the mapping table, rather than of actual register contents. In this way, uniformity (and timing simplicity) of the decode pipeline may be accentuated and architectural-to-physical mappings (or allocable mappings) may be efficiently shuttled between free-list, reorder buffer and mapping table stores in correspondence with instruction dispatch and completion as well as checkpoint creation, retirement and restoration. | 02-06-2014 |
20140082325 | INTELLIGENT ARCHITECTURE CREATOR - Systems and methods are disclosed to automatically generate a processor architecture for a custom integrated circuit (IC) described by a computer readable code. The IC has one or more timing and hardware constraints. The system extracts parameters defining the processor architecture from a static profile and a dynamic profile of the computer readable code; iteratively optimizes the processor architecture by changing one or more parameters until all timing and hardware constraints expressed as a cost function are met; and synthesizes the generated processor architecture into a computer readable description of the custom integrated circuit for semiconductor fabrication. | 03-20-2014 |
20140082326 | VEHICLE ELECTRONIC CONTROLLER - Some embodiments relate to a vehicle electronic controller having a microcomputer and a port expansion element, with reduced power consumption and radio noise. An MCU (microcomputer) performs determination processing that determines whether an output condition is established that is based on a signal that is input via a signal input port of the MCU. If the output condition is established, the MCU transmits a signal output instruction to a port expansion element via a communication port, and if not, the instruction is not transmitted. The port expansion element outputs a signal via a signal output port in response to an instruction from the MCU. The port expansion element automatically switches, depending on whether communication via the MCU is being suspended, between operation in a waiting mode in which the internal oscillation circuit is suspended, and operation in a normal mode in which the internal oscillation circuit is operated. | 03-20-2014 |
20140122835 | METHOD OF PLACEMENT AND ROUTING IN A RECONFIGURATION OF A DYNAMICALLY RECONFIGURABLE PROCESSOR - A method and system are provided for deriving a resultant compiled software code with increased compatibility for placement and routing of a dynamically reconfigurable processor. | 05-01-2014 |
20140173252 | SYSTEM-ON-CHIP DESIGN STRUCTURE AND METHOD - Aspects may include a method of designing a system-on-chip. The method may include receiving multiple processing modules, each representing in software one of multiple processing units of a system-on-chip. The method may further include modeling communications from one or more of the multiple processing modules as accesses to memory. The method may further include generating a coherent memory module associated with the multiple processing modules based on modeling the communications from the one or more of the multiple processing modules as accesses to memory. The coherent memory module may represent in software a coherent memory associated with the multiple processing units. | 06-19-2014 |
20140310504 | Systems and Methods for Flag Tracking in Move Elimination Operations - Systems and methods for flag tracking in data manipulation operations involving move elimination. An example processing system comprises a first data structure including a plurality of physical register values; a second data structure including a plurality of pointers referencing elements of the first data structure; a third data structure including a plurality of move elimination sets, each move elimination set comprising two or more bits representing two or more logical data registers, the third data structure further comprising at least one bit associated with each move elimination set, the at least one bit representing one or more logical flag registers; a fourth data structure including an identifier of a data register sharing an element of the first data structure with a flag register; and a move elimination logic configured to perform a move elimination operation. | 10-16-2014 |
20150301983 | OPTIMIZATION OF LOOPS AND DATA FLOW SECTIONS IN MULTI-CORE PROCESSOR ENVIRONMENT - The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, and chaining the loop partitions, generating a list representing the execution sequence of the partitions. | 10-22-2015 |
20160041945 | INSTRUCTION AND LOGIC FOR STORE BROADCAST - A processor includes a core with locally-gated circuitry, a decode unit, a local power gate (LPG) coupled to the locally-gated circuitry, and an execution unit. The decode unit includes logic to decode a store broadcast instruction of a specified width. The LPG includes logic to selectively provide power to the locally-gated circuitry, activate power to a first portion of the locally-gated circuitry for execution of full cache-line memory operations, and deactivate power to a second portion of the locally-gated circuitry. The execution unit includes logic to execute, by the first portion of the locally-gated circuitry for execution of full cache-line memory operations, the store broadcast instruction, the store broadcast instruction to store data of the specified width to storage of the processor. | 02-11-2016 |
20160048394 | ISSUING INSTRUCTIONS TO MULTIPLE EXECUTION UNITS - The present invention discloses a single chip sequential processor comprising at least one ALU-Block wherein said sequential processor is capable of maintaining its op-codes while processing data such as to overcome the necessity of requiring a new instruction in every clock cycle. | 02-18-2016 |
20160085719 | PRESENTING PIPELINES OF MULTICORE PROCESSORS AS SEPARATE PROCESSOR CORES TO A PROGRAMMING FRAMEWORK - A data processing system comprising: a processor comprising a plurality of cores, each core comprising a first processing pipeline and a second processing pipeline, the second processing pipeline having a different architecture to the first processing pipeline; a framework configured to manage the processing resources of the data processing system including the processor; and an interface configured to present to the framework each of the processing pipelines as a core. | 03-24-2016 |
20160092214 | OPTIMIZING GROUPING OF INSTRUCTIONS - Embodiments include optimizing the grouping of instructions in a microprocessor. Aspects include receiving a first clump of instructions from a streaming buffer, pre-decoding each of the instructions for select information and sending the instructions to an instruction queue. Aspects further include storing initial grouping information for the instructions in a local register, wherein the initial grouping information is based on the select information. Aspects further include updating the initial grouping information stored in the local register when additional pre-decode information becomes available and grouping the instructions that are ready to be dispatched into a dispatch group based on the grouping information stored in the local register. Aspects further include dispatching the dispatch group to an issue unit. | 03-31-2016 |
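
The zero-byte suppression described in entry 20120173846 (METHOD TO REDUCE THE ENERGY COST OF NETWORK-ON-CHIP SYSTEMS) can be illustrated with a minimal sketch. The abstract only says that information on whether each byte is a zero byte is recorded "in a storage such as a data structure"; the one-bit-per-byte mask, its bit ordering, and the function names `compress` and `decompress` below are assumptions made for illustration, not details taken from the application.

```python
from typing import Tuple


def compress(message: bytes) -> Tuple[bytes, bytes]:
    """Split a message into (mask, payload).

    Assumed layout: the mask holds one bit per input byte, set when that
    byte is non-zero; the payload holds only the non-zero bytes, in order.
    """
    mask = bytearray((len(message) + 7) // 8)
    payload = bytearray()
    for i, value in enumerate(message):
        if value != 0:
            mask[i // 8] |= 1 << (i % 8)  # mark position i as non-zero
            payload.append(value)         # only non-zero bytes are "sent"
    return bytes(mask), bytes(payload)


def decompress(mask: bytes, payload: bytes, length: int) -> bytes:
    """Rebuild the original message from the mask and the non-zero bytes."""
    out = bytearray(length)               # starts as all zero bytes
    remaining = iter(payload)
    for i in range(length):
        if mask[i // 8] & (1 << (i % 8)):
            out[i] = next(remaining)      # restore the next non-zero byte
    return bytes(out)
```

In this sketch the receiver must also know the original message length, which is passed in explicitly; how a real network-on-chip would convey it is not stated in the abstract.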