Patents - stay tuned to the technology

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


PROCESSING CONTROL

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

Patent class list (only not empty are listed)

Deeper subclasses:

Class / Patent application numberDescriptionNumber of patent applications / Date published
712233000 Branching (e.g., delayed branch, loop control, branch predict, interrupt) 586
712221000 Arithmetic operation instruction processing 328
712225000 Processing control for data transfer 249
712226000 Instruction modification based on condition 175
712227000 Specialized instruction processing in support of testing, debugging, emulation 172
712228000 Context preserving (e.g., context swapping, checkpointing, register windowing 116
712229000 Mode switch or change 77
712223000 Logic operation instruction processing 66
712245000 Processing sequence control (i.e., microsequencing) 40
712231000 Detecting end or completion of microprogram 3
20090037707Determining When a Set of Compute Nodes Participating in a Barrier Operation on a Parallel Computer are Ready to Exit the Barrier Operation - Methods, apparatus, and products are disclosed for determining when a set of compute nodes participating in a barrier operation on a parallel computer are ready to exit the barrier operation that includes, for each compute node in the set: initializing a barrier counter with no counter underflow interrupt; configuring, upon entering the barrier operation, the barrier counter with a value in dependence upon a number of compute nodes in the set; broadcasting, by a DMA engine on the compute node to each of the other compute nodes upon entering the barrier operation, a barrier control packet; receiving, by the DMA engine from each of the other compute nodes, a barrier control packet; modifying, by the DMA engine, the value for the barrier counter in dependence upon each of the received barrier control packets; exiting the barrier operation if the value for the barrier counter matches the exit value.02-05-2009
20090100255Guaranteed core access in a multiple core processing system - Exclusive access to a core or part of a core, or to multiple cores, but in any case less than all of the cores, of a multiple core processing system. The access can be requested by an instruction, or by a routine. Once granted, the access provides exclusive access to the core so that a program can be run which requires substantially uninterrupted access to the core.04-16-2009
20110066832Configurable Processor Module Accelerator Using A Programmable Logic Device - A configurable processor module accelerator using a programmable logic device is described. According to one embodiment, the accelerator module includes a circuit board having coupled thereto a first programmable logic device, a controller, and a first memory. The first programmable logic device has access to a bitstream which is stored in the first memory. Access to the bitstream by the first programmable logic device is controlled by the controller. The bitstream is capable of being instantiated in the first programmable logic device using programmable logic thereof to provide at least a transport interface for communication between the first programmable logic device and one or more other devices associated with the motherboard using the microprocessor interface.03-17-2011
712230000 Generating next microinstruction address 2
20100325402DATA PROCESSING DEVICE AND METHOD FOR EXECUTING OBFUSCATED PROGRAMS - A program is obfuscated by reordering its instructions. Original instruction addresses are mapped to target addresses in an irregular way, with position dependent address steps between the addresses of logically successive instructions. Preferably pseudo-random address steps are used, for example with address steps that have mutually opposite sign with equal frequency. The data processing device has an instruction flow control unit that updates instruction addresses according the position dependent address steps. The instruction flow control unit may comprise a circuit that contains secret information, which is not normally accessible from the outside, to control the updates. A lookup table may be used for example, with address steps, successor addresses or mapped address values. In an embodiment the mapping of original instruction addresses to target addresses may be visualized by means of a path (12-23-2010
20120124344LOOP PREDICTOR AND METHOD FOR INSTRUCTION FETCHING USING A LOOP PREDICTOR - A loop predictor and a method for instruction fetching using a loop predictor. A processor may include a loop predictor in addition to a primary branch predictor. A relatively common scenario in program execution is that a set of branches repeat over and over forming a loop. The loop may be detected based on a repeated pattern of access to a data structure used for branch prediction. Once a loop is detected and it may be determined whether the codes would stay in the loop for at least a duration sufficient to disable the branch prediction. On a determination that the detected loop is locked, a sequence of instruction addresses in one iteration of the detected loop may be captured in a buffer and the branch predictor may be turned off and a sequence of fetch instructions may be played from the buffer.05-17-2012
Entries
DocumentTitleDate
20080201557Security Message Authentication Instruction - A method, system and computer program product for computing a message authentication code for data in storage of a computing environment. An instruction specifies a unit of storage for which an authentication code is to be computed. An computing operation computes an authentication code for the unit of storage. A register is used for providing a cryptographic key for use in the computing to the authentication code. Further, the register may be used in a chaining operation.08-21-2008
20080201558PROCESSOR SYSTEM - A processor system according to an aspect of the present invention has a pipeline. The pipeline includes a cache memory, an instruction fetch buffer which stores commands, an execution module which requests data access to the cache memory, a tag memory which outputs information related to the data access of the execution module, and an arbitration circuit which arbitrates access to the cache memory based on entry information of the instruction fetch buffer and the information related to the data access from the tag memory.08-21-2008
20080209179Low-Impact Performance Sampling Within a Massively Parallel Computer - An apparatus, program product and method sample at different times nodes that are performing similar work. Performance data associated with first and second node subsets performing the similar work are sampled at different times, e.g., in a round-robin fashion, and in accordance with a given sampling rate. The performance data is analyzed. Nodes whose performance suffers as a result of a sampling operation may be identified and removed from a subsequent operation.08-28-2008
20080209180Emulation prevention byte removers for video decoder - An emulation prevention byte remover may include one or more of a first buffer, a second buffer, a checker, and a shifter. The first buffer may store first stream data. The second buffer may store second stream data. The checker may determine whether one or more emulation prevention bytes are included in the first, second, or first and second stream data. If the checker determines that the one or more emulation prevention bytes are included in the first, second, or first and second stream data, the checker may output a check signal. In response to the check signal, the shifter may remove at least one of the one or more emulation prevention bytes from the first, second, or first and second stream data. The shifter may generate output stream data based on the first, second, or first and second stream data.08-28-2008
20080209181Method and System for Automatic Generation of Processor Datapaths - Systems and method for automatically generating a set of shared processor datapaths from the description of the behavior of one or more ISA operations is presented. The operations may include, for example, the standard operations of a processor necessary to support an application language such as C or C++ on the ISA. Such operations, for example, may represent a configurable processor ISA. The operations may also include one or more extension operations defined by one or more designers. Thus, a description of the behaviors of the various standard and/or extension operations that compose the ISA of an instance of a standard or configurable processor is used to automatically generate a set of shared processor datapaths that implement the behavior of those operations. In addition, certain aspects may take one or more operations as well as one or more input semantics and either re-implement the input semantics automatically, or combine the input semantics with each other or with one or more other operations to automatically generate a new set of shared processor datapaths.08-28-2008
20080215855Execution unit for performing shuffle and other operations - In one embodiment, the present invention includes a method for receiving first and second data operands in a common execution unit and manipulating the operands responsive to an instruction to generate an output according to local control signals of a local controller of the execution unit. Various instruction types such as shuffle and shift operations may be performed in the common execution unit in a single cycle. Other embodiments are described and claimed.09-04-2008
20080215856METHODS FOR GENERATING CODE FOR AN ARCHITECTURE ENCODING AN EXTENDED REGISTER SPECIFICATION - There are provided methods and computer program products for generating code for an architecture encoding an extended register specification. A method for generating code for a fixed-width instruction set includes identifying a non-contiguous register specifier. The method further includes generating a fixed-width instruction word that includes the non-contiguous register specifier.09-04-2008
20080215857Method For Latest Producer Tracking In An Out-Of-Order Processor, And Applications Thereof - Methods for latest producer tracking in a processor. In one embodiment, the method includes the steps of (1) writing a physical register identification value in a first register rename map location specified by a first instruction, (2) writing a first in-register status value in a second register rename map location specified by the first instruction, (3) writing a producer tracking status value at a producer tracking map location specified by the physical register identification value, and (4) modifying, upon graduation of the first instruction, the first in-register status value only if the producer tracking map location stores the producer tracking status value written in step (3). Other methods are also presented.09-04-2008
20080215858PROCESSOR AND PROGRAM EXECUTION METHOD CAPABLE OF EFFICIENT PROGRAM EXECUTION - A processor for sequentially executing a plurality of programs using a plurality of register value groups stored in a memory that correspond one-to-one with the programs. The processor includes a plurality of register groups; a select/switch unit operable to select one of the plurality of register groups as an execution target register group on which a program execution is based, and to switch the selection target every time a first predetermined period elapses; a restoring unit operable to restore, every time the switching is performed, one of the register value groups into one of the register groups that is not selected as the execution target register group; a saving unit operable to save, prior to the restoring, register values in the register group targeted for restoring, by overwriting a register value group in the memory that corresponds to the register values; and a program execution unit operable to execute, every time the switching is performed, a program corresponding to a register value group in the execution target register group.09-04-2008
20080222395System and Method for Predictive Early Allocation of Stores in a Microprocessor - A system and method for predictive early allocation of stores in a microprocessor is presented. During instruction dispatch, an instruction dispatch unit retrieves an instruction from an instruction cache (Icache). When the retrieved instruction is an interruptible instruction, the instruction dispatch unit loads the interruptible instruction's instruction tag (IITAG) into an interruptible instruction tag register. A load store unit loads subsequent instruction information (instruction tag and store data) along with the interruptible instruction tag in a store data queue entry. Comparison logic receives a completing instruction tag from completion logic, and compares the completing instruction tag with the interruptible instruction tags included in the store data queue entries. In turn, deallocation logic deallocates those store data queue entries that include an interruptible instruction tag that matches the completing instruction tag.09-11-2008
20080222396Low Overhead Access to Shared On-Chip Hardware Accelerator With Memory-Based Interfaces - In one embodiment, a method is contemplated. Access to a hardware accelerator is requested by a user-privileged thread. Access to the hardware accelerator is granted to the user-privileged thread by a higher-privileged thread responsive to the requesting. One or more commands are communicated to the hardware accelerator by the user-privileged thread without intervention by higher-privileged threads and responsive to the grant of access. The one or more commands cause the hardware accelerator to perform one or more tasks. Computer readable media comprises instructions which, when executed, implement portions of the method are also contemplated in various embodiments, as is a hardware accelerator and a processor coupled to the hardware accelerator.09-11-2008
20080222397Hard Object: Hardware Protection for Software Objects - In accordance with one embodiment, additions to the standard computer microprocessor architecture hardware are disclosed comprising novel page table entry fields 09-11-2008
20080229079APPARATUS, SYSTEM, AND METHOD FOR MANAGING COMMANDS OF SOLID-STATE STORAGE USING BANK INTERLEAVE - An apparatus, system, and method are disclosed for efficiently managing commands in a solid-state storage device that includes a solid-state storage arranged in two or more banks. Each bank is separately accessible and includes two or more solid-state storage elements accessed in parallel by a storage input/output bus. The solid-state storage includes solid-state, non-volatile memory. The solid-state storage device includes a bank interleave that directs one or more commands to two or more queues, where the one or more commands are separated by command type into the queues. Each bank includes a set of queues in the bank interleave controller. Each set of queues includes a queue for each command type. The bank interleave controller coordinates among the banks execution of the commands stored in the queues, where a command of a first type executes on one bank while a command of a second type executes on a second bank.09-18-2008
20080235495Method and Apparatus for Counting Instruction and Memory Location Ranges - A method, apparatus, and computer instructions in a data processing system for processing instructions and monitoring accesses to memory location ranges. An instruction for execution is identified. A determination is made as to whether the instruction is within a contiguous range of instructions. Execution information relating to the instruction is identified if the instruction is within the contiguous range of instructions. With memory location accesses, an access to a memory location is identified. A determination of whether the memory location is within a contiguous range of memory locations is made. Access information is identified if the memory location is within the contiguous range of memory locations.09-25-2008
20080235496Methods and Apparatus for Dynamic Instruction Controlled Reconfigurable Register File - A scalable reconfigurable register file (SRRF) containing multiple register files, read and write multiplexer complexes, and a control unit operating in response to instructions is described. Multiple address configurations of the register files are supported by each instruction and different configurations are operable simultaneously during a single instruction execution. For example, with separate files of the size 32×32 supported configurations of 128×32 bit s, 64x64 bit s and 32×128 bit s can be in operation each cycle. Single width, double width, quad width operands are optimally supported without increasing the register file size and without increasing the number of register file read or write ports.09-25-2008
20080244236METHOD AND SYSTEM FOR COMPOSING STREAM PROCESSING APPLICATIONS ACCORDING TO A SEMANTIC DESCRIPTION OF A PROCESSING GOAL - A method for assembling a stream processing application, includes: inputting a plurality of data source descriptions, wherein each of the data source descriptions includes a graph pattern that semantically describes an output of a data source; inputting a plurality of component descriptions, wherein each of the component descriptions includes a graph pattern that semantically describes an input of a component and a graph pattern that semantically describes an output of the component; inputting a stream processing request, wherein the stream processing request includes a goal that is represented by a graph pattern that semantically describes a desired stream processing outcome; assembling a stream processing graph, wherein the stream processing graph includes at least one data source or at least one component that satisfies the desired processing outcome; and outputting the stream processing graph.10-02-2008
20080244237Compute unit with an internal bit FIFO circuit - A compute unit with an internal bit FIFO circuit includes at least one data register, a lookup table, a configuration register including FIFO base address, length and read/write mode fields for configuring a portion of the lookup table as a bit FIFO circuit and a read/write pointer register responsive to an instruction having a lookup table identification field, length of bits field and register extract/deposit field for selectively transferring in a single cycle between the FIFO circuit and the data register a bit field of specified length.10-02-2008
20080244238Stream processing accelerator - The present invention is a stream processing accelerator which includes multiple coupled processing elements which are interconnected through a shared file register and a set of global predicates. The stream processing accelerator has two modes: full-processor mode and circuit mode. In full-processor mode, a branch unit, an arithmetic logic unit and a memory unit work together as a regular processor. In circuit mode, each component acts like functional units with configurable interconnections.10-02-2008
20080244239Method and System for Autonomic Monitoring of Semaphore Operations in an Application - A method, an apparatus, and a computer program product in a data processing system are presented for using hardware assistance for gathering performance information that significantly reduces the overhead in gathering such information. Performance indicators are associated with instructions or memory locations, and processing of the performance indicators enables counting of events associated with execution of those instructions or events associated with accesses to those memory locations. The performance information that has been dynamically gathered from the assisting hardware is available to the software application during runtime in order to autonomically affect the behavior of the software application, particularly to enhance its performance. For example, the counted events may be used to autonomically collecting statistical information about the ability of a software application to successfully acquire a semaphore.10-02-2008
20080250231PROGRAM CODE CONVERSION APPARATUS, PROGRAM CODE CONVERSION METHOD AND RECORDING MEDIUM - A program conversion apparatus includes: a code analyzing section configured to analyze an A binary code executable in an A processor in order to convert the A binary code into a program code for a B processor; a instruction function extracting section configured to extract a predetermined instruction function for the B processor which corresponds to a predetermined instruction for the A processor obtained by the analysis performed by the code analyzing section; and a translator section configured to generate a source code for the B processor from the A binary code, by rewriting the predetermined instruction for the A processor to the predetermined instruction function extracted by the instruction function extracting section.10-09-2008
20080256341Data Processing Pipeline Selection - Strategies for automatically selecting the most appropriate processing pipeline (or runtime) for a particular data item are described. In one embodiment, a media playing application automatically selects the most appropriate media processing pipeline for a media data item from multiple available processing pipelines, or candidates. In this regard, the application makes this selection by utilizing heuristic techniques to identify which available pipeline provides the most enhanced playback experience to a user with respect to certain attributes such as supported playback features and security. These heuristic techniques can take one or more criteria into account and can be implemented in any suitable way. By way of example and not limitation, in one embodiment, a selection process is used wherein potential pipeline candidates are ordered and sequentially evaluated.10-16-2008
20080256342SCALABLE AND CONFIGURABLE EXECUTION PIPELINE - Optimizing pipeline handler execution. A method may be practiced in a computing environment including an execution pipeline. The method includes acts to optimize execution of handlers in the pipeline. The method includes receiving a payload object. Policy information about the payload object is referenced. The policy information includes at least one property value. Based on the policy information about the payload object, handlers are selected from among the pipeline to execute on the payload object. The policy information may be referenced by strategies. Handlers may be registered with the strategies to facilitate the strategies being used to select handlers.10-16-2008
20080263332Data Processing Apparatus and Method for Accelerating Execution Subgraphs - A data processing apparatus and method are provided for processing data under control of a program having program instructions including sequences of individual program instructions corresponding to computational subgraphs identified within the program. Each computational subgraph has a number of input operands and produces one or more output operands. The apparatus comprises an operand store for storing the input and output operands, and processing logic for executing individual program instructions from the program. Also provided is configurable accelerator logic which, in response to reaching an execution point within the program corresponding to a sequence of individual program instructions corresponding to a computational subgraph, evaluates one or more output functions associated with the computational subgraph. The evaluation of each output function generates an output operand for storing in the operand store, and each output operand corresponds to an output that would have been generated had the sequence of individual program instructions corresponding to the computational subgraph have been executed by the processing logic. Configuration storage stores a single look-up table (LUT) configuration for each output function, and for each output function to be evaluated, the accelerator logic is configured dependent on the associated single LUT configuration from the configuration storage, such that on receipt of the input operands of the computational subgraph, the accelerator logic will generate the output operand. This technique has been found to provide a particularly efficient accelerator logic for evaluating output functions associated with computational subgraphs.10-23-2008
20080263333DOCUMENT PROCESSING METHOD - The present invention discloses a method for processing document data to achieve document interoperation, and the method comprises: by an application, performing an operation on abstract unstructured information by issuing instruction(s) to a platform software; and by the said platform software, receiving the said instruction and performing the operation on storage data corresponded to the abstract unstructured information according to the said instruction; wherein said abstract unstructured information are independent of a way in which said storage data are stored.10-23-2008
20080270763Device and Method for Processing Instructions - A method and a device for processing instructions. The device includes a pipelined processor, an instruction memory unit and a register file, whereas the pipelined processor includes a write-back unit and an execution unit. The device is characterized by including a controller that is adapted to receive a first register group size information and a first register identification information that define a first group of source registers associated with a first instruction; and to determine an execution related operation of the first instruction in response to the first register group size information, the first register identification information, a second register group size information and a second register identification information. Whereas the second register group size information and the second register identification information define a second group of target registers associated with a second instruction. Whereas the second instruction is provided to the pipelined processor before the first instruction.10-30-2008
20080270764STATE MACHINE COMPRESSION - Compressing state transition instructions may achieve a reduction in the binary instruction footprint of a state machine. In certain embodiments, the compressed state transition instructions are used by state machine engines that use one or more caches in order to increase the speed at which the state machine engine can execute a state machine. In addition to reducing the instruction footprint, the use of compressed state transition instructions as discussed herein may also increase the cache hit rate of a cache-based state machine engine, resulting in an increase in performance.10-30-2008
20080270765DISPLAY INFORMATION VERIFICATION PROGRAM, METHOD AND APPARATUS - A display information verification method, when display data of financial data is generated from the financial data and scripts for the financial data, includes: searching the scripts for an arithmetic instruction to process a numeric value in the financial data or a conversion instruction to convert a character string included in the financial data; and judging whether or not the arithmetic instruction or the conversion instruction detected in the searching is an instruction considered to manipulate the financial data. Thus, it is possible to detect the instruction considered to manipulate data from the scripts, and to avoid display including the manipulation of the data. In addition, for example, by using information of the instruction, which is stored inside in advance and is allowed to be used, it is possible to detect only the arithmetic instruction or the conversion instruction, which is not allowed to be used.10-30-2008
20080270766Method To Reduce The Number Of Load Instructions Searched By Stores and Snoops In An Out-Of-Order Processor - A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to a snoop_safe register contents; ANDing all the load received data bits if the LSQN is greater in magnitude to the snoop_safe; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger to the snoop_safe was found.10-30-2008
20080276077Method To Reduce The Number Of Load Instructions Searched By Stores And Snoops In An Out-Of-Order Processor - A method for reducing the number of load instructions in the load reorder queue (LRQ) that are searched when a load instruction is executed by a processor, including dispatching the load instructions; inserting the load instructions in the LRQ in program order; clearing a load received data field; executing the load instructions; checking load reorder queue (LRQ) entries; re-executing the load instruction of the matching LRQ entry; continuing execution; getting the load data; setting the load received data field; comparing a load sequence number (LSQN) of each load instruction to a snoop_safe register contents; ANDing all the load received data bits if the LSQN is greater in magnitude to the snoop_safe; setting the snoop_safe register to the LSQN of the load instruction; searching the LRQ entry; and setting a load_peril_snoop register to the LRQ index value where the first load instruction younger to the snoop_safe was found.11-06-2008
20080282068HOST COMMAND EXECUTION ACCELERATION METHOD AND SYSTEM - The present invention sets forth an interface method and system for host acceleration between an electronic device and a host PC. The system comprises an acceleration unit for rapidly classifying a type of an host command then issuing a flag signal to a microprocessor. The microprocessor then executes corresponding actions according to the flag signal and the host command without parsing the host command for accelerating the data communication between the device and a host PC.11-13-2008
20080282069METHOD AND SYSTEM FOR DESIGNING A FLEXIBLE HARDWARE STATE MACHINE - Method and system for performing hardware tasks using a hardware state machine and a processor is provided. The method includes, setting a breakpoint for a state machine state; running the processor in a parallel mode with the state machine; passing control to the processor after a breakpoint condition is encountered; performing a task, wherein the processor performs the task which was meant to be performed by the state machine; and transferring control back to the state machine after the processor performs the task. The system includes an Application Specific Integrated Circuit (ASIC) with the state machine, and the processor.11-13-2008
20080301413Method of and apparatus and architecture for real time signal processing by switch-controlled programmable processor configuring and flexible pipeline and parallel processing - A new signal processor technique and apparatus combining microprocessor technology with switch fabric telecommunication technology to achieve a programmable processor architecture wherein the processor and the connections among its functional blocks are configured by software for each specific application by communication through a switch fabric in a dynamic, parallel and flexible fashion to achieve a reconfigurable pipeline, wherein the length of the pipeline stages and the order of the stages varies from time to time and from application to application, admirably handling the explosion of varieties of diverse signal processing needs in single devices such as handsets, set-top boxes and the like with unprecedented performance, cost and power savings, and with full application flexibility.12-04-2008
20080320283Programmable Data Processor for a Variable Length Encoder/Decoder - A data processing circuit has a programmable processor (1212-25-2008
20080320284VIRTUAL SERIAL-STREAM PROCESSOR - A virtual serial-stream processor or system consists of one or more data input ports, zero or more data output ports, zero or more virtual control ports, one or more virtual serial and stream processing cores, one or more virtual serial control processors, and memory. Virtual components are spread across multiple physical devices, multiple virtual processing cores implemented in one physical device, or some combination, as dictated by an application-specific design.12-25-2008
20090013155System and Method for Retiring Approximately Simultaneously a Group of Instructions in a Superscalar Microprocessor - An system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system for simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results. In addition, the retirement control block further controls the retiring of a group of instructions determined to be retirable, by simultaneously transferring their results from the temporary buffer to the register array, and retires instructions executed in order by storing their results directly in the register array. The method comprises the steps of monitoring the status of the instructions to determine which group of instructions have been executed, determining whether each executed instruction is retirable, storing results of instructions executed out of program order in a temporary buffer, storing retirable-instruction results in a register array and retiring a group of retirable instructions by simultaneously transferring their results from the temporary buffer to the register array, and retiring instructions executed in order by storing their results directly in the register array.01-08-2009
20090019266INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING SYSTEM - With respect to memory access instructions contained in an internal representation program, an information processing apparatus generates a load cache instruction, a cache hit judgment instruction, and a cache miss instruction that is executed in correspondence with a result of a judgment process performed according to the cache hit judgment instruction. In a case where the internal representation program contains a plurality of memory access instructions having a possibility of causing accesses to mutually the same cache line in a cache memory, the information processing apparatus generates a combine instruction instructing that judgment results of the judgment processes that are performed according to the cache hit judgment instruction should be combined into one judgment result. The information processing apparatus outputs an output program that contains these instructions that have been generated.01-15-2009
20090019267Method, System, and Apparatus for Dynamic Reconfiguration of Resources - A dynamic reconfiguration to include on-line addition, deletion, and replacement of individual modules of to support dynamic partitioning of a system, interconnect (link) reconfiguration, memory RAS to allow migration and mirroring without OS intervention, dynamic memory reinterleaving, CPU and socket migration, and support for global shared memory across partitions is described. To facilitate the on-line addition or deletion, the firmware is able to quiesce and de-quiesce the domain of interest so that many system resources, such as routing tables and address decoders, can be updated in what essentially appears to be an atomic operation to the software layer above the firmware.01-15-2009
20090031116THREE OPERAND INSTRUCTION EXTENSION FOR X86 ARCHITECTURE - A method and apparatus are contemplated for increasing the number of available instructions in an instruction set architecture. The new instructions extend the number of general-purpose registers and include three or more operands. A combination of an escape code field, an opcode field, an operation configuration field and an operation size field determines a unique new instruction operation. A source operand extension field includes bits to be combined with other fields in order to extend the number of source operand values for general-purpose registers.01-29-2009
20090031117SAME INSTRUCTION DIFFERENT OPERATION (SIDO) COMPUTER WITH SHORT INSTRUCTION AND PROVISION OF SENDING INSTRUCTION CODE THROUGH DATA - A same instruction different operation (SIDO) processor is disclosed in which the instruction control word is supplied using data bus as one operand and the data to be operated is supplied through another operand. Also disclosed is a method for the provision of operation-code along with data/operands using a short instruction word. With all the execution units working in parallel on multiple data operands, a variety of operations can be performed in parallel. This allows short instruction format and flexibility to dynamically program the processor on the fly by changing data/operand words, and supports basic integer operations using very simple and efficient hardware execution units.01-29-2009
20090037699APPARATUS AND METHOD FOR PROCESSING SEMICONDUCTOR - A semiconductor processing apparatus includes a tester for inspecting a semiconductor device, a display unit 02-05-2009
20090037700Method and system for reactively assigning computational threads of control between processors - A method and system for reactively assigning computational threads of control between processors provide a coordination model implemented by a software framework. The coordination model comprises five (5) entities which implement the three elements of a coordination model: 1) Behavior, 2) Data, 3) Container, 4) Source and 5) Processor. The invention decomposes an application into a cooperative collection of distributed and networked Behaviors, which are subsequently executed by Containers. A designer using this invention implements a Behavior for each logical stage of execution, which represents the core service-processing logic for that stage.02-05-2009
20090043995Handling Data Cache Misses Out-of-Order for Asynchronous Pipelines - An apparatus and method for handling data cache misses out-of-order for asynchronous pipelines are provided. The apparatus and method associates load tag (LTAG) identifiers with the load instructions and uses them to track the load instruction across multiple pipelines as an index into a load table data structure of a load target buffer. The load table is used to manage cache “hits” and “misses” and to aid in the recycling of data from the L2 cache. With cache misses, the LTAG indexed load table permits load data to recycle from the L2 cache in any order. When the load instruction issues and sees its corresponding entry in the load table marked as a “miss,” the effects of issuance of the load instruction are canceled and the load instruction is stored in the load table for future reissuing to the instruction pipeline when the required data is recycled.02-12-2009
20090049281MULTIMEDIA DECODING METHOD AND MULTIMEDIA DECODING APPARATUS BASED ON MULTI-CORE PROCESSOR - Provided are a multimedia decoding method and multimedia decoding apparatus based on a multi-core platform including a central processor and a plurality of operation processors. The multimedia decoding method includes performing a queue generation operation on input multimedia data to generate queues of one or more operations of the multimedia data which are to be performed by the central processor and the operation processors, wherein the queue generation operation is performed by the central processor; performing motion compensation operations on partitioned data regions of the multimedia data by one or more motion compensation processors among the operation processors; and performing a deblocking operation on the multimedia data by a deblocking processor among the operation processors.02-19-2009
20090063825SYSTEMS AND METHODS FOR COMPRESSING STATE MACHINE INSTRUCTIONS - Systems and methods for compressing state machine instructions are disclosed herein. In one embodiment, the method comprises associating input characters associated with states to respective indices, where each index comprises information indicative of a particular transition instruction.03-05-2009
20090063826Quad aware Locking Primitive - A method and computer system for efficiently handling high contention locking in a multiprocessor computer system. At least some of the processors in the system are organized into a hierarchy, and process an interruptible lock in response to the hierarchy. The method utilizes two alternative methods of acquiring the lock, including a conditional lock acquisition primitive and an unconditional lock acquisition primitive, and an unconditional lock release primitive for releasing the lock from a particular processor. To prevent races between processors requesting a lock acquisition and a processor releasing the lock, a release flag is utilized. Furthermore, in order to ensure that the a processor utilizing the unconditional lock acquisition primitive is granted the lock, a handoff flag is utilized.03-05-2009
20090070563CONCURRENT PHYSICAL PROCESSOR REASSIGNMENT - Reassignment of a physical processor backing a logical processor is performed concurrently to the operation of the processor. The operating state of one physical processor is loaded on another physical processor, such that the logical processor is backed by a different physical processor. This reassignment is performed concurrent to processor operation and transparent to the operating system.03-12-2009
20090077351Information processing device and compiler - Devices, compilers and methods to reduce energy consumption associated with execution of a program by adjusting a computational capability of a CPU with higher accuracy than before. A device sets an appropriate computational capability to the CPU. It includes: changing a computational capability of the CPU every time each of a plurality of program areas included in the execution program is executed while the execution program is being executed, and measuring execution time each of the program areas; deciding an optimal computational capability required to execute the program area using the CPU, based on the execution time for each of the computational capabilities measured for the respective program areas; and performing setting of the optimal computational capability for executing the program area, which is to be used when executing the program area again in the course of executing the execution program, for each of the program areas.03-19-2009
20090077352PERFORMANCE OF AN IN-ORDER PROCESSOR BY NO LONGER REQUIRING A UNIFORM COMPLETION POINT ACROSS DIFFERENT EXECUTION PIPELINES - A method, system and processor for improving the performance of an in-order processor. A processor may include an execution unit with an execution pipeline that includes a backup pipeline and a regular pipeline. The backup pipeline may store a copy of the instructions issued to the regular pipeline. The execution pipeline may include logic for allowing instructions to flow from the backup pipeline to the regular pipeline following the flushing of the instructions younger than the exception detected in the regular pipeline. By maintaining a backup copy of the instructions issued to the regular pipeline, instructions may not need to be flushed from separate execution pipelines and re-fetched. As a result, one may complete the results of the execution units to the architected state out of order thereby allowing the completion point to vary among the different execution pipelines.03-19-2009
20090083521Program illegiblizing device and method - A program obfuscating device for generating obfuscated program from which unauthorized analyzer cannot obtain confidential information easily. The program obfuscating device stores original program that contains authorized program instructions and confidential process instruction group containing confidential information that needs to be kept confidential, generates process instructions which, when executed in predetermined order, provide same result, with execution of last process instruction thereof, as the confidential process instruction group, inserts the process instructions into the original program at position between start of the original program and the confidential process instruction group so as to be executed in the predetermined order, in place of the confidential process instruction group, generates dummy block as dummy of the process instructions, and inserts the dummy block and control instruction, which causes the dummy block to be bypassed, into the original program, and inserts branch instruction into the dummy block.03-26-2009
20090083522Systems, Devices, and/or Methods for Managing Programmable Logic Controller Processing - Certain exemplary embodiments can provide a programmable logic controller, which can comprise a Reduced Instruction Set Computer (RISC) processor. The RISC processor can be adapted to, responsive to a received request to process a Boolean operation, execute a single processor data access instruction addressed to a region of a memory-mapped register corresponding to the Boolean operation.03-26-2009
20090083523PROCESSOR POWER MANAGEMENT ASSOCIATED WITH WORKLOADS - Some embodiments provide determination of a processor performance characteristic associated with a first workload, and determination of a processor performance state for the first workload based on the performance characteristic. Further aspects may include determination of a second processor performance characteristic associated with a second workload, determination of a second processor performance state for the second workload based on the performance characteristic, determination of a similarity between the first performance characteristics and the second performance characteristics, determination of a cluster comprising the first workload and the second workload, and association of a third processor performance state with the cluster, wherein the third processor performance state is identical to the first processor performance state and to the second processor performance state.03-26-2009
20090089553MULTI-THREADED PROCESSING - A system includes a multi-threaded processor that executes an instruction of a process of an executing program. The multi-threaded processor includes at least a first and a second thread. First and second sets of source registers are respectively allocated to the first and second threads, and first and second sets of destination registers are respectively allocated to the first and second threads. A resource prefix configuration register includes mappings between each of the source and destination registers and the threads. The multi-threaded processor, during execution of the instruction by one of the first or the second threads of execution, accesses the source and destination registers based on the mapping, wherein at least one of the accessed registers is allocated to the other of the first or the second thread of execution.04-02-2009
20090089554METHOD FOR TUNING CHIPSET PARAMETERS TO ACHIEVE OPTIMAL PERFORMANCE UNDER VARYING WORKLOAD TYPES - A method, system, and computer program product for tuning a set of chipset parameters to achieve optimal chipset performance under varying workload characteristics. A set of workload characteristics of a current workload type is determined. An instruction stream is generated using weighted parameters derived from the set of workload characteristics of the current workload type. A set of chipset parameters is generated and integrated within the instruction stream. The instruction stream is loaded to one or more processors and executed to collect and analyze performance data relating to the chipset's performance. The analysis includes comparing the set of performance data of a plurality of different instruction streams having the same set of workload characteristics. Each executed instruction stream is executed with at least one different combination of chipset parameters. A determination is made regarding which combination of chipset parameters provides the best performance data for the current workload.04-02-2009
20090089555Methods and apparatus for executing or converting real-time instructions - In one embodiment, a computer processor is configured to execute a plurality of instructions defined by an instruction set including at least one real-time instruction. Each of the at least one real-time instruction specifies an execution timing of a respective one of the at least one real-time instruction. Each execution timing is tied to a common real-time measurement system. Other embodiments are also described.04-02-2009
20090106535SHARED PROCESSOR ARCHITECTURE APPLIED TO FUNCTIONAL STAGES CONFIGURED IN A RECEIVER SYSTEM FOR PROCESSING SIGNALS FROM DIFFERENT TRANSMITTER SYSTEMS AND METHOD THEREOF - According to an embodiment of the present invention, a shared processor architecture in a receiver system is disclosed. The receiver system is configured to have a first functional stage and a second functional stage for processing information carried by signals from a first transmitter system and a second transmitter system respectively. The first functional stage and the second functional stage correspond to an identical signal processing function. The shared processor architecture includes a first processor, allocated to the first functional stage and the second functional stage, for processing an output generated from the first functional stage or an output from the second functional stage.04-23-2009
20090106536Processor for executing group extract instructions requiring wide operands - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.04-23-2009
20090106537PROCESSOR SUPPORTING VECTOR MODE EXECUTION - An improved superscalar processor. The processor includes multiple lanes, allowing multiple instructions in a bundle to be executed in parallel. In vector mode, the parallel lanes may be used to execute multiple instances of a bundle, representing multiple iterations of the bundle in a vector run. Scheduling logic determines whether, for each bundle, multiple instances can be executed in parallel. If multiple instances can be executed in parallel, coupling circuitry couples an instance of the bundle from one lane into one or more other lanes. In each lane, register addresses are renamed to ensure proper execution of the bundles in the vector run. Additionally, the processor may include a register bank separate from the architectural register file. Renaming logic can generate addresses to this separate register bank that are longer than used to address architectural registers, allowing longer vectors and more efficient processor operation.04-23-2009
20090113183METHOD OF CONTROLLING A DEVICE AND A DEVICE CONTROLLED THEREBY - A method of controlling at least one device is disclosed. The method includes providing the device with at least one constraint for carrying out an operation. The device determines if the constraint can be met. If it is determined that the constraint can be met, the device determines on its own accord a manner to get into a state wherein the constraint will be met. The device then goes into the state in the determined manner. A device that is controlled by the method and a system including such a device are also disclosed.04-30-2009
20090113184Method, Apparatus, and Program for Pinning Internal Slack Nodes to Improve Instruction Scheduling - A scheduling algorithm is provided for selecting the placement of instructions with internal slack into a schedule of instructions within a loop. The algorithm achieves this by pinning nodes with internal slack to corresponding nodes on the critical path of the code that have similar properties in terms of the data dependency graph, such as earliest time and latest time. The effect is that nodes with internal slack are more often optimally placed in the schedule, reducing the need for rotating registers or register copy instructions. The benefit of the present invention can primarily be seen when performing instruction scheduling or software pipelining on loop code, but can also apply to other forms of instruction scheduling when greater control of placement of nodes with internal slack is desired.04-30-2009
20090125705DATA PROCESSING METHOD AND APPARATUS WITH RESPECT TO SCALABILITY OF PARALLEL COMPUTER SYSTEMS - A data processing method for scalability of a parallel computer system includes: obtaining a processing time τ(p) that is the longest processing time in a case where a parallel processing is carried out by p processors and a processing time γ05-14-2009
20090132792Method of generating internode timing diagrams for a multiprocessor array - The apparatus used includes a multi core computer processor 05-21-2009
20090138682Dynamic instruction execution based on transaction priority tagging - A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigning priority values for each thread as a tag in the thread's instructions, tagged instructions from different threads that are dispatched through the system are allocated system resources based on the tagged priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies. The test results may be used to optimize the workload allocation of system resources by dynamically assigning thread priority values to individual threads using any desired policy, such as achieving thread execution balance relative to thresholds and to performance of other threads, reducing thread response time, lowering power consumption, etc.05-28-2009
20090138683Dynamic instruction execution using distributed transaction priority registers - A method, system and program are provided for dynamically assigning priority values to instruction threads in a computer system based on one or more predetermined thread performance tests, and using the assigned instruction priorities to determine how resources are used in the system. By storing the assigning priority values in thread priority registers distributed throughout the computer system, instructions from different threads that are dispatched through the system are allocated system resources based on the priority values assigned to the respective instruction threads. Priority values for individual threads may be updated with control software which tests thread performance and uses the test results to apply predetermined adjustment policies. The test results may be used to optimize the workload allocation of system resources by dynamically assigning thread priority values to individual threads using any desired policy, such as achieving thread execution balance relative to thresholds and to performance of other threads, reducing thread response time, lowering power consumption, etc.05-28-2009
20090138684H.264 CAVLC DECODING METHOD BASED ON APPLICATION-SPECIFIC INSTRUCTION-SET PROCESSOR - Provided is an H.264 Context Adaptive Variable Length Coding (CAVLC) decoding method based on an Application-Specific Instruction-set Processor (ASIP). The H.264 CAVLC decoding method includes determining a plurality of comparison bit strings on the basis of a table of a decoding coefficient, storing lengths of the comparison bit strings in a first register, storing code values of the comparison bit strings in a second register, comparing an input bit stream with the comparison bit strings based on the lengths and code values of the comparison bit strings, and determining value of the decoding coefficient according to a result of comparison between the input bit stream and the comparison bit strings. The method extracts a decoding coefficient using a register in an ASIP without accessing a memory and prevents a reduction in speed caused by memory access, thereby increasing the decoding speed of an H.264 decoder.05-28-2009
20090138685Processor for processing instruction set of plurality of instructions packed into single code - A conversion table converts a packed instruction (pre-conversion code) contained in the instruction code fetched from an instruction memory into a plurality of instruction codes (converted codes). An instruction decoder decodes the plurality of the instruction codes converted by a conversion table. A plurality of ALUs perform the operation in accordance with the decoding result of the instruction decoder. Therefore, the number of instructions that can be executed in parallel per cycle may be increased while at the same time the capacity of the instruction memory is reduced.05-28-2009
20090138686METHOD FOR PROCESSING A GRAPH CONTAINING A SET OF NODES - The invention relates to a computerized method for processing a graph containing a set of nodes processing a graph containing a set of nodes, wherein forest of trees is provided corresponding to a directed acyclic graph containing a set of nodes, each of said nodes having a type chosen from a set of types; a depth for each node in said forest of trees is determined; in a breadth-first traversal manner, the depth and type of each node in said forest of trees is compared to a predefined matrix, said matrix defining for each depth and type combination one of the following actions to be carried out: no action, creating a new sub-tree, triggering exception handling.05-28-2009
20090158010Command Protocol for Integrated Circuits - A method of operating an integrated circuit involves supplying an instruction portion of a command to the integrated circuit to specify an operation to be performed by the integrated circuit. At least some types of commands also include an attributes portion that provides additional information about the operation to be performed. The attributes portion of the command is supplied to the integrated circuit with a delay relative to the instruction portion of the command. The integrated circuit selectively enables circuitry for processing the attributes portion if the integrated circuit determines from the received instruction portion that the command also includes an attributes portion. The delay between the two portions of the command provides sufficient time for the integrated circuit to enable the attributes processing circuitry, which, in a default state, can be disabled during an active mode of the integrated circuit to save power.06-18-2009
20090158011DATA PROCESSING SYSTEM - A data processing system comprising a computer chip having a processing circuit and a chip-internal first memory and a chip-external second memory being coupled to the computer chip, wherein the processing circuit is configured to allow execution of computer programs stored in the first memory and to prevent execution of computer programs stored in the second memory when the data processing system is in a first state, and to allow execution of computer programs stored in the second memory when the data processing system is in a second state.06-18-2009
20090164761HIERARCHICAL SYSTEM AND METHOD FOR ANALYZING DATA STREAMS - A method for analyzing data streams comprises receiving a data stream, conducting a first analysis of the data stream for a possible target activity, and if a possible target activity is indicated generating a first alert. If the first alert is generated, a second analysis for the possible target activity is conducted to determine whether the target activity is indicated in the data stream with a high degree of certainty. If a possible target activity is indicated by the second analysis, a second alert is generated and provided to an external system for action.06-25-2009
20090172361COMPLETION CONTINUE ON THREAD SWITCH MECHANISM FOR A MICROPROCESSOR - A thread switch mechanism and technique for a microprocessor is disclosed wherein a processing of a first thread is completed, and a continuation of a second thread is initiated during completion of the first thread. In one form, the technique includes processing a first thread at a pipeline of a processing device, and initiating processing of a second thread at a front end of the pipeline in response to an occurrence of a context switch event. The technique can also include initiating a instruction progress metric in response the context switch event. The technique can further include enabling completion of processing of instructions of the first thread that are at a back end of the pipeline at the occurrence of the context switch event until an expiry of the instruction progress metric.07-02-2009
20090172362PROCESSING PIPELINE HAVING STAGE-SPECIFIC THREAD SELECTION AND METHOD THEREOF - One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.07-02-2009
20090182989MULTITHREADED MICROPROCESSOR WITH REGISTER ALLOCATION BASED ON NUMBER OF ACTIVE THREADS - A mechanism in a multithreaded processor to allocate resources based on configuration information indicating how many threads are in use.07-16-2009
20090187745INFORMATION PROCESSING SYSTEM AND METHOD OF EXECUTING FIRMWARE - An information processing system includes a control central processing unit a memory; and a stream interface configured to receive an input stream including data to be processed and to transfer the input stream to the memory. A download process in which the stream interface receives stream data including firmware and stores the stream data in the memory is executed in response to an instruction from the control central processing unit, and the control central processing unit analyzes the stream data stored in the memory to extract the firmware in the memory space and executes the firmware extracted in the memory space to process the data to be processed.07-23-2009
20090193234SEQUENCER CONTROLLED SYSTEM AND METHOD FOR CONTROLLING TIMING OF OPERATIONS OF FUNCTIONAL UNITS - The invention proposes a simple method for controlling distributed functional units (FU) in a system. It offloads the main system processor from intermediate status monitoring. The sequencer controlled system comprises a plurality of functional units, a processor operatively coupled to the plurality of functional units through a bus, a sequencer having a set of registers, and an interrupt source register configured for interrupt polling. The registers are configured to control the timing of at least one operation of the functional units with stored instructions for each of the functional units. The processor sets up at least some of the registers through the bus for the initial configuration and the sequencer is activated by the processor.07-30-2009
20090198970METHOD AND STRUCTURE FOR ASYNCHRONOUS SKIP-AHEAD IN SYNCHRONOUS PIPELINES - An electronic apparatus includes a plurality of stages serially interconnected as a pipeline to perform sequential processings on input operands. A shortening circuit associated with at least one stage of the pipeline recognizes when one or more of input operands for the stage has been predetermined as appropriate for shortening and execute the shortening when appropriate.08-06-2009
20090198971Heterogeneous Processing Elements - A heterogeneous processing element model is provided where I/O devices look and act like processors. In order to be treated like a processor, an I/O processing element, or other special purpose processing element, must follow some rules and have some characteristics of a processor, such as address translation, security, interrupt handling, and exception processing, for example. The heterogeneous processing element model puts special purpose processing elements on the same playing field as processors, from a programming perspective, operating system perspective, power perspective, as the processors. The operating system can get work to a security engine, for example, in the same way it does to a processor.08-06-2009
20090198972Microprocessor systems - A microprocessor pipeline arrangement 1 includes a plurality of functional units 08-06-2009
20090210676System and Method for the Scheduling of Load Instructions Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions; (2) determine if at least one load instruction is in the issue group, if so scheduling the least one load instruction in a first pipeline based upon a priority list; and (3) schedule execution of the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20090210677System and Method for Optimization Within a Group Priority Issue Schema for a Cascaded Pipeline - The present invention provides system and method for a group priority issue schema for a cascaded pipeline. The system includes a cascaded delayed execution pipeline unit having a plurality of execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The system further includes circuitry configured to: (1) receive an issue group of instructions, (2) determine a stall penalty of all the instructions in the issue group, (3) schedule the instructions in an order of the longest stall penalty to shortest stall penalty, and (4) execute the issue group of instructions in the cascaded delayed execution pipeline unit.08-20-2009
20090217007DISCOVERY OF A VIRTUAL TOPOLOGY IN A MULTI-TASKING MULTI-PROCESSOR ENVIRONMENT - A computer program product, apparatus and method for identifying processors in a multi-tasking multiprocessor network, the computer program product including a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method including storing a service record for a port to which an LID has been assigned, retrieving service records for nodes to which channel paths may connect, retrieving path records that provide address destinations for the nodes identified in the service records, initiating channel initialization for the channel paths defined for the port and removing the service record for the port.08-27-2009
20090235056RECORDING MEDIUM STORING PERFORMANCE MONITORING PROGRAM, PERFORMANCE MONITORING METHOD, AND PERFORMANCE MONITORING DEVICE - A performance monitoring device has an interrupt detection unit that detects generation of an interrupt process to be executed by a processor in accordance with TLB entry invalidation executed in an operating system. A counter value acquisition unit acquires a counter value of a predetermined event counted by the processor when the interrupt process is detected by the interrupt detection unit. A process information acquisition unit acquires identification information for identifying a process executed on the processor from the operating system immediately before the interrupt process is detected by the interrupt detection unit. An associating unit associates the counter value acquired by the counter value acquisition unit during the interrupt process with the identification information acquired by the process information acquisition unit immediately before the interrupt process.09-17-2009
20090235057CACHE MEMORY CONTROL CIRCUIT AND PROCESSOR - A cache memory control circuit includes a selecting section configured to be capable of selecting, in a predetermined order, each way or a predetermined two or more ways of a cache memory having multiple ways; a comparing section configured to detect a cache hit in each way; and a control section configured to, upon detection of a cache hit, stop a selection of the respective ways or the predetermined two or more ways at the selecting section.09-17-2009
20090240921METHOD, SYSTEM AND COMPUTER PROGRAM PRODUCT FOR SUPPORTING PARTIAL RECYCLE IN A PIPELINED MICROPROCESSOR - A computer processing system is provided. The computer processing system includes a first datastore that stores a subset of information associated with an instruction. A first stage of a processor pipeline writes the subset of information to the first datastore based on an execution of an operation associated with the instruction. A second stage of the pipeline initiates reprocessing of the operation associated with the instruction based on the subset of information stored in the first datastore.09-24-2009
20090240922METHOD, SYSTEM, COMPUTER PROGRAM PRODUCT, AND HARDWARE PRODUCT FOR IMPLEMENTING RESULT FORWARDING BETWEEN DIFFERENTLY SIZED OPERANDS IN A SUPERSCALAR PROCESSOR - Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding, and grouping a second set of instructions for result forwarding, the first set of instructions comprising a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprising a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction, performing operand forwarding by forwarding the first operand, either whole or in part, as it is being read to the first dependent instruction prior to execution; performing result forwarding by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction, after execution; wherein the operand forwarding is performed by executing the first source instruction together with the first dependent instruction; and wherein the result forwarding is performed by executing the second source instruction together with the second dependent instruction.09-24-2009
20090240923Computing Device with Entry Authentication into Trusted Execution Environment and Method Therefor - A computing device (09-24-2009
20090240924INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING METHOD, AND COMPUTER PRODUCT - An information processing device disclosed includes a plurality of executing units for executing various processes. The information processing device and method thereof acquire setting information that indicates an operating condition with respect to each executing unit from information an operation of a main process executed by the plurality of executing units, and sets an operating state of each of the executing units based on the acquired setting information.09-24-2009
20090249036EFFICIENT METHOD AND APPARATUS FOR EMPLOYING A MICRO-OP CACHE IN A PROCESSOR - Methods and apparatus for using micro-op caches in processors are disclosed. A tag match for an instruction pointer retrieves a set of micro-op cache line access tuples having matching tags. The set is stored in a match queue. Line access tuples from the match queue are used to access cache lines in a micro-op cache data array to supply a micro-op queue. On a micro-op cache miss, a macroinstruction translation engine (MITE) decodes macroinstructions to supply the micro-op queue. Instruction pointers are stored in a miss queue for fetching macroinstructions from the MITE. The MITE may be disabled to conserve power when the miss queue is empty-likewise for the micro-op cache data array when the match queue is empty. Synchronization flags in the last micro-op from the micro-op cache on a subsequent micro-op cache miss indicate where micro-ops from the MITE merge with micro-ops from the micro-op cache.10-01-2009
20090249037Pipeline processors - A method and apparatus are provided for executing instructions from a plurality of instruction threads on a multi-threaded processor. The instruction threads may each include instructions of different complexity. A plurality of pipelines for executing instructions are provided and an instruction scheduler determines on each clock cycle the pipelines upon which instructions will be executed. Some of the pipelines are configured to appear to the instruction threads as single pipelines but in fact comprise two pipeline paths, one for executed instructions of lower complexity and the other. The instruction scheduler determines on which of the two pipeline paths an instruction should execute.10-01-2009
20090249038STREAM DATA PROCESSING APPARATUS - In a normal operation state, a connection management section writes data transmitted from a first processing section to a data temporary storage section and reads data to be received by a second processing section from the data temporary storage section. Upon receiving control signals which instruct a change of the subject of processing, the first processing section and the second processing section output a transmitting-end clear request and a receiving-end clear request, respectively. The connection management section reads data from the empty data storage section after a transmitting-end clear request is received and until a receiving-end clear request is received, and writes data to the empty data storage section after a receiving-end clear request is received and until a transmitting-end clear request is received.10-01-2009
20090259828EXECUTION OF RETARGETTED GRAPHICS PROCESSOR ACCELERATED CODE BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.10-15-2009
20090259829THREAD-LOCAL MEMORY REFERENCE PROMOTION FOR TRANSLATING CUDA CODE FOR EXECUTION BY A GENERAL PURPOSE PROCESSOR - One embodiment of the present invention sets forth a technique for translating application programs written using a parallel programming model for execution on multi-core graphics processing unit (GPU) for execution by general purpose central processing unit (CPU). Portions of the application program that rely on specific features of the multi-core GPU are converted by a translator for execution by a general purpose CPU. The application program is partitioned into regions of synchronization independent instructions. The instructions are classified as convergent or divergent and divergent memory references that are shared between regions are replicated. Thread loops are inserted to ensure correct sharing of memory between various threads during execution by the general purpose CPU.10-15-2009
20090265528PROGRAMMABLE STREAMING PROCESSOR WITH MIXED PRECISION INSTRUCTION EXECUTION - The disclosure relates to a programmable streaming processor that is capable of executing mixed-precision (e.g., full-precision, half-precision) instructions using different execution units. The various execution units are each capable of using graphics data to execute instructions at a particular precision level. An exemplary programmable shader processor includes a controller and multiple execution units. The controller is configured to receive an instruction for execution and to receive an indication of a data precision for execution of the instruction. The controller is also configured to receive a separate conversion instruction that, when executed, converts graphics data associated with the instruction to the indicated data precision. When operable, the controller selects one of the execution units based on the indicated data precision. The controller then causes the selected execution unit to execute the instruction with the indicated data precision using the graphics data associated with the instruction.10-22-2009
20090276609CONFIGURABLE PIPELINE BASED ON ERROR DETECTION MODE IN A DATA PROCESSING SYSTEM - A method includes providing a data processor having an instruction pipeline, where the instruction pipeline has a plurality of instruction pipeline stages, and where the plurality of instruction pipeline stages includes a first instruction pipeline stage and a second instruction pipeline stage. The method further includes providing a data processor instruction that causes the data processor to perform a first set of computational operations during execution of the data processor instruction, performing the first set of computational operations in the first instruction pipeline stage if the data processor instruction is being executed and a first mode has been selected, and performing the first set of computational operations in the second instruction pipeline stage if the data processor instruction is being executed and a second mode has been selected.11-05-2009
20090282222Dynamic Virtual Software Pipelining On A Network On Chip - A NOC for dynamic virtual software pipelining including IP blocks, routers, memory communications controllers, and network interface controllers, each IP block adapted to a router through a memory communications controller and a network interface controller, the NOC also including: a computer software application segmented into stages, each stage comprising a flexibly configurable module of computer program instructions identified by a stage ID, each stage assigned to a thread of execution on an IP block; and each stage executing on a thread of execution on an IP block, including a first stage executing on an IP block, producing output data and sending by the first stage the produced output data to a second stage, the output data including control information for the next stage and payload data; and the second stage consuming the produced output data in dependence upon the control information.11-12-2009
20090287909Dynamically Estimating Lifetime of a Semiconductor Device - In one embodiment, the present invention includes a method for obtaining dynamic operating parameter information of a semiconductor device such as a processor, determining dynamic usage of the device, either as a whole or for one or more portions thereof, based on the dynamic operating parameter information, and dynamically estimating a remaining lifetime of the device based on the dynamic usage. Depending on the estimated remaining lifetime, the device may be controlled in a desired manner. Other embodiments are described and claimed.11-19-2009
20090287910SYSTEM AND METHOD FOR PROVIDING PREPROGRAMMED IMAGING IN A DATA COLLECTION ENVIRONMENT - According to a particular embodiment, an imaging system is provided that includes an imaging device operable to image a hard-drive of a target device. The imaging device includes a first connection to the target device and a second connection to an output capture device, whereby both connections facilitate an information flow of data. The imaging device includes a preprogramming element that includes a stored version of preprogramming instructions, which direct the imaging device on how to image the hard-drive autonomously.11-19-2009
20090292903MICROPROCESSOR PROVIDING ISOLATED TIMERS AND COUNTERS FOR EXECUTION OF SECURE CODE - An apparatus providing for a secure execution environment is presented. The apparatus includes a microprocessor and a secure non-volatile memory. The a microprocessor is configured to execute non-secure application programs and a secure application program, where the non-secure application programs are accessed from a system memory via a system bus. The microprocessor has a plurality of timers which are visible and accessible only by the secure application program when executing in a secure execution mode. The secure non-volatile memory is coupled to the microprocessor via a private bus and is configured to store the secure application program. Transactions over the private bus between the microprocessor and the secure non-volatile memory are isolated from the system bus and corresponding system bus resources within the microprocessor.11-26-2009
20090300332NON-DESTRUCTIVE SIDEBAND READING OF PROCESSOR STATE INFORMATION - A processor receives a command via a sideband interface on the processor to read processor state information, e.g., CPUID information. The sideband interface provides the command information to a microcode engine in the processor that executes the command to retrieve the designated processor state information at an appropriate instruction boundary and retrieves the processor state information. That processor information is stored in local buffers in the sideband interface to avoid modifying processor state. After the microcode engine completes retrieval of the information and the sideband interface command is complete, execution returns to the normal flow in the processor. Thus, the processor state information may be obtained non-destructively during processor runtime.12-03-2009
20090300333HARDWARE SUPPORT FOR WORK QUEUE MANAGEMENT - The claimed matter provides systems and/or methods that effectuate utilization of fine-grained concurrency in parallel processing and efficient management of established memory structures. The system can include devices that establish memory structures associated with individual processors that can comprise a parallel processing phalanx. The system can thereafter utilize various enqueuing and/or dequeuing directives to add or remove work descriptors to or from the memory structures individually associated with each of the individual processors thereby providing improved work flow synchronization amongst the processors that comprise the parallel processing complex.12-03-2009
20090300334Method and Apparatus for Loading Data and Instructions Into a Computer - A computer array (12-03-2009
20090307465Computational expansion system - The invention relates to a system for expanding capacity for executing processes that are executed in a central processing unit (12-10-2009
20090307466Resource Sharing Techniques in a Parallel Processing Computing System - A method, apparatus, and program product share a resource in a computing system that includes a plurality of computing cores. A request from a second execution context (“EC”) to lock the resource currently locked by a first EC on a first core causes replication of the second EC as a third EC on a third core. The first and third ECs are executed substantially concurrently. When the first EC modifies the resource, the third EC is restarted after the resource has been modified. Alternately, a first EC is configured in a first core and shadowed as a second EC in a second core. In response to a blocked lock request, the first EC is halted and the second EC continues. After granting a lock, it is determined whether a conflict has occurred and the first and second EC are particularly synchronized to each other in response to that determination.12-10-2009
20090313457System and Method for Extracting Fields from Packets Having Fields Spread Over More Than One Register - Systems and methods that allow for extracting a field from data stored in a pair of registers using two instructions. A first instruction extracts any part of the field from a first register designated as a first source register, and executes a second instruction extracting any part of the field from a second general register designated as a second source register. The second instruction inserts any extracted field parts in a result register.12-17-2009
20090319758Processor, performance profiling apparatus, performance profiling method , and computer product - A processor capable of executing an arbitrary application program on an operating system includes an event context register that stores therein an ID of an event to be measured in the arbitrary application program and a context register that records therein an ID of an event executed by the arbitrary application program upon the application program being executed on the operating system. The processor further includes a comparator that compares the ID of the event recorded in the context register and the ID of the event to be measured that is stored in the event context register and an event counter that counts the number of times the ID of the event recorded in the context register and the ID of the event to be measured are determined to coincide by the comparator.12-24-2009
20090327663Power Aware Retirement - In one embodiment, the present invention includes a retirement unit to receive and retire executed instructions. The retirement unit may include a first array to receive information at allocation and a second array to receive information after execution. The retirement unit may further include logic to calculate an event associated with an executed instruction if information associated with the executed instruction is stored in an on-demand portion of at least one of arrays. Other embodiments are described and claimed.12-31-2009
20100005277Communicating Between Multiple Threads In A Processor - In one embodiment, the present invention includes a method for accessing registers associated with a first thread while executing a second thread. In one such embodiment a method may include preventing an instruction of a first thread that is to access a source operand from a register file of a second thread from executing if a synchronization indicator associated with the source operand indicates incompletion of a producer operation of the second thread, and executing the instruction if the synchronization indicator indicates completion of the producer operation of the second thread. Other embodiments are described and claimed.01-07-2010
20100005278DEVICE AND METHOD FOR CONTROLLING AN INTERNAL STATE OF INFORMATION PROCESSING EQUIPMENT - The state control device for controlling an internal state of information processing equipment includes a scenario table, an information recorder, an information player and a state change controller. The information recorder acquires sync information and one item or a plurality of items of state information from the information processing equipment and records the acquired sync information and state information in association with each other in the scenario table. The information player, receiving sync information from the information processing equipment, acquires state information associated with sync information corresponding to the received sync information, among the sync information stored in the scenario table, from the scenario table, and supplies the acquired state information to the state change controller. The state change controller controls the inside of the information processing equipment based on the state information received from the information player. The sync information is information for identifying an execution state of the information processing equipment at a given time point, and the state information is information representing an internal state of the information processing equipment at a given time point synchronous with the sync information.01-07-2010
20100011192SIMPLIFYING COMPLEX DATA STREAM PROBLEMS INVOLVING FEATURE EXTRACTION FROM NOISY DATA - Methods, systems and computer program products for simplifying complex data stream problems involving feature extraction from noisy data. Exemplary embodiments include a method for processing a data stream, including applying multiple operators to the data stream, wherein an operation by each of the multiple operators includes retrieving the next chunk for each of set of input parameters, performing digital processing operations on a respective next chunk, producing sets of output parameters and adding data to one or more internal data stores, each internal data store acting as a data stream source.01-14-2010
20100011193Selective Hardware Lock Disabling - Controlling a reorder buffer (ROB) to selectively perform functional hardware lock disabling (HLD) is described. One apparatus embodiment includes a unit to enable an ROB to selectively disable a lock upon Identifying a lock acquire operation (LAO) associated with a critical section (CS) entry point, a unit to selectively retire the LAO, a unit to cause the ROB to selectively disable the lock, and a unit to snoop a buffer. The apparatus may, based on the snooping, selectively abort a transaction associated with the CS.01-14-2010
20100023732OPTIMIZING NON-PREEMPTIBLE READ-COPY UPDATE FOR LOW-POWER USAGE BY AVOIDING UNNECESSARY WAKEUPS - A technique for low-power detection of a grace period following a shared data element update operation that affects non-preemptible data readers. A grace period processing action is implemented that requires a processor that may be running a non-preemptible reader of the shared data element to pass through a quiescent state before further grace period processing can proceed. A power status of the processor is also determined. Further grace period processing may proceed without requiring the processor to pass through a quiescent state if the power status indicates that quiescent state processing by the processor is unnecessary.01-28-2010
20100031007METHOD TO ACCELERATE NULL-TERMINATED STRING OPERATIONS - A method reads and compares first and second register values, each with a size of at least two bytes. A third register indicates a match if: (1) a byte in the first register value is equal to (or, alternatively, not equal to) a corresponding byte in the second register value, or (2) if a byte in the first register value is zero. Next, a fourth register value is set to one of the following: (1) a count of the matching byte, if the corresponding bytes in the first and second register values are equal (or, alternatively, are not equal), or (2) a number outside of a range between 0 and n−1, if the corresponding bytes in the first and second register values are not equal (or, alternatively, are equal). The value, n, is an integer equal to the number of bytes in the first and second register values.02-04-2010
20100031008PARALLEL SORTING APPARATUS, METHOD, AND PROGRAM - A parallel sorting apparatus is provided whose sorting processing is speeded up. A reference value calculation section calculates a plurality of reference values serving as boundaries of intervals used for allocating input data depending on the magnitude of a value. An input data aggregation section partitions the input data into a plurality of input data regions, and calculates, by parallel processing, mapping information used for allocating data in each of the partitioned input data regions to the plurality of intervals that have boundaries on the reference values calculated by the reference value calculation section. A data allocation section allocates, by parallel processing, data in each of the input data regions to the plurality of intervals in accordance with the mapping information calculated by the input data aggregation section. An interval sorting section individually sorts, by parallel processing, data in the plurality of intervals allocated by the data allocation section.02-04-2010
20100037038Dynamic Core Pool Management - Embodiments that dynamically manage core pools are disclosed. Various embodiments involve measuring the amount of a computational load on a computing device. One way of measuring the load may consist of executing a number of instructions, in a unit of time, with numerous cores of the computing device. These embodiments may compare the number of instructions executed with specific thresholds. Depending on whether the number of instructions is higher or lower than the thresholds, the computing devices may respond by activating and deactivating cores of the computing devices. By limiting execution of instructions of the computing device to a smaller number of cores and switching one or more cores to a lower power state, the devices may conserve power.02-11-2010
20100058036Distributed Acceleration Devices Management for Streams Processing - A method for managing distributed computer data stream acceleration devices is provided that utilizes distributed acceleration devices on nodes within the computing system to process inquiries by programs executing on the computing system. The available nodes and available acceleration devices in the computing system are identified. In addition, a plurality of virtual acceleration device definitions is created. Each virtual acceleration device definition includes attributes used to configure at least one of the plurality of identified acceleration devices. When an inquiry containing an identification of computing system resources to be used in processing the inquiry is received, at least one virtual acceleration device definition that is capable of configuring an acceleration device in accordance with the computing system resources identified by the inquiry is identified. That acceleration device is configured in accordance with the identified virtual acceleration device definition and is used to process the inquiry.03-04-2010
20100077185MANAGING THREAD AFFINITY ON MULTI-CORE PROCESSORS - Embodiments of the invention intelligently associate processes with core processors in a multi-core processor. The core processors are asymmetrical in that the core processors support different features or provide different resources. The features or resources are published by the core processors or otherwise identified (e.g., via a query). Responsive to a request to execute an instruction associated with a thread, one of the core processors is selected based on the resource or feature supporting execution of the instruction. The thread is assigned to the selected core processor such that the selected core processor executes the instruction and subsequent instructions from the assigned thread. In some embodiments, the resource or feature is emulated until an activity limit is reached upon which the thread assignment occurs.03-25-2010
20100077186PROCESSING APPARATUS, PROCESSING SYSTEM, AND COMPUTER READABLE MEDIUM - A processing apparatus includes: an execution unit; an execution request accept unit; a process instruction unit; an information leakage preventing process execution unit; a recording unit; and a transmission unit.03-25-2010
20100088492SYSTEMS AND METHODS FOR IMPLEMENTING BEST-EFFORT PARALLEL COMPUTING FRAMEWORKS - Implementations of the present principles include Best-effort computing systems and methods. In accordance with various exemplary aspects of the present principles, a application computation requests directed to a processing platform may be intercepted and classified as either guaranteed computations or best-effort computations. Best-effort computations may be dropped to improve processing performance while minimally affecting the end result of application computations. In addition, interdependencies between best-effort computations may be relaxed to improve parallelism and processing speed while maintaining accuracy of computation results.04-08-2010
20100095096AV DEVICE AND ITS CONTROL METHOD - In an AV device control, from unit instructions (04-15-2010
20100106946METHOD FOR PROCESSING STREAM DATA AND SYSTEM THEREOF - The present invention provides a method for processing stream data and a system thereof capable of implementing general data processing including recursive processing with low latency. In the system for processing stream data, a single operator graph is prepared from operator trees of a plurality of queries, an execution order of the operators is determined so that execution of a streaming operator is progressed in one way from an input to an output, the ignition time of an external ignition operator that inputs data from the outside of the system and an internal ignition operator that time-limitedly generates data is monitored, and an operator execution control unit repeats processing that completes the processing in the operator graph at the time according to the determined operator execution order, assuming the operator of the earliest ignition time as a start point.04-29-2010
20100115241KERNEL FUNCTION GENERATING METHOD AND DEVICE AND DATA CLASSIFICATION DEVICE - Kernel functions, the number of which is set in advance, are linearly coupled to generate the most suitable Kernel function for a data classification. An element Kernel generating unit 05-06-2010
20100115242ENGINE/PROCESSOR COOPERATION SYSTEM AND COOPERATION METHOD - To provide an engine software cooperation mechanism which avoids stopping the operation of a high-speed engine during timer monitoring processing. This system checks occurrence of a timeout event by directly accessing the content of a session data memory without regard to the locking state of a session. If detecting the state of timeout, the system requests execution of timeout processing via a timer transmission circuit. By timeout processing, the time of timeout and the present time are checked again to confirm whether a timer is not cancelled.05-06-2010
20100115243Apparatus, Method and Instruction for Initiation of Concurrent Instruction Streams in a Multithreading Microprocessor - A fork instruction for execution on a multithreaded microprocessor and occupying a single instruction issue slot is disclosed. The fork instruction, executing in a parent thread, includes a first operand specifying the initial instruction address of a new thread and a second operand. The microprocessor executes the fork instruction by allocating context for the new thread, copying the first operand to a program counter of the new thread context, copying the second operand to a register of the new thread context, and scheduling the new thread for execution. If no new thread context is free for allocation, the microprocessor raises an exception to the fork instruction. The fork instruction is efficient because it does not copy the parent thread general purpose registers to the new thread. The second operand is typically used as a pointer to a data structure in memory containing initial general purpose register set values for the new thread.05-06-2010
20100115244MULTITHREADING MICROPROCESSOR WITH OPTIMIZED THREAD SCHEDULER FOR INCREASING PIPELINE UTILIZATION EFFICIENCY - A multithreading processor for concurrently executing multiple threads is provided. The processor includes an execution pipeline and a thread scheduler that dispatches instructions of the threads to the execution pipeline. The execution pipeline execution pipeline is configured for generating a thread context (TC) flush indicator associated with a thread context when one or more instructions of the thread context would stall in the execution pipeline. One or more instructions in the pipeline of the thread context associated with the thread context flush signal can be flushed or nullified.05-06-2010
20100125721System and Method for Determining and/or Reducing Costs Associated with Utilizing Objects - According to one embodiment of the present invention, a method includes receiving, in near real time, data associated with a utilization of one or more objects. The method further includes comparing the data associated with the utilization of the one or more objects with one or more rules associated with the utilization of the one or more objects. The method further includes determining, in near real time, a cost associated with the utilization of the one or more objects based at least on the comparison. The method further includes providing, in near real time, an indication of the cost associated with the utilization of the one or more objects.05-20-2010
20100131742OUT-OF-ORDER EXECUTION MICROPROCESSOR THAT SELECTIVELY INITIATES INSTRUCTION RETIREMENT EARLY - A microprocessor for improving out-of-order superscalar execution unit utilization with a relatively small in-order instruction retirement buffer. A plurality of execution units each calculate an instruction result. The instruction is either an excepting type instruction or a non-excepting type instruction. The excepting type instruction is capable of causing the microprocessor to take an exception after being issued to the execution unit, wherein the non-excepting type instruction is incapable of causing the microprocessor to take an exception after being issued. A retire unit makes a determination that an instruction is the oldest instruction in the microprocessor and that the instruction is ready to update the architectural state of the microprocessor with its result. The retire unit makes the determination before the execution unit outputs the result of the non-excepting type instruction, wherein the retire unit makes the determination after the execution unit outputs the result of the excepting type instruction.05-27-2010
20100131743LAZY AND STATELESS EVENTS - Event-based processing is employed in conjunction with lazy and stateless events. Addition of any handlers is deferred until a user-specified handler is identified. Furthermore, event handlers can be composed at this time including the same properties as underlying events. More specifically, handlers specified on composite events can be composed and propagated up to a one or more related source events. As a result, handlers are not accumulated on composite events thereby making them stateless while allowing equivalent functionality upon invocation of the composed top-level handler.05-27-2010
20100138636Method of sending an executable code to a reception device and method of executing this code - One embodiment of the present invention discloses a process for sending an executable code to a security module locally connected to a receiving device. This security module comprises a microcontroller and a memory, the memory including at least one executable area provided to contain instructions suitable to be executed by the microcontroller, and at least one non-executable area, wherein the microcontroller cannot execute the instructions, further comprising the steps of dividing the executable code into blocks; adding at least one block management code to the blocks in order to create an extended block; introducing the content of an extended block into a message to be processed in the receiving device, in such a way that the whole executable code is contained in a plurality of messages; sending a message to the receiving device, this message containing one of the extended blocks different from the first extended block; processing the message in order to extract its extended block; storing the executable code and the at least one management code of the block received in the executable area of the memory; executing at least one management code of the extended block, this management code includes the effect of transferring the content of the block to a non-executable area of the memory; repeating the previous steps until all the extended blocks are stored in the memory, except for the first block; sending a message containing the first extended block to the receiving device; processing the message in order to extract the extended block and storing the executable code of the block received in the executable area of the memory. One embodiment of the invention also concerns a process for the execution of this code.06-03-2010
20100138637DATA PROCESSING METHOD - A data processing method for sampling data from data each varying over time, each of the data being associated with each of grid points arranged in an area, the method includes: dividing the area into blocks; calculating a variation rate of each of the data associated with each of the grid points included in each of the blocks; dividing the blocks into sub-blocks in accordance with the variation rate of the blocks; calculating a variation rate of each set of the data associated with each of the grid points included in each of the sub-blocks; and determining a frequency of sampling data associated with each of the grid points for the sub-blocks of the blocks and for the rest of the blocks in accordance with the variation rate of the sub-blocks and the rest of the blocks.06-03-2010
20100146245PARALLEL EXECUTION OF A LOOP - A method of executing a loop over an integer index range of indices in a parallel manner includes assigning a plurality of index subsets of the integer index range to a corresponding plurality of threads, and defining for each index subset a start point of the index subset, an end point of the index subset, and a boundary point of the index subset positioned between the start point and the end point of the index subset. A portion of the index subset between the start point and the boundary point represents a private range and the portion of the index subset between the boundary point and the end point represents a public range. Loop code is executed by each thread based on the index subset of the integer index range assigned to the thread.06-10-2010
20100146246Method and Apparatus for Decompression of Block Compressed Data - System and method for decompressing data. A compressed data stream including contiguous variable length blocks is received, each block including multiple contiguous variable length data fields and a tag portion that includes multiple contiguous tag fields corresponding respectively to the data fields. Each tag field stores a tag value specifying a size of a respective field in the block. A current variable length block is stored. A single machine instruction of a processor is executed that analyzes the tag portion of the current block, and creates a control pattern, storing the control pattern in a first register of the processor. The control pattern is configured to unpack the variable length data fields of the current variable length block into corresponding uniform data fields. The contiguous variable length data fields of the current variable length block are decompressed using the control pattern, thereby decompressing the compressed data stream.06-10-2010
20100153691LOWER POWER ASSEMBLER - A method for processing data using a time-stationary multiple-instruction word processing apparatus, arranged to execute a plurality of instructions in parallel, said method comprising the following steps: generating a set of multiple-instruction words (INS(i), INS(i+1), INS(i+2)), wherein each multiple-instruction word comprises a plurality of instruction fields, wherein each instruction field encodes control information for a corresponding resource of the processing apparatus, and wherein bit changes between an instruction field related to a no-operation instruction, and a corresponding instruction field of an adjacent multiple-instruction word are minimised; storing input data in a register file (RF06-17-2010
20100161946CONTROLLING JITTERING EFFECTS - Methods and related systems for controlling jitter effects are disclosed.06-24-2010
20100169618IDENTIFYING CONCURRENCY CONTROL FROM A SEQUENTIAL PROOF - The claimed subject matter provides a system and/or a method that facilitates ensuring non-interference between multiple threads that access a shared resource. An interface can receive a portion of sequential code, wherein the portion of sequential code includes a property that is maintained and relied upon when invoked and executed by a sequential client. A synthesizer component can leverage a sequential proof related to the portion of sequential code in order to derive a concurrency control mechanism for a portion of concurrency code that maintains the property when invoked by a concurrent client, wherein the sequential proof identifies a concurrent interference at an execution point that is tolerable for the concurrent client.07-01-2010
20100174890Known Good Code for On-Chip Device Management - In one embodiment, a processor comprises a programmable map and a circuit. The programmable map is configured to store data that identifies at least one instruction for which an architectural modification of an instruction set architecture implemented by the processor has been defined, wherein the processor does not implement the modification. The circuitry is configured to detect the instruction or its memory operands and cause a transition to Known Good Code (KGC), wherein the KGC is protected from unauthorized modification and is provided from an authenticated entity. The KGC comprises code that, when executed, emulates the modification. In another embodiment, an integrated circuit comprises at least one processor core; at least one other circuit; and a KGC source configured to supply KGC to the processor core for execution. The KGC comprises interface code for the other circuit whereby an application executing on the processor core interfaces to the other circuit through the KGC.07-08-2010
20100199074INSTRUCTION SET ARCHITECTURE WITH DECOMPOSING OPERANDS - Instead of having a processor with an instruction set architecture (ISA) that includes fixed architected operands, an improved processor supports additional characteristic bits for computing instructions (e.g., a multiply-add, load/store instructions). Such additional bits for the certain instructions influence the processing of these instructions by the processor. Also, a new instruction is introduced for further usage of the proposed method. Typically these additional characteristic bits as well as the instruction can be automatically generated by compilers to provide relatively well-suited instruction sequences for the processor.08-05-2010
20100199075MULTITHREADED PROCESSOR WITH MULTIPLE CONCURRENT PIPELINES PER THREAD - A multithreaded processor comprises a plurality of hardware thread units, an instruction decoder coupled to the thread units for decoding instructions received therefrom, and a plurality of execution units for executing the decoded instructions. The multithreaded processor is configured for controlling an instruction issuance sequence for threads associated with respective ones of the hardware thread units. On a given processor clock cycle, only a designated one of the threads is permitted to issue one or more instructions, but the designated thread that is permitted to issue instructions varies over a plurality of clock cycles in accordance with the instruction issuance sequence. The instructions are pipelined in a manner which permits at least a given one of the threads to support multiple concurrent instruction pipelines.08-05-2010
20100205410Data Processing - Apparatus for data processing includes a processor, memory and storage. A plurality of sets of instructions, each corresponding to one of a plurality of programs, is stored in the storage. The processor is configured to load the sets of instructions from the storage into the memory, identify a first program as nonessential, close the first program and remove its corresponding set of instructions from the memory, and reload the set of instructions corresponding to the first program into the memory from the storage.08-12-2010
20100211763Data broadcast processing device, method and program - The present invention relates to a data broadcast processing device, method, and program which enable secure control of an operation of a data broadcast processing device. Since a flag standalone is not set in a script NCL Script 08-19-2010
20100223446CONTEXTUAL TRACING - A method of tracking execution of activities in a computing environment in which events in an activity are recorded along with an activity identifier uniquely identifying the activity and tying the events to the activity. To track interactions between activities, a correlation identifier may be generated and transferred between the interacting activities as part of the interaction. For each of the activities participating in the interaction, information on an event relating to the interaction is recorded along with the correlation identifier. The correlation identifier thus allows uniquely identifying each interaction which may be used to synchronize streams of events within the activities at points of their interaction. Activities may interact across any boundary, including a network.09-02-2010
20100235611COMPILER, COMPILE METHOD, AND PROCESSOR CORE CONTROL METHOD AND PROCESSOR - A compiler compiling a source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.09-16-2010
20100241833INFORMATION PROCESSING APPARATUS - Disclosed is an information processing apparatus in which various kinds of information are processed in either the real time processing mode or the non-real time processing mode. The apparatus includes an operation display section to accept an inputted instruction, an image processing section to apply a processing to image information and a processor provided with a plurality of same cores. The real-time processing unnecessary process that is related to the operation display section, is fixed onto one of the plurality of same cores so that the one of the plurality of same cores is in charge of controlling the real-time processing unnecessary process, while, the real-time processing necessary process that is related to the image processing section, is fixed onto another one of the plurality of same cores so that the other one of the plurality of same cores is in charge of controlling the real-time processing necessary process.09-23-2010
20100250903APPARATUSES AND SYSTEMS INCLUDING A SOFTWARE APPLICATION ADAPTATION LAYER AND METHODS OF OPERATING A DATA PROCESSING APPARATUS WITH A SOFTWARE ADAPTATION LAYER - A software application adaptation layer is comprised of a program file comprising a plurality of adaptation filters and a configuration file. The configuration file may designate one or more adaptation filters of the plurality of adaptation filters to be applied by the program file for modifying one or more behaviors of an active software application. A data processing apparatus including such a software application adaptation layer includes processing circuitry configured to execute instructions for the active software application, a communications module coupled to the processing circuitry and at least one storage medium for storing the program file and the configuration file. Operational methods for such an apparatus includes storing the program file and the configuration file in the at least one storage medium, identifying the active software application, generating an adaptation filter set and attaching the adaptation filter set to an input queue associated with the active software application.09-30-2010
20100262810CONCURRENT INSTRUCTION OPERATION METHOD AND DEVICE - A concurrent instruction operation method and device are provided. The method includes: establishing a concurrent queue, and setting a queue base address and a queue maximum length of the concurrent queue; generating concurrent operation instructions according to a length of data that needs to be written or read as well as the queue base address and queue maximum length of the concurrent queue; and executing the concurrent operation instructions in the concurrent queue, and completing a data operation to the concurrent queue.10-14-2010
20100281238Execution of instructions directly from input source - A computer array (11-04-2010
20100318770ELECTRONIC DEVICE, COMPUTER-IMPLEMENTED SYSTEM, AND APPLICATION DISPLAY CONTROL METHOD THEREFOR - An electronic device, a computer-implemented system, and an application display control method thereof are disclosed. The electronic device has a process unit that executes an operating system kernel, and then executes a first and a second software platform via the operating system kernel. When a first application is executed on the first software platform, a first window manager of the first software platform controls a screen area within which the first application being executed is displayed. The second software platform is notified by the first application to execute a second application, and a second window manager of the second software platform displays in the screen area a screen image that is generated when the second application is executed. Therefore, a user is given the flexibility of executing applications for different software platforms on the same one electronic device.12-16-2010
20100332807PERFORMING ESCAPE ACTIONS IN TRANSACTIONS - Performing non-transactional escape actions within a hardware based transactional memory system. A method includes at a hardware thread on a processor beginning a hardware based transaction for the thread. Without committing or aborting the transaction, the method further includes suspending the hardware based transaction and performing one or more operations for the thread, non-transactionally and not affected by: transaction monitoring and buffering for the transaction, an abort for the transaction, or a commit for the transaction. After performing one or more operations for the thread, non-transactionally, the method further includes resuming the transaction and performing additional operations transactionally. After performing the additional operations, the method further includes either committing or aborting the transaction.12-30-2010
20110010529INSTRUCTION EXECUTION CONTROL METHOD, INSTRUCTION FORMAT, AND PROCESSOR - With conventional ordered data reference instructions, an instruction which is to be the subject of an execution order guarantee cannot be separately specified, and a resource which is to be the subject of an execution order guarantee likewise cannot be specified and thus instruction movement is restricted more than necessary in the out-of-order execution of instructions and so on and performance deterioration becomes significant particularly in the case of performing data transfer to a resource having high access latency. Consequently, the field of an ordered data reference instruction judged to include a predetermined field is decoded so as to identify a subject instruction which is specified by the ordered data reference instruction and is the subject of execution order guarantee, and guarantee the execution order of the subject instruction with respect to the execution of the identified ordered data reference instruction.01-13-2011
20110016293DEVICE AND METHOD FOR THE DISTRIBUTED EXECUTION OF DIGITAL DATA PROCESSING OPERATIONS - This device (01-20-2011
20110047357Methods and Apparatus to Predict Non-Execution of Conditional Non-branching Instructions - Efficient techniques are described for not executing an issued conditional non-branch instruction. A conditional non-branch instruction is identified as being eligible for a prediction, the prediction indicating that the eligible conditional non-branch (ECNB) instruction would not execute. The ECNB instruction executes as a no operation (NOP) instruction in response to the prediction that the ECNB instruction would not execute. A source operand required for the ECNB instruction to execute is not fetched in response to the prediction to not execute.02-24-2011
20110060891PARALLEL PIPELINED VECTOR REDUCTION IN A DATA PROCESSING SYSTEM - A parallel processing data processing system builds at least one data structure indicating a communication schedule for a plurality of processes each having a respective one of a plurality of equal length vectors formed of multiple equal size chunks. The data processing system, based upon the at least one data structure, communicates chunks of the plurality of vectors among the plurality of processes and performs partial reduction operations on chunks in accordance with the communication schedule. The data processing system then stores a result vector representing reduction of the plurality of vectors.03-10-2011
20110066828MAPPING OF COMPUTER THREADS ONTO HETEROGENEOUS RESOURCES - Techniques are generally described for mapping a thread onto heterogeneous processor cores. Example techniques may include associating the thread with one or more predefined execution characteristic(s), assigning the thread to one or more heterogeneous processor core(s) based on the one or more predefined execution characteristic(s), and/or executing the thread by the respective assigned heterogeneous processor core(s).03-17-2011
20110072245HARDWARE FOR PARALLEL COMMAND LIST GENERATION - A method for providing state inheritance across command lists in a multi-threaded processing environment. The method includes receiving an application program that includes a plurality of parallel threads; generating a command list for each thread of the plurality of parallel threads; causing a first command list associated with a first thread of the plurality of parallel threads to be executed by a processing unit; and causing a second command list associated with a second thread of the plurality of parallel threads to be executed by the processing unit, where the second command list inherits from the first command list state associated with the processing unit.03-24-2011
20110072246NODE CONTROL DEVICE INTERPOSED BETWEEN PROCESSOR NODE AND IO NODE IN INFORMATION PROCESSING SYSTEM - A node control device is interposed between processor nodes and IO nodes in an information processing system, wherein each IO node subordinates at least one IO device. The node control device includes a register storing a base address of a mapping destination of an IO space, a table describing a plurality of entries retaining a plurality of IO space numbers and address ranges, and an IO space access detection circuit. The table stores an identification flag as to whether or not IO spaces are each mapped onto a memory space. The IO space access detection circuit decodes a command code and an address of an FRTT signal output from a processor node, thus detecting a target IO space and detecting whether the processor node is accessing an IO space mapped onto the memory space or another IO space.03-24-2011
20110078418Support for Non-Local Returns in Parallel Thread SIMD Engine - One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.03-31-2011
20110078419SET PROGRAM PARAMETER INSTRUCTION - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.03-31-2011
20110087863DATA PROCESSING APPARATUS HAVING A PARALLEL PROCESSING CIRCUIT INCLUDING A PLURALITY OF PROCESSING MODULES, AND METHOD FOR CONTROLLING THE SAME - In an apparatus which includes a plurality of processing modules connected via a ring-shape bus, if a plurality pieces of pipeline processing to be processed in a different order is allocated to a plurality of processing modules, the transfer efficiency may decrease when an amount of data transferred from one of the processing modules to a post-stage module exceeds a processing capacity of the post-stage module. Accordingly, a module positioned on the preceding side in the pipeline processing controls a transmission interval of processed data so that the post-stage module can receive the data processed by the preceding module.04-14-2011
20110087864PROVIDING PIPELINE STATE THROUGH CONSTANT BUFFERS - One embodiment of the present invention sets forth a technique for providing state information to one or more shader engines within a processing pipeline. State information received from an application accessing the processing pipeline is stored in constant buffer memory accessible to each of the shader engines. The shader engines can then retrieve the state information during execution.04-14-2011
20110093685DATA PROCESSING CIRCUIT - A data processing circuit is disclosed in the present invention. The data processing circuit includes a decoder and a number of N-stage circuits. The circuits receive input data from at least a memory and separate the input data into N stages. The circuit process and store the N input data simultaneously to decrease the time of data processing in the data processing circuit.04-21-2011
20110107065INTERCONNECT CONTROLLER FOR A DATA PROCESSING DEVICE AND METHOD THEREFOR - A data processing device includes an interconnect controller operable to manage the communication of information between modules of the data processing device via an interconnect. In response to a transaction request the interconnect controller selects a tag value from a set of available tag values, assigns the tag to the transaction and reserves the tag value so that it is unavailable for assignment to other transactions. If an expected response to the transaction request is not received within a designated amount of time, the transaction enters a timed-out state and the interconnect controller locks the tag value, so that it remains unavailable for assignment to other transactions until an unlock event, such as a request from software.05-05-2011
20110107066CASCADED ACCELERATOR FUNCTIONS - Accelerator functions are cascaded, such that a result of one accelerator function is directly forwarded to another accelerator function, bypassing the processor requesting the functions to be performed. The cascading may be provided during compilation of a program specifying the functions to be performed, but can be dynamically reversed during runtime of the program.05-05-2011
20110107067SINGLE-CHIP MULTIPROCESSOR WITH CLOCK CYCLE-PRECISE PROGRAM SCHEDULING OF PARALLEL EXECUTION - A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.05-05-2011
20110119470GENERATION-BASED MEMORY SYNCHRONIZATION IN A MULTIPROCESSOR SYSTEM WITH WEAKLY CONSISTENT MEMORY ACCESSES - In a multiprocessor system, a central memory synchronization module coordinates memory synchronization requests responsive to memory access requests in flight, a generation counter, and a reclaim pointer. The central module communicates via point-to-point communication. The module includes a global OR reduce tree for each memory access requesting device, for detecting memory access requests in flight. An interface unit is implemented associated with each processor requesting synchronization. The interface unit includes multiple generation completion detectors. The generation count and reclaim pointer do not pass one another.05-19-2011
20110138154Optimization of a Computing Environment in which Data Management Operations are Performed - Described are embodiments of an invention for optimizing a computing environment that performs data management operations such as encryption, deduplication and compression. The computing environment includes data components and a management system. The data components operate on data during the lifecycle of the data. The management system identifies all the data components in a data path, how the data components are interconnected, the data management operations performed at each data component, and how many data management operations of each type are performed at each data component. Further, the management system builds a data structure to represent the flow of data through the data path and analyzes the data structure in view of policy. After the analysis, the management system provides recommendations to optimize the computing environment through the reconfiguration of the data management operation configuration and reconfigures the data management operation configuration to optimize the computing environment.06-09-2011
20110153992METHODS AND APPARATUS TO MANAGE OBJECT LOCKS - Example methods and apparatus to manage object locks are disclosed. A disclosed example method includes receiving an object lock request from a processor, the lock request associated with object lock code to lock an object, and generating object lock-bypass code based on a type of the processor, the object lock-bypass code to execute in a managed runtime in response to receiving the object lock request. The example method also includes identifying a type of instruction set architecture (ISA) associated with the processor, invoking a checkpoint instruction for the processor based on the identified ISA, suspending the object lock code from executing and executing target code when the object is uncontended, and allowing the object lock code to execute when the object is contended.06-23-2011
20110161635Rotate instructions that complete execution without reading carry flag - A method of one aspect may include receiving a rotate instruction. The rotate instruction may indicate a source operand and a rotate amount. A result may be stored in a destination operand indicated by the rotate instruction. The result may have the source operand rotated by the rotate amount. Execution of the rotate instruction may complete without reading a carry flag.06-30-2011
20110161636METHOD OF MANAGING POWER OF MULTI-CORE PROCESSOR, RECORDING MEDIUM STORING PROGRAM FOR PERFORMING THE SAME, AND MULTI-CORE PROCESSOR SYSTEM - Provided are a method of managing power of a multi-core processor, a recording medium storing a program for performing the method, and a multi-core processor system. The method of managing power of a multi-core processor having at least one core includes determining a parallel-processing section on the basis of information included in a parallel-processing program, collecting information for determining a clock frequency of the core in the determined parallel-processing section according to each core, and then determining the clock frequency of the core on the basis of the collected information. Accordingly, it is possible to minimize power consumption while ensuring quality of service (QoS).06-30-2011
20110161637APPARATUS AND METHOD FOR PARALLEL PROCESSING - An apparatus and method for parallel processing in consideration of degree of parallelism are provided. One of a task parallelism and a data parallelism is dynamically selected while a job is processed. In response to a task parallelism being selected, a sequential version code is allocated to a core or processor for processing a job. In response to a data parallelism being selected, a parallel version code is allocated to a core a processor for processing a job.06-30-2011
20110167245Task list generation, parallelism templates, and memory management for multi-core systems - There is provided a multi-core system that provides automated task list generation, parallelism templates, and memory management. By constructing, profiling, and analyzing a sequential list of functions to be executed in a parallel fashion, corresponding parallel execution templates may be stored for future lookup in a database. A processor may then select a subset of functions from the sequential list of functions based on input data, select a template from the template database based on particular matching criteria such as high-level task parameters, finalize the template by resolving pointers and adding or removing transaction control blocks, and forward the resulting optimized task list to a scheduler for distribution to multiple slave processing cores. The processor may also analyze data dependencies between tasks to consolidate tasks working on the same data to a single core, thereby implementing memory management and efficient memory locality.07-07-2011
20110173419Look-Ahead Wake-and-Go Engine With Speculative Execution - A wake-and-go mechanism is provided for a microprocessor. The wake-and-go mechanism looks ahead in the instruction stream of a thread for programming idioms that indicates that the thread is waiting for an event. If a look-ahead polling operation succeeds, the look-ahead wake-and-go engine may record an instruction address for the corresponding idiom so that the wake-and-go mechanism may have the thread perform speculative execution at a time when the thread is waiting for an event. During execution, when the wake-and-go mechanism recognizes a programming idiom, the wake-and-go mechanism may store the thread state in the thread state storage. Instead of putting thread to sleep, the wake-and-go mechanism may perform speculative execution.07-14-2011
20110173420PROCESSOR RESUME UNIT - A system for enhancing performance of a computer includes a computer system having a data storage device. The computer system includes a program stored in the data storage device and steps of the program are executed by a processor. An external unit is external to the processor for monitoring specified computer resources. The external unit is configured to detect a specified condition using the processor. The processor including one or more threads. The thread resumes an active state from a pause state using the external unit when the specified condition is detected by the external unit.07-14-2011
20110179257IMAGE FORMING DEVICE, IMAGE FORMING METHOD AND COMPUTER READABLE MEDIUM - A data processing device including a reception unit, an instruction unit and a storage unit. The reception unit receives instructions for processing at a processing execution device. The instruction unit instructs the processing execution device to cancel a power saving state of the processing execution device and execute the processing corresponding to an instruction received by the reception unit. The storage unit stores data relating to received instructions. If the processing corresponding to the received instruction is a pre-specified process, data relating to the instruction is stored by the storage unit. If the processing corresponding to the received instruction is not a pre-specified process, the instruction unit instructs the processing execution device to execute both the processing corresponding to this instruction and processing based on data relating to instructions stored in the storage unit.07-21-2011
20110225399PROCESSOR AND METHOD FOR SUPPORTING MULTIPLE INPUT MULTIPLE OUTPUT OPERATION - A processor for supporting a MIMO operation and method of processing a MIMO instruction are provided. The MIMO operation supporting processor may include a scheduler and at least one functional unit. The scheduler may map multiple inputs of the MIMO instruction to a plurality of sequential input cycles, respectively, and may map multiple outputs of the MIMO instruction to a plurality of sequential output cycles, respectively. The output cycles may be followed by the input cycles and a predetermined number of cycles for a MIMO operation. A functional unit may read a register during sequential input cycles, may perform a MIMO operation during a predetermined number of execution cycles, and may write the result of the MIMO operation into a register during sequential output cycles.09-15-2011
20110231635Register, Processor, and Method of Controlling a Processor - A processor and a processor control method which efficiently perform an operation on data using a register, are provided. The register may include a data type field and a data field. The processor may generate the data type bits and store the generated data type bits in the data type field.09-22-2011
20110238954DATA PROCESSING APPARATUS - Source code to be processed is analyzed and configuration data in implementing in accordance with each of plural implementation systems is created and is stored in a local memory of a DRP incorporating system. When execution of target processing is started, the implementation system determination processing calculates estimated processing time when the configuration of each of the implementation systems is adopted and determines the optimum one of the implementation systems based on a combination of the estimated processing time and the circuit scale of the configuration.09-29-2011
20110238955METHODS FOR SCALABLY EXPLOITING PARALLELISM IN A PARALLEL PROCESSING SYSTEM - Parallelism in a parallel processing subsystem is exploited in a scalable manner. A problem to be solved can be hierarchically decomposed into at least two levels of sub-problems. Individual threads of program execution are defined to solve the lowest-level sub-problems. The threads are grouped into one or more thread arrays, each of which solves a higher-level sub-problem. The thread arrays are executable by processing cores, each of which can execute at least one thread array at a time. Thread arrays can be grouped into grids of independent thread arrays, which solve still higher-level sub-problems or an entire problem. Thread arrays within a grid, or entire grids, can be distributed across all of the available processing cores as available in a particular system implementation.09-29-2011
20110246750PROCESSING CAPACITY ON DEMAND - Embodiments of the present invention relate to a system and method for providing processing capacity on demand. According to the embodiments, a processor package has a plurality of processing elements. One or more of the processing elements may be made active in response to increased demand for processing capacity based on modifiable authorization information.10-06-2011
20110246751INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.10-06-2011
20110252222EVENT COUNTER IN A SYSTEM ADAPTED TO THE JAVACARD LANGUAGE - The implementation of a counter in a microcontroller adapted to the JavaCard language while respecting the atomicity of a modification of the value of this counter, wherein the counter is reset by the sending to the microcontroller of an instruction to verify a user code by submitting a correct code, and the value of the counter is decremented by the sending to the microcontroller of the instruction to verify the user code with an erroneous code value.10-13-2011
20110258416STATICALLY SPECULATIVE COMPILATION AND EXECUTION - A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.10-20-2011
20110258417POWER AND THROUGHPUT OPTIMIZATION OF AN UNBALANCED INSTRUCTION PIPELINE - A method includes determining a rate of resource occupancy of a constituent stage of an unbalanced instruction pipeline implemented in a processor through profiling an instruction code. The method also includes performing data processing at a maximum throughput at an optimum clock frequency based on the rate of resource occupancy.10-20-2011
20110271084Information processing system and information processing method - A disclosed information processing system includes a receiving node and a storing node, the receiving node includes an order information adding unit that adds first order information to operation instructions included in an operation instruction sequence, the first order information indicating an order among the operation instruction sequences and an operation instruction transmission unit that transmits the one or more operation instructions to the storing node, the storing node includes an operation instruction execution unit that executes the operation instructions. Further, upon a receipt of a second operation instruction having the first order information indicating that the second operation instruction is earlier than the one or more first operation instructions, which was already executed, in the first order relationship, the storing node re-executes the first operation instruction after the second operation instruction is executed.11-03-2011
20110276789PARALLEL PROCESSING OF DATA - A data parallel pipeline may specify multiple parallel data objects that contain multiple elements and multiple parallel operations that operate on the parallel data objects. Based on the data parallel pipeline, a dataflow graph of deferred parallel data objects and deferred parallel operations corresponding to the data parallel pipeline may be generated and one or more graph transformations may be applied to the dataflow graph to generate a revised dataflow graph that includes one or more of the deferred parallel data objects and deferred, combined parallel data operations. The deferred, combined parallel operations may be executed to produce materialized parallel data objects corresponding to the deferred parallel data objects.11-10-2011
20110283091PARALLELIZING SEQUENTIAL FRAMEWORKS USING TRANSACTIONS - Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.11-17-2011
20110320778CENTRALIZED SERIALIZATION OF REQUESTS IN A MULTIPROCESSOR SYSTEM - Serializing instructions in a multiprocessor system includes receiving a plurality of processor requests at a central point in the multiprocessor system. Each of the plurality of processor requests includes a needs register having a requestor needs switch and a resource needs switch. The method also includes establishing a tail switch indicating the presence of the plurality of processor requests at the central point, establishing a sequential order of the plurality of processor requests, and processing the plurality of processor requests at the central point in the sequential order.12-29-2011
20110320779PERFORMANCE MONITORING IN A SHARED PIPELINE - A pipelined processing device includes: a device controller configured to receive a request to perform an operation; a plurality of subcontrollers configured to receive at least one instruction associated with the operation, each of the plurality of subcontrollers including a counter configured to generate an active time value indicating at least a portion of a time taken to process the at least one instruction; a pipeline processor configured to receive and process the at least one instruction, the pipeline processor configured to receive the active time value; and a shared pipeline storage area configured to store the active time value for each of the plurality of subcontrollers.12-29-2011
20110320780HYBRID COMPARE AND SWAP/PERFORM LOCKED OPERATION QUEUE ALGORITHM - Systems, methods, and computer program products are disclosed for intermixing different types of machine instructions. One embodiment of the invention provides a protocol for intermixing the different types of machine instructions. By adhering to the protocol, different types of machine instructions may be intermixed to concurrently update data structures without leading to unpredictable results.12-29-2011
20120011347PARALLEL PROGRAMMING INTERFACE TO DYNAMICALY ALLOCATE PROGRAM PORTIONS - A computing device-implemented method includes receiving a program created by a technical computing environment, analyzing the program, generating multiple program portions based on the analysis of the program, dynamically allocating the multiple program portions to multiple software units of execution for parallel programming, receiving multiple results associated with the multiple program portions from the multiple software units of execution, and providing the multiple results or a single result to the program.01-12-2012
20120030450METHOD AND SYSTEM FOR PARALLEL COMPUTATION OF LINEAR SEQUENTIAL CIRCUITS - A method and system for parallel computation of a linear sequential circuit (LSC) based on a state transition matrix is disclosed herein. A multistep state transition matrix and a multistep output generation matrix can be pre-computed and stored in association with the linear sequential circuit. The multiple state transitions and the multiple output bits can be computed by multiplying the current input-state vector with a multistep next state transition matrix and a multistep output generation matrix, respectively. Multiple state transitions and multiple output bits can be generated in parallel in a single clock cycle based on the pre-computed state transition matrix and the output generation matrix utilizing a dot product in order to improve computational speed. Such a simple augmentation provides a flexible and inexpensive solution for high speedup linear sequential circuit computation with respect to a processor.02-02-2012
20120036339ASYNCHRONOUS ASSIST THREAD INITIATION - A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.02-09-2012
20120047353System and Method Providing Run-Time Parallelization of Computer Software Accommodating Data Dependencies - A system and method of parallelizing programs employs runtime instructions to identify data accessed by program portions and to assign those program portions to particular processors based on potential overlap between the access data. Data dependence between different program portions may be identified and used to look for pending “predicate” program portions that could create data dependencies and to postpone program portions that may be dependent while permitting parallel execution of other program portions.02-23-2012
20120060018Collective Operations in a File System Based Execution Model - A mechanism is provided for group communications using a MULTI-PIPE synthetic file system. A master application creates a multi-pipe synthetic file in the MULTI-PIPE synthetic file system, the master application indicating a multi-pipe operation to be performed. The master application then writes a header-control block of the multi-pipe synthetic file specifying at least one of a multi-pipe synthetic file system name, a message type, a message size, a specific destination, or a specification of the multi-pipe operation. Any other application participating in the group communications then opens the same multi-pipe synthetic file. A MULTI-PIPE file system module then implements the multi-pipe operation as identified by the master application. The master application and the other applications then either read or write operation messages to the multi-pipe synthetic file and the MULTI-PIPE synthetic file system module performs appropriate actions.03-08-2012
20120110308METHOD FOR CONTROLLING BMC HAVING CUSTOMIZED SDR - A Baseboard Management Controller (BMC) controlling method includes the steps of dividing a memory of a BMC into an original region and customized region, in which the original region includes at least one original sensor data record (SDR) and original platform event filter (PEF) corresponding to each other; providing an instruction set to at least one external system, in which the external system manages at least one customized SDR and customized PEF corresponding to each other in the customized region through the instruction set; polling the original SDR in the original region and the customized SDR in the customized region; determining whether values of the SDRs obtained through polling conform to a plurality of critical values individually corresponding to the SDRs; and obtaining a processing policy according to the corresponding PEF when at least one value of the SDR does not conform to the corresponding critical value.05-03-2012
20120124341Methods and Apparatus for Performing Multiple Operand Logical Operations in a Single Instruction - A method for performing multiple-operand logical operations in a single instruction includes the steps of: generating a table defining a correspondence between a plurality of input variables to a multiple-operand logical operation and a plurality of output results of the multiple-operand logical operation; encoding the table to generate a set of values for use by the single instruction, each value being indicative of an output result of the multiple-operand logical operation as a function of a corresponding unique combination of values of the input variables; and at least one processor performing the multiple-operand logical operation in a single instruction as a function of the set of values for a prescribed combination of values of the input variables.05-17-2012
20120131314Ganged Hardware Counters for Coordinated Rollover and Reset Operations - Mechanisms for controlling rollover or reset of hardware performance counters in the data processing system. A signal indicating that a rollover or reset of a first hardware performance counter has occurred is received and it is determined if the first hardware performance counter is analytically related to one or more second hardware performance counters based on defined ganged hardware performance counter sets. A signal is sent to each of the one or more second hardware performance counters in response to a determination that the first hardware performance counter is analytically related to the one or more second hardware performance counters. Each of the one or more second hardware performance counters is reset to an initial value in response to the one or more second hardware performance counters receiving the signal from the ganged hardware performance counter rollover/reset logic.05-24-2012
20120131315DATA PROCESSING APPARATUS - A data processing apparatus may include a processing unit that performs processing related to data, a first register that holds a value for defining an operation of the processing unit, a second register that holds a value output from the first register, the second register outputting the value to the processing unit, a first control unit that performs control for writing a value in the first register, a second control unit that performs control for rewriting the value held by the second register with the value output from the first register, after the value is written in the first register, and a third control unit that performs control for rewriting the value held by the second register with an invalid value, at which the processing of the processing unit is stopped, during a period for which the value is written in the first register.05-24-2012
20120137110HARDWARE DEVICE FOR PROCESSING THE TASKS OF AN ALGORITHM IN PARALLEL - A hardware device for concurrently processing a fixed set of predetermined tasks associated with an algorithm which includes a number of processes, some of the processes being dependent on binary decisions, includes a plurality of task units for processing data, making decisions and/or processing data and making decisions, including source task units and destination task units. A task interconnection logic means interconnect the task units for communicating actions from a source task unit to a destination task unit. Each of the task units includes a processor for executing only a particular single task of the fixed set of predetermined tasks associated with the algorithm in response to a received request action, and a status manager for handling the actions from the source task units and building the actions to be sent to the destination task units.05-31-2012
20120144168ODD AND EVEN START BIT VECTORS - A method and apparatus is presented for identifying instructions in a stream of information by preprocessing the stream of information, creating a vector of instructions and breaking the vector of instructions into two or more vectors for picking the identified instructions at a high frequency.06-07-2012
20120144169INFORMATION PROCESSING APPARATUS, INFORMATION PROCESSING METHOD, AND COMPUTER READABLE MEDIUM - An information processing apparatus includes the following elements. A generator generates, on the basis of instruction information which describes processing to be executed for obtaining output data from raw data, processing definition information that defines details of the processing, upon inputting the instruction information. A determination unit determines whether output data associated with the currently generated processing definition information and data to be used as raw data is stored in a first memory, the first memory storing therein output data which has been obtained in accordance with previously generated processing definition information in association with data used as the raw data and the processing definition information. An output unit outputs, if the determination unit determines that the output data is stored in the first memory, the output data stored in the first memory without causing a processor to execute the processing.06-07-2012
20120151187INSTRUCTION OPTIMIZATION - Programs can be optimized at runtime prior to execution to enhance performance. Program instructions/operations designated for execution can be recorded and subsequently optimized at runtime prior to execution, for instance by performing transformations on the instructions. For example, such optimization can remove, reorder, and/or combine instructions, among other things.06-14-2012
20120151188TYPE AND LENGTH ABSTRACTION FOR DATA TYPES - Embodiments are directed to implementing a generic SIMD data type in software code. In an embodiment, a computer system accesses a portion of software code that includes an algorithm with a generic SIMD data type that includes a variable number of elements. The algorithm with the generic SIMD data type is to be processed by a specific processor that includes various specific hardware features. The computer system determines at runtime a portion of customized processor-specific code that is to be used with the specified processor based on the generic SIMD data type, wherein the runtime determination resolves the number of elements that are to be used with the specified processor. The computer system also processes the software code including the algorithm with the generic SIMD data type using the determined, customized processor-specific code.06-14-2012
20120151189DATA PROCESSING WITH VARIABLE OPERAND SIZE - A method of processing data comprising performing a sequence of operation instructions with variable operand size, wherein respective size codes for different source and destination operands are obtained and registered separately from performing the sequence of operation instructions, and the sequence of operation instructions is performed using operand sizes defined by the registered size codes, the operation instructions of the sequence not themselves containing size codes.06-14-2012
20120151190DATA PROCESSING APPARATUS, DATA PROCESSING METHOD, AND NON-TRANSITORY COMPUTER READABLE STORAGE MEDIUM - A data processing apparatus includes an output unit. The output unit determines, when parallel control is performed in a data processor created in the data processing apparatus so that plural processing modules forming the data processor perform data processing in parallel, on the basis of a value representing a parallel-processing time for which at least two processing modules are operated in parallel and a value representing a control time, which is not necessary when serial control is performed so that the processing modules serially perform data processing but which is necessary when the parallel control is performed so that the processing modules perform data processing in parallel, whether a time necessary to complete data processing performed by the data processor under the parallel control would be shorter than a time necessary to complete data processing performed by the data processor under the serial control, and outputs a determination result.06-14-2012
20120166771AGILE COMMUNICATION OPERATOR - A high level programming language provides an agile communication operator that generates a segmented computational space based on a resource map for distributing the computational space across compute nodes. The agile communication operator decomposes the computational space into segments, causes the segments to be assigned to compute nodes, and allows the user to centrally manage and automate movement of the segments between the compute nodes. The segment movement may be managed using either a full global-view representation or a local-global-view representation of the segments.06-28-2012
20120166772EXTENSIBLE DATA PARALLEL SEMANTICS - A high level programming language provides extensible data parallel semantics. User code specifies hardware and software resources for executing data parallel code using a compute device object and a resource view object. The user code uses the objects and semantic metadata to allow execution by new and/or updated types of compute nodes and new and/or updated types of runtime libraries. The extensible data parallel semantics allow the user code to be executed by the new and/or updated types of compute nodes and runtime libraries.06-28-2012
20120173853Processing apparatus and method for performing computation - A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.07-05-2012
20120185677METHODS AND SYSTEMS FOR STORAGE OF BINARY INFORMATION THAT IS USABLE IN A MIXED COMPUTING ENVIRONMENT - A method of managing binary data across a mixed computing environment is provided. The method includes performing on one or more processors: receiving binary data; receiving binary coded data indicating a type of the binary data; formatting the binary data and the binary coded data according to a first format; and generating at least one of a message and a file based on the formatted data.07-19-2012
20120185678HARDWARE THREAD DISABLE WITH STATUS INDICATING SAFE SHARED RESOURCE CONDITION - A technique for indicating a safe shared resource condition with respect to a disabled thread provides a mechanism for providing a fast indication to other hardware threads that a temporarily disabled thread can no longer impact shared resources, such as shared special-purpose registers and translation look-aside buffers within the processor core. Signals from pipelines within the core indicates whether any of the instructions pending in the pipeline impact the shared resources and if not, then the thread disable status is presented to the other threads via a state change in a thread status register. Upon receiving an indication that a particular hardware thread is to be disabled, control logic halts the dispatch of instructions for the particular hardware thread, and then waits until any indication that a shared resource is impacted by an instruction has cleared. Then the control logic updates the thread status to indicate the thread is disabled.07-19-2012
20120191954PROCESSOR HAVING INCREASED PERFORMANCE AND ENERGY SAVING VIA INSTRUCTION PRE-COMPLETION - Methods and apparatuses are provided for achieving increased performance and energy saving via instruction pre-completion without having to schedule instruction execution in processor execution units. The apparatus comprises an operational unit for determining whether an instruction can be completed without scheduling use of an execution unit of the processor and units within the operational unit capable of employing alternate or equivalent processes or techniques to complete the instruction. In this way, the instruction is completed without scheduling use of the execution unit of the processor. The method comprises determining that an instruction can be completed without scheduling use of an execution unit of a processor and then pre-completing the instruction without use of one or more the execution units.07-26-2012
20120204011ASYNCHRONOUS ASSIST THREAD INITIATION - A method of data processing includes a processor of a data processing system executing a controlling thread of a program and detecting occurrence of a particular asynchronous event during execution of the controlling thread of the program. In response to occurrence of the particular asynchronous event during execution of the controlling thread of the program, the processor initiates execution of an assist thread of the program such that the processor simultaneously executes the assist thread and controlling thread of the program.08-09-2012
20120226893Hardware controller to choose selected hardware entity and to execute instructions in relation to selected hardware entity - A hardware controller includes a first hardware interface, a second hardware interface, first hardware logic, and second hardware logic. The first hardware interface is to couple the hardware controller to hardware entities of a hardware device in which the hardware controller is to be included. The second hardware interface is to couple the hardware controller to a memory to receive instructions. The first hardware logic is to choose a selected hardware entity from the hardware entities. The second hardware logic is to execute the instructions in relation to the selected hardware entity.09-06-2012
20120233445Multi-Thread Processors and Methods for Instruction Execution and Synchronization Therein and Computer Program Products Thereof - Methods for instruction execution and synchronization in a multi-thread processor are provided, wherein in the multi-thread processor, multiple threads are running and each of the threads can simultaneously execute a same instruction sequence. A source code or an object code is received and then compiled to generate the instruction sequence. Instructions for all of function calls within the instruction sequence are sorted according to a calling order. Each thread is provided a counter value pointing to one of the instructions in the instruction sequence. A main counter value is determined according to the counter values of the threads such that all of the threads simultaneously execute an instruction of the instruction sequence that the main counter value points to.09-13-2012
20120239909SYSTEMS AND METHODS FOR VOTING AMONG PARALLEL THREADS - One embodiment of the present invention sets forth a technique for efficiently performing voting operations within a multi-threaded parallel-processing system. A group of related parallel program threads executes within a processor core together in parallel. A new instruction, called a “vote” instruction, is introduced that enables a parallel program thread to post an individual vote within the context of the group of related threads and to receive the result of the vote. In this fashion, the vote instruction advantageously reduces overhead associated with inter-thread communication, thereby improving overall system performance.09-20-2012
20120246451PROCESSING LONG-LATENCY INSTRUCTIONS IN A PIPELINED PROCESSOR - There is provided a method and processor for processing a thread. The thread comprises a plurality of sequential instructions, the plurality of sequential instructions comprising some short-latency instructions and some long-latency instructions and at least one hazard instruction, the hazard instruction requiring one or more preceding instructions to be processed before the hazard instruction is processed. The method comprises the steps of: a) before processing each long-latency instruction, incrementing by one, a counter associated with the thread; b) after each long-latency instruction has been processed, decrementing by one, the counter associated with the thread; c) before processing each hazard instruction, checking the value of the counter associated with the thread, and i) if the counter value is zero, processing the hazard instruction, or ii) if the counter value is non-zero, pausing processing of the hazard instruction until a later time. The processor includes means for performing steps a), b) and c) of the method.09-27-2012
20120265969ALLOCATION OF COUNTERS FROM A POOL OF COUNTERS TO TRACK MAPPINGS OF LOGICAL REGISTERS TO PHYSICAL REGISTERS FOR MAPPER BASED INSTRUCTION EXECUTIONS - A computer system assigns a particular counter from among a plurality of counters currently in a counter free pool to count a number of mappings of logical registers from among a plurality of logical registers to a particular physical register from among a plurality of physical registers, responsive to an execution of an instruction by a mapper unit mapping at least one logical register from among the plurality of logical registers to the particular physical register, wherein the number of the plurality of counters is less than a number of the plurality of physical registers. The computer system, responsive to the counted number of mappings of logical registers to the particular physical register decremented to less than a minimum value, returns the particular counter to the counter free pool.10-18-2012
20120265970SYSTEM AND METHOD OF INDIRECT REGISTER ACCESS - Systems and methods are provided for managing access to registers. A system may include a set of direct registers and a set of indirect registers. The indirect registers may be accessed through the direct registers, and the direct registers may provide various features to provide faster access to the indirect registers. One of the direct registers may indicate access modes for accessing the indirect registers. The access modes may include auto-increment, auto-decrement, auto-reset, and no change modes. Based on the access mode, the currently accessed address may be automatically modified after accessing the indirect register at the address.10-18-2012
20120297170DECENTRALIZED ALLOCATION OF RESOURCES AND INTERCONNNECT STRUCTURES TO SUPPORT THE EXECUTION OF INSTRUCTION SEQUENCES BY A PLURALITY OF ENGINES - A method for decentralized resource allocation in an integrated circuit. The method includes receiving a plurality of requests from a plurality of resource consumers of a plurality of partitionable engines to access a plurality resources, wherein the resources are spread across the plurality of engines and are accessed via a global interconnect structure. At each resource, a number of requests for access to said each resource are added. At said each resource, the number of requests are compared against a threshold limiter. At said each resource, a subsequent request that is received that exceeds the threshold limiter is canceled. Subsequently, requests that are not canceled within a current clock cycle are implemented.11-22-2012
20120297171METHODS FOR GENERATING CODE FOR AN ARCHITECTURE ENCODING AN EXTENDED REGISTER SPECIFICATION - There are provided methods and computer program products for generating code for an architecture encoding an extended register specification. A method for generating code for a fixed-width instruction set includes identifying a non-contiguous register specifier. The method further includes generating a fixed-width instruction word that includes the non-contiguous register specifier.11-22-2012
20120331275SYSTEM AND METHOD FOR POWER OPTIMIZATION - A technique for reducing the power consumption required to execute processing operations. A processing complex, such as a CPU or a GPU, includes a first set of cores comprising one or more fast cores and second set of cores comprising one or more slow cores. A processing mode of the processing complex can switch between a first mode of operation and a second mode of operation based on one or more of the workload characteristics, performance characteristics of the first and second sets of cores, power characteristics of the first and second sets of cores, and operating conditions of the processing complex. A controller causes the processing operations to be executed by either the first set of cores or the second set of cores to achieve the lowest total power consumption.12-27-2012
20130007419COMPUTER IMPLEMENTED METHOD OF ELECTING K EXTREME ENTRIES FROM A LIST USING SEPARATE SECTION COMPARISONS - A computer implemented method selects K extreme elements of a list of N elements by partitioning each of the N elements into a plurality of sections. For each section from a most significant section to a least significant section the method selects a threshold selection determining at least K extreme entries from the list. This iteratively compares a corresponding section to a section threshold, counts a number of sections which are more extreme than the section threshold, increasing (or decreasing) the section threshold if the count is greater than K and decreasing the section threshold if the count is less than K. After finding the section threshold for the corresponding section the method forms a combined threshold by concatenation of said section thresholds in order from a most significant section to a least significant section, compares each of the N elements to the combined threshold, and selects at least K elements from the set of N elements more extreme than the combined threshold.01-03-2013
20130007420CHECKING THE INTEGRITY OF A PROGRAM EXECUTED BY AN ELECTRONIC CIRCUIT - A method for checking the integrity of a program executed by an electronic circuit and including at least one conditional jump, wherein: a first value is updated for any instruction which does not correspond to a jump instruction; a second value is updated with the first value for each conditional jump instruction; and the second value is compared with a third value, calculated according to the performed conditional jumps.01-03-2013
20130013899Using Hardware Transaction Primitives for Implementing Non-Transactional Escape Actions Inside Transactions - Mechanisms are provided for performing escape actions within transactions. These mechanisms execute a transaction comprising a transactional section and an escape action. The transactional section is comprised of one or more instructions that are to be executed in an atomic manner as part of the transaction. The escape action is comprised of one or more instructions to be executed in a non-transactional manner. These mechanisms further populate at least one actions list data structure, associated with a thread of the data processing system that is executing the transaction, with one or more actions associated with the escape action. Moreover, these mechanisms execute one or more actions in the actions list data structure based upon whether the transaction commits successfully or is aborted.01-10-2013
20130013900MULTI-THREAD PROCESSOR AND ITS HARDWARE THREAD SCHEDULING METHOD - A multi-thread processor includes a plurality of hardware threads each of which generates an independent instruction flow, a first thread scheduler that outputs a first thread selection signal, the first thread selection signal designating a hardware thread to be executed in a next execution cycle among the plurality of hardware threads according to a priority rank, the priority rank being established in advance for each of the plurality of hardware threads, a first selector that selects one of the plurality of hardware threads according to the first thread selection signal and outputs an instruction generated by the selected hardware thread, and an execution pipeline that executes an instruction output from the first selector. Whenever the hardware thread is executed in the execution pipeline, the first scheduler updates the priority rank for the executed hardware thread and outputs the first thread selection signal in accordance with the updated priority rank.01-10-2013
20130036295GPU ASSISTED GARBAGE COLLECTION - A system and method for efficient garbage collection. A general-purpose central processing unit (CPU) sends a garbage collection request and a first log to a special processing unit (SPU). The first log includes an address and a data size of each allocated data object stored in a heap in memory corresponding to the CPU. The SPU has a single instruction multiple data (SIMD) parallel architecture and may be a graphics processing unit (GPU). The SPU efficiently performs operations of a garbage collection algorithm due to its architecture on a local representation of the data objects stored in the memory.02-07-2013
20130042091BIT Splitting Instruction - An instruction specifies a source value and an offset value. Upon execution of the instruction, a first result of the instruction and a second result of the instruction are generated. The first result is a first portion of the source value and the second result is a second portion of the source value.02-14-2013
20130061026CONFIGURABLE MASS DATA PORTIONING FOR PARALLEL PROCESSING - A configurable mass data portioning for parallel processing is described herein. One or more operation attributes are selected to participate in parallelization criteria. The values of the selected operation attributes for a number of operations are submitted to a specified algorithm using to provide parallelization values corresponding to the operations. The parallelization values are applied to group the operations in comparable portions for parallel execution without conflicts.03-07-2013
20130067202CONDITIONAL NON-BRANCH INSTRUCTION PREDICTION - A microprocessor processes conditional non-branch instructions that specify a condition and instruct the microprocessor to perform an operation if the condition is satisfied and otherwise to not perform the operation. A predictor provides a prediction about a conditional non-branch instruction. An instruction translator translates the conditional non-branch instruction into a no-operation microinstruction when the prediction predicts the condition will not be satisfied, and into a set of one or more microinstructions to unconditionally perform the operation when the prediction predicts the condition will be satisfied. An execution pipeline executes the no-operation microinstruction or the set of microinstructions. The predictor translates into a second set of one or more microinstructions to conditionally perform the operation when the prediction does not make a prediction. In the case of a misprediction, the translator re-translates the conditional non-branch instruction into the second set of microinstructions.03-14-2013
20130080744ABSTRACTING COMPUTATIONAL INSTRUCTIONS TO IMPROVE PERFORMANCE - Methods and systems for executing a code stream of non-native binary code on a computing system are disclosed. One method includes parsing the code stream to detect a plurality of elements including one or more branch destinations, and traversing the code stream to detect a plurality of non-native operators. The method also includes executing a pattern matching algorithm against the plurality of non-native operators to find combinations of two or more non-native operators that do not span across a detected branch destination and that correspond to one or more target operators executable by the computing system. The method further includes generating a second code stream executable on the computing system including the one or more target operators.03-28-2013
20130080745FINE-GRAINED INSTRUCTION ENABLEMENT AT SUB-FUNCTION GRANULARITY - Fine-grained enablement at sub-function granularity. An instruction encapsulates different sub-functions of a function, in which the sub-functions use different sets of registers of a composite register file, and therefore, different sets of functional units. At least one operand of the instruction specifies which set of registers, and therefore, which set of functional units, is to be used in performing the sub-function. The instruction can perform various functions (e.g., move, load, etc.) and a sub-function of the function specifies the type of function (e.g., move-floating point; move-vector; etc.).03-28-2013
20130080746Providing A Dedicated Communication Path Separate From A Second Path To Enable Communication Between Complaint Sequencers Of A Processor Using An Assertion Signal - In one embodiment, the present invention includes a method for communicating an assertion signal from a first instruction sequencer to a plurality of accelerators coupled to the first instruction sequencer, detecting the assertion signal in the accelerators and communicating a request for a lock, and registering an accelerator that achieves the lock by communication of a registration message for the accelerator to the first instruction sequencer. Other embodiments are described and claimed.03-28-2013
20130086363Computer Instructions for Activating and Deactivating Operands - An instruction set architecture (ISA) includes instructions for selectively indicating last-use architected operands having values that will not be accessed again, wherein architected operands are made active or inactive after an instruction specified last-use by an instruction, wherein the architected operands are made active by performing a write operation to an inactive operand, wherein the activation/deactivation may be performed by the instruction having the last-use of the operand or another (prefix) instruction.04-04-2013
20130086364Managing a Register Cache Based on an Architected Computer Instruction Set Having Operand Last-User Information - A multi-level register hierarchy is disclosed comprising a first level pool of registers for caching registers of a second level pool of registers in a system wherein programs can dynamically release and re-enable architected registers such that released architected registers need not be maintained by the processor, the processor accessing operands from the first level pool of registers, wherein a last-use instruction is identified as having a last use of an architected register before being released, the last-use architected register being released causes the multi-level register hierarchy to discard any correspondence of an entry to said last use architected register.04-04-2013
20130086365Exploiting an Architected List-Use Operand Indication in a Computer System Operand Resource Pool - A pool of available physical registers are provided for architected registers, wherein operations are performed that activate and deactivate selected architected registers, such that the deactivated selected architected registers need not retain values, and physical registers can be deallocated to the pool, wherein deallocation of physical registers is performed after a last-use by a designated last-use instruction, wherein the last-use information is provided either by the last-use instruction or a prefix instruction, wherein reads to deallocated architecture registers return an architected default value.04-04-2013
20130097410MACHINE PROCESSOR - Disclosed are machine processors and methods performed thereby. The processor has access to processing units for performing data processing and to libraries. Functions in the libraries are implementable to perform parallel processing and graphics processing. The processor may be configured to acquire (e.g., to download from a web server) a download script, possibly with extensions specifying bindings to library functions. Running the script may cause the processor to create, for each processing unit, contexts in which functions may be run, and to run, on the processing units and within a respective context, a portion of the download script. Running the script may also cause the processor to create, for a processing unit, a memory object, transfer data into that memory object, and transfer data back to the processor in such a way that a memory address of the data in the memory object is not returned to the processor.04-18-2013
20130103931MACHINE PROCESSOR - Disclosed are machine processors and methods performed thereby. The processor has access to processing units for performing data processing and to libraries. Functions in the libraries are implementable to perform parallel processing and graphics processing. The processor may be configured to acquire (e.g., to download from a web server) a download script, possibly with extensions specifying bindings to library functions. Running the script may cause the processor to create, for each processing unit, contexts in which functions may be run, and to run, on the processing units and within a respective context, a portion of the download script. Running the script may also cause the processor to create, for a processing unit, a memory object, transfer data into that memory object, and transfer data back to the processor in such a way that a memory address of the data in the memory object is not returned to the processor.04-25-2013
20130117542CODE COVERAGE FRAMEWORK - A method, information processing system, and computer program product record an execution of a program instruction. A determination is made that a thread has entered a program unit. Another determination is made that that the thread is associated with at least one attribute that matches a set of thread recording criteria. An instruction recording mechanism for the thread is dynamically activated in response to the at least one attribute of the thread matching the set of thread recording criteria.05-09-2013
20130117543LOW OVERHEAD OPERATION LATENCY AWARE SCHEDULER - A method and apparatus for processing multi-cycle instructions include picking a multi-cycle instruction and directing the picked multi-cycle instruction to a pipeline. The pipeline includes a pipeline control configured to detect a latency and a repeat rate of the picked multi-cycle instruction and to count clock cycles based on the detected latency and the detected repeat rate. The method and apparatus further include detecting the repeat rate and the latency of the picked multi-cycle instruction, and counting clock cycles based on the detected repeat rate and the latency of the picked multi-cycle instruction.05-09-2013
20130117544METHOD AND APPARATUS FOR RUN-TIME STATISTICS DEPENDENT PROGRAM EXECUTION USING SOURCE-CODING PRINCIPLES - Disclosed are a method and system for optimized, dynamic data-dependent program execution. The disclosed system comprises a statistics computer which computes statistics of the incoming data at the current time instant, where the said statistics include the probability distribution of the incoming data, the probability distribution over program modules induced by the incoming data, the probability distribution induced over program outputs by the incoming data, and the time-complexity of each program module for the incoming data, wherein the said statistics are computed on as a function of current and past data, and previously computed statistics; a plurality of alternative execution path orders designed prior to run-time by the use of an appropriate source code; a source code selector which selects one of the execution path orders as a function of the statistics computed by the statistics computer; a complexity measurement which measures the time-complexity of the currently selected execution path-order.05-09-2013
20130117545High-Word Facility for Extending the Number of General Purpose Registers Available to Instructions - A computer employs a set of General Purpose Registers (GPRs). Each GPR comprises a plurality of portions. Programs such as an Operating System and Applications operating in a Large GPR mode, access the full GPR, however programs such as Applications operating in Small GPR mode, only have access to a portion at a time. Instruction Opcodes, in Small GPR mode, may determine which portion is accessed.05-09-2013
20130132709METHOD AND SYSTEM FOR PROCESSING INSTRUCTION INFORMATION - A method and system for processing instruction information. Each instruction information character string of a sequence of instruction information character strings are sequentially extracted and processed. Each instruction information character string pertains to an associated target object wrapped in a target object storage unit by an associated operation target model. It is independently ascertained for each instruction information character string whether to generate a code line for each instruction information character string, by: determining whether a requirement is satisfied and generating the code line and storing the code line in a code buffer if the requirement has been determined to be satisfied and not generating the code line if the requirement has been determined to not be satisfied. The requirement relates to whether the instruction information character string being processed comprises a naming instruction or a generation instruction. The generated code lines stored in the code buffer are displayed.05-23-2013
20130138926INDIRECT FUNCTION CALL INSTRUCTIONS IN A SYNCHRONOUS PARALLEL THREAD PROCESSOR - An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.05-30-2013
20130138927DATA PROCESSING APPARATUS ADDRESS RANGE DEPENDENT PARALLELIZATION OF INSTRUCTIONS - A data processing apparatus has an instruction memory system arranged to output an instruction word addressed by an instruction address. An instruction execution unit, processes a plurality of instructions from the instruction word in parallel. A detection unit, detects in which of a plurality of ranges the instruction address lies. The detection unit is coupled to the instruction execution unit and/or the instruction memory system, to control a way in which the instruction execution unit parallelizes processing of the instructions from the instruction word, dependent on a detected range. In an embodiment the instruction execution unit and/or the instruction memory system adjusts a width of the instruction word that determines a number of instructions from the instruction word that is processed in parallel, dependent on the detected range.05-30-2013
20130145132STATICALLY SPECULATIVE COMPILATION AND EXECUTION - A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.06-06-2013
20130159678Code optimization by memory barrier removal and enclosure within transaction - A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified, by enclosing the code section within a transaction that employs hardware transactional memory of the computing device, and removing the memory barrier instructions from the code section. Execution of the code section as has been enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the code section as has been enclosed within the transaction, the code section is split into code sub-sections, and each code sub-section enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section sections and enclosing each code sub-section within a separate transaction can decrease occurrence of the code section aborting during execution.06-20-2013
20130159679Providing Hint Register Storage For A Processor - In one embodiment, the present invention includes a method for receiving a data access instruction and obtaining an index into a data access hint register (DAHR) register file of a processor from the data access instruction, reading hint information from a register of the DAHR register file accessed using the index, and performing the data access instruction using the hint information. Other embodiments are described and claimed.06-20-2013
20130166887DATA PROCESSING APPARATUS AND DATA PROCESSING METHOD - According to one embodiment, a data processing apparatus includes a processor and a memory. The processor includes core blocks. The memory stores a command queue and task management structure data. The command queue stores a series of kernel functions. The task management structure data defines an order of execution of kernel functions by associating a return value of a previous kernel function with an argument of a subsequent kernel function. Core blocks of the processor are capable of executing different kernel functions.06-27-2013
20130166888PREDICTIVE OPERATOR GRAPH ELEMENT PROCESSING - Techniques are described for predictively starting a processing element. Embodiments receive streaming data to be processed by a plurality of processing elements. An operator graph of the plurality of processing elements that defines at least one execution path is established. Embodiments determine a historical startup time for a first processing element in the operator graph, where, once started, the first processing element begins normal operations once the first processing element has received a requisite amount of data from one or more upstream processing elements. Additionally, embodiments determine an amount of time the first processing element takes to receive the requisite amount of data from the one or more upstream processing elements. The first processing element is then predictively started at a first startup time based on the determined historical startup time and the determined amount of time historically taken to receive the requisite amount of data.06-27-2013
20130173888Processor for Executing Wide Operand Operations Using a Control Register and a Results Register - A programmable processor and method for improving the performance of processors by expanding at least two source operands, or a source and a result operand, to a width greater than the width of either the general purpose register or the data path width. The present invention provides operands which are substantially larger than the data path width of the processor by using the contents of a general purpose register to specify a memory address at which a plurality of data path widths of data can be read or written, as well as the size and shape of the operand. In addition, several instructions and apparatus for implementing these instructions are described which obtain performance advantages if the operands are not limited to the width and accessible number of general purpose registers.07-04-2013
20130173889PARALLEL PROCESSING SYSTEM FOR COMPUTING PARTICLE INTERACTIONS - A parallel processing system for computing particle interactions includes a plurality of computation nodes arranged according to a geometric partitioning of a simulation volume. Each computation node has storage for particle data. This particle data is associated with particles in a region of the geometrically partitioned simulation volume. The parallel processing system also includes a communication system having links interconnecting the computation nodes. Each of the computation nodes includes a processor subsystem. These processor subsystems cooperate to coordinate computation of the particle interactions in a distributed manner.07-04-2013
20130185543GENERAL PURPOSE EMBEDDED PROCESSOR - The invention provides an embedded processor architecture comprising a plurality of virtual processing units that each execute processes or threads (collectively, “threads”). One or more execution units, which are shared by the processing units, execute instructions from the threads. An event delivery mechanism delivers events—such as, by way of non-limiting example, hardware interrupts, software-initiated signaling events (“software events”) and memory events—to respective threads without execution of instructions. Each event can, per aspects of the invention, be processed by the respective thread without execution of instructions outside that thread. The threads need not be constrained to execute on the same respective processing units during the lives of those threads—though, in some embodiments, they can be so constrained. The execution units execute instructions from the threads without needing to know what threads those instructions are from.07-18-2013
20130191618Systems and Methods for Dynamic Scaling in a Data Decoding System - Various embodiments of the present invention provide systems and methods for data processing using variable scaling.07-25-2013
20130198493SYSTEMS AND METHODS THAT FACILITATE MANAGEMENT OF ADD-ON INSTRUCTION GENERATION, SELECTION, AND/OR MONITORING DURING EXECUTION - The subject invention relates to systems and methods that facilitate display, selection, and management of context associated with execution of add-on instructions. The systems and methods track add-on instruction calls provide a user with call and data context, wherein the user can select a particular add-on instruction context from a plurality of contexts in order to observe values and/or edit parameters associated with the add-on instruction. The add-on instruction context can include information such as instances of data for particular lines of execution, the add-on instruction called, a caller of the instruction, a location of the instruction call, references to complex data types and objects, etc. The systems and methods further provide a technique for automatic routine selection based on the add-on instruction state information such that the add-on instruction executed corresponds to a current state.08-01-2013
20130205120PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS - A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining that the load instruction is resolved based upon receipt of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.08-08-2013
20130205121PROCESSOR PERFORMANCE IMPROVEMENT FOR INSTRUCTION SEQUENCES THAT INCLUDE BARRIER INSTRUCTIONS - A technique for processing an instruction sequence that includes a barrier instruction, a load instruction preceding the barrier instruction, and a subsequent memory access instruction following the barrier instruction includes determining, by a processor core, that the load instruction is resolved based upon receipt by the processor core of an earliest of a good combined response for a read operation corresponding to the load instruction and data for the load instruction. The technique also includes if execution of the subsequent memory access instruction is not initiated prior to completion of the barrier instruction, initiating by the processor core, in response to determining the barrier instruction completed, execution of the subsequent memory access instruction. The technique further includes if execution of the subsequent memory access instruction is initiated prior to completion of the barrier instruction, discontinuing by the processor core, in response to determining the barrier instruction completed, tracking of the subsequent memory access instruction with respect to invalidation.08-08-2013
20130205122INSTRUCTION SET ARCHITECTURE-BASED INTER-SEQUENCER COMMUNICATIONS WITH A HETEROGENEOUS RESOURCE - In one embodiment, the present invention includes a method for directly communicating between an accelerator and an instruction sequencer coupled thereto, where the accelerator is a heterogeneous resource with respect to the instruction sequencer. An interface may be used to provide the communication between these resources. Via such a communication mechanism a user-level application may directly communicate with the accelerator without operating system support. Further, the instruction sequencer and the accelerator may perform operations in parallel. Other embodiments are described and claimed.08-08-2013
20130212361INSTRUCTION AND LOGIC FOR PROCESSING TEXT STRINGS - Method, apparatus, and program means for performing a string comparison operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store a result of a comparison between each data element of a first and second operand corresponding to a first and second text string, respectively.08-15-2013
20130246751VECTOR FIND ELEMENT NOT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Not Equal instruction is provided that compares data of multiple vectors for inequality and provides an indication of inequality, if inequality exists. An index associated with the unequal element is stored in a target vector register. Further, the same instruction, the Find Element Not Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.09-19-2013
20130246752VECTOR FIND ELEMENT EQUAL INSTRUCTION - Processing of character data is facilitated. A Find Element Equal instruction is provided that compares data of multiple vectors for equality and provides an indication of equality, if equality exists. An index associated with the equal element is stored in a target vector register. Further, the same instruction, the Find Element Equal instruction, also searches a selected vector for null elements, also referred to as zero elements. A result of the instruction is dependent on whether the null search is provided, or just the compare.09-19-2013
20130246753VECTOR STRING RANGE COMPARE - Processing of character data is facilitated. A Vector String Range Compare instruction is provided that compares each element of a vector with a range of values based on a set of controls to determine if there is a match. An index associated with the matched element or a mask representing the matched element is stored in a target vector register. Further, the same instruction, the Vector String Range Compare instruction, also searches a selected vector for null elements, also referred to as zero elements.09-19-2013
20130246754RUN-TIME INSTRUMENTATION INDIRECT SAMPLING BY ADDRESS - Embodiments of the invention relate to implementing run-time instrumentation indirect sampling by address. An aspect of the invention includes reading sample-point addresses from a sample-point address array, and comparing, by a processor, the sample-point addresses to an address associated with an instruction from an instruction stream executing on the processor. A sample point is recognized upon execution of the instruction associated with the address matching one of the sample-point addresses. Run-time instrumentation information is obtained from the sample point. The run-time instrumentation information is stored in a run-time instrumentation program buffer as a reporting group.09-19-2013
20130246755RUN-TIME INSTRUMENTATION REPORTING - Embodiments of the invention relate to run-time instrumentation reporting. An instruction stream is executed by a processor. Run-time instrumentation information of the executing instruction stream is captured by the processor. Run-time instrumentation records are created based on the captured run-time instrumentation information. A run-time instrumentation sample point of the executing instruction stream on the processor is detected. A reporting group is stored in a run-time instrumentation program buffer. The storing is based on the detecting and the storing includes: determining a current address of the run-time instrumentation program buffer, the determining based on instruction accessible run-time instrumentation controls; and storing the reporting group into the run-time instrumentation program buffer based on an origin address and the current address of the run-time instrumentation program buffer, the reporting group including the created run-time instrumentation records.09-19-2013
20130246756HARDWARE PROTOCOL STACK - Disclosed is a hardware protocol stack, where header information of analysis-subjected protocol is stored in a register unit, comparison is made whether information recorded in the header of inputted frame mutually matches header information stored in the register unit, and data is extracted as a result of the comparison.09-19-2013
20130262834Hardware Managed Ordered Circuit - A system and method is provided for improving efficiency, power, and bandwidth consumption in parallel processing. Rather than requiring memory polling to ensure ordered execution of processes or threads, the techniques disclosed herein provide a system and method to allow any process or thread to run out of order as long as needed, but ensure ordered execution of multiple ordered instructions when needed. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.10-03-2013
20130262835CODE GENERATION METHOD AND INFORMATION PROCESSING APPARATUS - An information processing apparatus generates first and second operation trees representing a dependency relationship among the instructions included in a first code, and computes first and second operation sequences from the first and second operation trees. Then, the information processing apparatus computes the longest ones of operation subsequences common to the first and second operation sequences, evaluates, for each longest operation subsequence, the utilization of computing resources used for executing the combinations of instructions of the first and second operation trees corresponding to the operations included in the longest operation subsequence, and selects a combination pattern of instructions indicated by any one of the longest operation subsequences on the basis of the evaluation results.10-03-2013
20130275725INTEGRATED CIRCUIT DEVICE AND METHOD FOR PERFORMING CONDITIONAL NEGATION OF DATA - An integrated circuit device comprising at least one digital signal processor (DSP) module, the at least one DSP module comprising a first data register and at least one further data register and at least one data execution unit (DEU) module arranged to execute operations on target data stored within the first data register and the at least one further data register. The at least one DEU module is arranged, upon receipt of a conditional negation instruction, to retrieve at least one conditional bit value from the first data register, and conditionally perform negation of target data within the at least one further data register according to the at least one retrieved conditional bit value.10-17-2013
20130283015EXPRESSING PARALLEL EXECUTION RELATIONSHIPS IN A SEQUENTIAL PROGRAMMING LANGUAGE - Circuits, methods, and apparatus that provide parallel execution relationships to be included in a function call or other appropriate portion of a command or instruction in a sequential programming language. One example provides a token-based method of expressing parallel execution relationships. Each process that can be executed in parallel is given a separate token. Later processes that depend on earlier processes wait to receive the appropriate token before being executed. In another example, counters are used in place to tokens to determine when a process is completed. Each function is a number of individual functions or threads, where each thread performs the same operation on a different piece of data. A counter is used to track the number of threads that have been executed. When each thread in the function has been executed, a later function that relies on data generated by the earlier function may be executed.10-24-2013
20130290683Eliminating Redundant Masking Operations Instruction Processing Circuits, And Related Processor Systems, Methods, And Computer-Readable Media - Eliminating redundant masking operations in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction in an instruction stream indicating an operation writing a value to a first register is detected by an instruction processing circuit, the value having a value size less than a size of the first register. The circuit also detects a second instruction in the instruction stream indicating a masking operation on the first register. The masking operation is eliminated upon a determination that the masking operation indicates a read operation and a write operation on the first register and has an identity mask size equal to or greater than the value size. in this manner, the elimination of the masking operation avoids potential read-after-write hazards and improves performance of a CPU by removing redundant operations from an execution pipeline.10-31-2013
20130297914MAINTAINING DATA COHERENCE BY USING DATA DOMAINS - A method, system and computer program product are disclosed for maintaining data coherence, for use in a multi-node processing system where each of the nodes includes one or more components. In one embodiment, the method comprises establishing a data domain, assigning a group of the components to the data domain, sending a coherence message from a first component of the processing system to a second component of the processing system, and determining if that second component is assigned to the data domain. In this embodiment, if that second component is assigned to the data domain, the coherence message is transferred to all of the components assigned to the data domain to maintain data coherency among those components. In an embodiment, if that second component is assigned to the data domain, the first component is assigned to the data domain.11-07-2013
20130297915FLAG NON-MODIFICATION EXTENSION FOR ISA INSTRUCTIONS USING PREFIXES - In one embodiment, a processor includes an instruction decoder to receive and decode an instruction having a prefix and an opcode, an execution unit to execute the instruction based on the opcode, and flag modification override logic to prevent the execution unit from modifying a flag register of the processor based on the prefix of the instruction.11-07-2013
20130305023EXECUTION OF A PERFORM FRAME MANAGEMENT FUNCTION INSTRUCTION - Optimizations are provided for frame management operations, including a clear operation and/or a set storage key operation, requested by pageable guests. The operations are performed, absent host intervention, on frames not resident in host memory. The operations may be specified in an instruction issued by the pageable guests.11-14-2013
20130318327Method and apparatus for data processing - A method for processing an operating sequence of instructions of a program in a processor, wherein each instruction is represented by an assigned instruction code which comprises one execution step to be processed by the processor or a plurality of execution steps to be processed successively by the processor, includes determining an actual signature value assigned to a current execution step of the execution steps of the instruction code representing the instruction of the operating sequence; determining, in a manner dependent on an address value, a desired signature value assigned to the current execution step; and if the actual signature value does not correspond to the desired signature value, omitting at least one execution step directly available for execution and/or an execution step indirectly available for execution.11-28-2013
20130318328APPARATUS AND METHOD FOR SHUFFLING FLOATING POINT OR INTEGER VALUES - An apparatus and method are described for shuffling data elements from source registers to a destination register. For example, a method according to one embodiment includes the following operations: reading each mask bit stored in a mask data structure, the mask data structure containing mask bits associated with data elements of a destination register, the values usable for determining whether a masking operation or a shuffle operation should be performed on data elements stored within a first source register and a second source register; for each data element of the destination register, if a mask bit associated with the data element indicates that a shuffle operation should be performed, then shuffling data elements from the first source register and the second source register to the specified data element within the destination register; and if the mask bit indicates that a masking operation should be performed, then performing a specified masking operation with respect to the data element of the destination register.11-28-2013
20130332704Method for Improving Performance of a Pipelined Microprocessor by Utilizing Pipeline Virtual Registers - A method for improving performance of a pipelined microprocessor by utilizing pipeline virtual registers allows for either decreased register spillage or decreased area and power consumption of a microprocessor. The microprocessor takes advantage of register bypass logic to write short-lived values to virtual registers, which are discarded instead of being written to the register bank, thus reducing register pressure by avoiding short-lived values being written to the register bank.12-12-2013
20130332705PROFILING ASYNCHRONOUS EVENTS RESULTING FROM THE EXECUTION OF SOFTWARE AT CODE REGION GRANULARITY - A combination of hardware and software collect profile data for asynchronous events, at code region granularity. An exemplary embodiment is directed to collecting metrics for prefetching events, which are asynchronous in nature. Instructions that belong to a code region are identified using one of several alternative techniques, causing a profile bit to be set for the instruction, as a marker. Each line of a data block that is prefetched is similarly marked. Events corresponding to the profile data being collected and resulting from instructions within the code region are then identified. Each time that one of the different types of events is identified, a corresponding counter is incremented. Following execution of the instructions within the code region, the profile data accumulated in the counters are collected, and the counters are reset for use with a new code region.12-12-2013
20130332706ELECTRONIC DEVICE, ELECTRONIC DEVICE COOPERATING SYSTEM, AND ELECTRONIC DEVICE CONTROLLING METHOD - There are included a communication part performing communication with other device; a command managing part transmitting a command of an own device to other device and receiving a command of other device to acquire the command of other device by the communication part, and managing the command of the own device and the command of other device; and a command processing part executing processing of a function corresponding to the command of the own device by the own device when a command selected from the commands managed by the command managing part is the command of the own device, and executing processing of a function corresponding to the command of other device by the other device when the command selected is the command of the other device.12-12-2013
20130339672Next Instruction Access Intent Instruction - Executing a Next Instruction Access Intent instruction by a computer. The processor obtains an access intent instruction indicating an access intent. The access intent is associated with an operand of a next sequential instruction. The access intent indicates usage of the operand by one or more instructions subsequent to the next sequential instruction. The computer executes the access intent instruction. The computer obtains the next sequential instruction. The computer executes the next sequential instruction, which comprises based on the access intent, adjusting one or more cache behaviors for the operand of the next sequential instruction.12-19-2013
20130339673INTRA-INSTRUCTIONAL TRANSACTION ABORT HANDLING - Embodiments relate to intra-instructional transaction abort handling. An aspect includes using an emulation routine to execute an instruction within a transaction. The instruction includes at least one unit of operation. The transaction effectively delays committing stores to memory until the transaction has completed successfully. After receiving an abort indication, emulation of the instruction is terminated prior to completing the execution of the instruction. The instruction is terminated after the emulation routine completes any previously initiated unit of operation of the instruction.12-19-2013
20130339674RESTRICTED INSTRUCTIONS IN TRANSACTIONAL EXECUTION - Restricted instructions are prohibited from execution within a transaction. There are classes of instructions that are restricted regardless of type of transaction: constrained or nonconstrained. There are instructions only restricted in constrained transactions, and there are instructions that are selectively restricted for given transactions based on controls specified on instructions used to initiate the transactions.12-19-2013
20130339675RANDOMIZED TESTING WITHIN TRANSACTIONAL EXECUTION - Task specific diagnostic controls are provided to facilitate the debugging of certain types of abort conditions. The diagnostic controls may be set to cause transactions to be selectively aborted, allowing a transaction to drive its abort handler routine for testing purposes. The controls include, for instance, a transaction diagnostic scope and a transaction diagnostic control. The transaction diagnostic scope indicates when the transaction diagnostic control is to be applied, and the transaction diagnostic control indicates whether transactions are to selectively aborted.12-19-2013
20130339676TRANSACTION ABORT INSTRUCTION - A TRANSACTION ABORT instruction is used to abort a transaction that is executing in a computing environment. The TRANSACTION ABORT instruction includes at least one field used to specify a user-defined abort code that indicates the specific reason for aborting the transaction. Based on executing the TRANSACTION ABORT instruction, a condition code is provided that indicates whether re-execution of the transaction is recommended.12-19-2013
20140052966MECHANISM FOR CONSISTENT CORE HANG DETECTION IN A PROCESSOR CORE - Mechanism for consistent core hang detection on a processor with multiple processor cores, each having one or more instruction execution pipelines. Each core may also include a hang detection unit with a counter unit that may generate a count value based on a clock source having a frequency that is independent of a frequency of a processor core clock. The hang detection unit may also include a detector logic unit that may determine whether a given instruction execution pipeline has ceased processing a given instruction based upon a state of the processor core and whether or not the given instruction has completed execution prior to the count value exceeding a predetermined value.02-20-2014
20140052967METHOD AND APPARATUS FOR DYNAMIC DATA CONFIGURATION - A method and apparatus for configuring dynamic data are provided. A compilation apparatus may select a data format showing an optimum performance when a binary code is executed, from among a plurality of data formats supported by an execution apparatus used to execute a binary code, and may generate a binary code that uses the selected data format. The execution apparatus may execute a binary code provided by the compilation apparatus.02-20-2014
20140075160SYSTEM AND METHOD FOR SYNCHRONIZING THREADS IN A DIVERGENT REGION OF CODE - A system and method are provided for synchronizing threads in a divergent region of code within a multi-threaded parallel processing system. The method includes, prior to any thread entering a divergent region, generating a count that represents a number of threads that will enter the divergent region. The method also includes using the count within the divergent region to synchronize the threads in the divergent region.03-13-2014
20140075161Data-Parallel Computation Management - Data-parallel computation programs may be improved by, for example, determining the functional properties user defined functions (UDFs), eliminating unnecessary data-shuffling stages, and/or changing data-partition properties to cause desired data properties to appear after one or more user defined functions are applied.03-13-2014
20140082331SYSTEM AND METHOD FOR SYNCHRONIZING PROCESSOR INSTRUCTION EXECUTION - A system and method for controlling processor instruction execution. In one example, a method for synchronizing a number of instructions performed by processors includes instructing a first processor to iteratively execute instructions via a first set of iterations until a predetermined time period has elapsed. A number of instructions executed in each iteration of the first set of iterations is less than a number of instructions executed in a prior iteration of the first set of iterations. The method also includes instructing a second processor to iteratively execute instructions via a second set of iterations until the predetermined time period has elapsed. A number of instructions executed in each iteration of the second set of iterations is less than a number of instructions executed in a prior iteration of the second set of iterations. The method includes determining whether additional instructions are to be executed.03-20-2014
20140089640PROCESSOR WITH VARIABLE INSTRUCTION ATOMICITY - A processor includes a plurality of execution units. At least one of the execution units is configured to execute a complex instruction that requires multiple instruction cycles to execute, and to enforce atomic execution of the complex instruction during a first-portion of the multiple instruction cycles required to execute the complex instruction. The at least one of the execution units is further configured to enable execution of the complex instruction to be interrupted for execution of a different instruction by the at least one execution unit during execution of a second portion of the multiple instruction cycles. The first portion and the second portion are non-overlapping.03-27-2014
20140089641PROCESSOR WITH INSTRUCTION ITERATION - A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.03-27-2014
20140089642METHODS AND SYSTEMS FOR PERFORMING A REPLAY EXECUTION - One or more embodiments may provide a method for performing a replay. The method includes initiating execution of a program, the program having a plurality of sets of instructions, and each set of instructions has a number of chunks of instructions. The method also includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk of the number of chunks before execution. The method further includes determining, by a replay module executing on the processor, whether the chunk is an active chunk, and responsive to the chunk being the active chunk, executing the instruction.03-27-2014
20140095837READ AND WRITE MASKS UPDATE INSTRUCTION FOR VECTORIZATION OF RECURSIVE COMPUTATIONS OVER INTERDEPENDENT DATA - A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.04-03-2014
20140095838Physical Reference List for Tracking Physical Register Sharing - A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.04-03-2014
20140095839MONITORING PROCESSING TIME IN A SHARED PIPELINE - A pipelined processing device includes: a pipeline controller configured to receive at least one instruction associated with an operation from each of a plurality of subcontrollers, and input the at least one instruction into a pipeline; and a pipeline counter configured to receive an active time value from each of the plurality of subcontrollers, the active time value indicating at least a portion of a time taken to process the at least one instruction, the pipeline controller configured to route the active time value to a shared pipeline storage for performance analysis.04-03-2014
20140095840DIAGNOSE INSTRUCTION FOR SERIALIZING PROCESSING - A system serialization capability is provided to facilitate processing in those environments that allow multiple processors to update the same resources. The system serialization capability is used to facilitate processing in a multi-processing environment in which guests and hosts use locks to provide serialization. The system serialization capability includes a diagnose instruction which is issued after the host acquires a lock, eliminating the need for the guest to acquire the lock.04-03-2014
20140095841PROCESSOR AND CONTROL METHOD OF PROCESSOR - A processor including a circuit unit includes a state information holding unit, a direction controller, a direction generator, and a direction execution unit. The state information holding unit holds state information indicating a state of the circuit unit. The direction controller decodes a first direction for generating a control direction that is contained in a program. The direction generator generates a second direction when the first direction decoded by the direction controller is a direction for generating the second direction for reading the state information from the state information holding unit. The direction execution unit reads the state information from the state information holding unit based on the second direction generated by the direction generator so as to store the state information in a register unit that is capable of being read from a program.04-03-2014
20140101417CODE COVERAGE FRAMEWORK - An information processing system records an execution of a program instruction. A determination is made that a thread has entered a program unit. Another determination is made that that the thread is associated with at least one attribute that matches a set of thread recording criteria. An instruction recording mechanism for the thread is dynamically activated in response to the at least one attribute of the thread matching the set of thread recording criteria.04-10-2014
20140115301PROCESSOR ARCHITECTURE AND METHOD FOR SIMPLIFYING PROGRAMMING SINGLE INSTRUCTION, MULTIPLE DATA WITHIN A REGISTER - The present disclosure provides a processor, and associated method, for performing parallel processing within a register. An exemplary processor may include a processing element having a compute unit and a register file. The register file includes a register that is divisible into lanes for parallel processing. The processor may further include a mask register and a predicate register. The mask register and the predicate register respective include a number of mask bits and predicate bits equal to a maximum number of divisible lanes of the register. A state of the mask bits and predicate bits is set to respectively achieve enabling/disabling of the lanes from executing an instruction and conditional performance of an operation defined by the instruction. Further, the processor is operable to perform a reduction operation across the lanes of the processing element and/or generate an address for each of the lanes of the processing element.04-24-2014
20140115302PREDICATE COUNTER - According to an example embodiment, a processor such as a digital signal processor (DSP), is provided with a register acting as a predicate counter. The predicate counter may include more than two useful values, and in addition to acting as a condition for executing an instruction, may also keep track of nesting levels within a loop or conditional branch. In some cases, the predicate counter may be configured to operate in single-instruction, multiple data (SIMD) mode, or SIMD-within-a-register (SWAR) mode.04-24-2014
20140122838WORK-QUEUE-BASED GRAPHICS PROCESSING UNIT WORK CREATION - One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.05-01-2014
20140129806LOAD/STORE PICKER - A method and apparatus for picking load or store instructions is presented. Some embodiments of the method include determining that the entry in the queue includes an instruction that is ready to be executed by the processor based on at least one instruction-based event and concurrently determining cancel conditions based on global events of the processor. Some embodiments also include selecting the instruction for execution when the cancel conditions are not satisfied.05-08-2014
20140136819REGISTER FILE MANAGEMENT FOR OPERATIONS USING A SINGLE PHYSICAL REGISTER FOR BOTH SOURCE AND RESULT - A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.05-15-2014
20140143523SPECULATIVE FINISH OF INSTRUCTION EXECUTION IN A PROCESSOR CORE - In a processor core, high latency operations are tracked in entries of a data structure associated with an execution unit of the processor core. In the execution unit, execution of an instruction dependent on a high latency operation tracked by an entry of the data structure is speculatively finished prior to completion of the high latency operation. Speculatively finishing the instruction includes reporting an identifier of the entry to completion logic of the processor core and removing the instruction from an execution pipeline of the execution unit. The completion logic records dependence of the instruction on the high latency operation and commits execution results of the instruction to an architected state of the processor only after successful completion of the high latency operation.05-22-2014
20140164743REORDERING BUFFER FOR MEMORY ACCESS LOCALITY - Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.06-12-2014
20140173257REQUESTING SHARED VARIABLE DIRECTORY (SVD) INFORMATION FROM A PLURALITY OF THREADS IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for requesting shared variable directory (SVD) information from a plurality of threads in a parallel computer are provided. Embodiments include a runtime optimizer detecting that a first thread requires a plurality of updated SVD information associated with shared resource data stored in a plurality of memory partitions. Embodiments also include a runtime optimizer broadcasting, in response to detecting that the first thread requires the updated SVD information, a gather operation message header to the plurality of threads. The gather operation message header indicates an SVD key corresponding to the required updated SVD information and a local address associated with the first thread to receive a plurality of updated SVD information associated with the SVD key. Embodiments also include the runtime optimizer receiving at the local address, the plurality of updated SVD information from the plurality of threads.06-19-2014
20140181479METHOD, APPARATUS, SYSTEM CREATING, EXECUTING AND TERMINATING MINI-THREADS - Described herein are mechanisms for creating, executing, and terminating mini-threads. A processor executes instructions with a primary thread in a first execution mode, and to execute an instruction to create a secondary mini-thread that is associated with a first subset of registers and associates the primary thread with a second subset of the registers during a second execution mode. During the second execution mode, the primary thread operates as a primary mini-thread.06-26-2014
20140181480NESTED SPECULATIVE REGIONS FOR A SYNCHRONIZATION FACILITY - An apparatus, computer readable medium, and method of performing nested speculative regions are presented. The method includes responding to entering a speculative region by storing link information to an abort handler and responding to a commit command by removing link information from the abort handler. The method may include storing link information to the abort handler associated with the speculative region. When the speculative region is nested, the method may include storing link information to an abort handler associated with a previous speculative region. Removing link information may include removing link information from the abort handler associated with the corresponding speculative region. The method may include restoring link information to the abort handler associated with a previous speculative region. Responding to an abort command may include running the abort handler associated with the aborted speculative region. The method may include running the abort handler of each previous speculative region.06-26-2014
20140189316EXECUTION PIPELINE DATA FORWARDING - In one embodiment, in an execution pipeline having a plurality of execution subunits, a method of using a bypass network to directly forward data from a producing execution subunit to a consuming execution subunit is provided. The method includes producing output data with the producing execution subunit, consuming input data with the consuming execution subunit, for one or more intervening operations whose input is the output data from the producing execution subunit and whose output is the input data to the consuming execution subunit, evaluating those one or more intervening operations to determine whether their execution would compose an identify function, and if the one or more intervening operations would compose such an identity function, controlling the bypass network to forward the producing execution subunit's output data directly to the consuming execution subunit.07-03-2014
20140189317APPARATUS AND METHOD FOR A HYBRID LATENCY-THROUGHPUT PROCESSOR - An apparatus and method are described for executing both latency-optimized execution logic and throughput-optimized execution logic on a processing device. For example, a processor according to one embodiment comprises: latency-optimized execution logic to execute a first type of program code; throughput-optimized execution logic to execute a second type of program code, wherein the first type of program code and the second type of program code are designed for the same instruction set architecture; logic to identify the first type of program code and the second type of program code within a process and to distribute the first type of program code for execution on the latency-optimized execution logic and the second type of program code for execution on the throughput-optimized execution logic.07-03-2014
20140189318AUTOMATIC REGISTER PORT SELECTION IN EXTENSIBLE PROCESSOR ARCHITECTURE - This document discusses, among other things, systems and methods to access n consecutive entries of a register file in a single operation using a register file entry index consisting of B bits, wherein B is less than the binary logarithm of a depth of the register file, which corresponds to the number of entries in the register file, and to automatically select, for a set of register arguments for the n consecutive entries, between a register port for each argument requiring a register port or one or more shared register ports for the set of register arguments according to description of an instruction set architecture associated with the register file.07-03-2014
20140208076DFA COMPRESSION AND EXECUTION - A character class (CCL) memory containing simple CCLs represented by encoding contained symbols or minimum and maximum symbols of a range, complex CCLs represented by bit-masks indicating contained symbols, and equivalence class (EC) maps represented as tables of ED values for each symbol value. Determining a next DFA transition by comparing multiple CCLs with a single input symbol, and selecting a transition according to the first matching CCL, or selecting a transition corresponding to a vector of CCL match result bits. Comparing CCLs from one DFA instruction to determine a transition and if no CCLs match, comparing CCLs from a second DDFA instruction to determine the transition. Matching linear sequence of two or more DFA states using a sequence of multiple CCLs encoded in a single DFA instruction.07-24-2014
20140215190COMPLETING LOAD AND STORE INSTRUCTIONS IN A WEAKLY-ORDERED MEMORY MODEL - Techniques are disclosed relating to completion of load and store instructions in a weakly-ordered memory model. In one embodiment, a processor includes a load queue and a store queue and is configured to associate queue information with a load instruction in an instruction stream. In this embodiment, the queue information indicates a location of the load instruction in the load queue and one or more locations in the store queue that are associated with one or more store instructions that are older than the load instruction. The processor may determine, using the queue information, that the load instruction does not conflict with a store instruction in the store queue that is older than the load instruction. The processor may remove the load instruction from the load queue while the store instruction remains in the store queue. The queue information may include a wrap value for the load queue.07-31-2014
20140215191LOAD ORDERING IN A WEAKLY-ORDERED PROCESSOR - Techniques are disclosed relating to ordering of load instructions in a weakly-ordered memory model. In one embodiment, a processor includes a cache with multiple cache lines and a store queue configured to maintain status information associated with a store instruction that targets a location in one of the cache lines. In this embodiment, the processor is configured to set an indicator in the status information in response to migration of the targeted cache line. The indicator may be usable to sequence performance of load instructions that are younger than the store instruction. For example, the processor may be configured to wait, based on the indicator, to perform a younger load instruction that targets the same location as the store instruction until the store instruction is removed from the store queue. This may prevent forwarding of the value of the store instruction to the younger load and preserve load-load ordering.07-31-2014
20140215192HEAP DATA MANAGEMENT FOR LIMITED LOCAL MEMORY(LLM) MULTI-CORE PROCESSORS - A compiler tool-chain may automatically compile an application to execute on a limited local memory (LLM) multi-core processor by including automated heap management transparently to the application. Management of the heap in the LLM for the application may include identifying access attempts to a program variable, transferring the program variable to the LLM, when not already present in the LLM, and returning a local address for the program variable to the application. The application then accesses the program variable using the local address transparently without knowledge about data in the LLM. Thus, the application may execute on a LLM multi-core processor as if the LLM multi-core processor has an unlimited heap space.07-31-2014
20140223145Configurable Reduced Instruction Set Core - A processor may be built with cores that only execute some partial set of the instructions needed to be fully backwards compliant. Thus, in some embodiments power consumption may be reduced by providing partial cores that only execute certain instructions and not other instructions. The instructions not supported may be handled in other, more energy efficient ways, so that, the overall processor, including the partial core, may be fully backwards compliant.08-07-2014
20140244979Estimating Time Remaining for an Operation - Techniques for estimating time remaining for an operation are described. Examples operations include file operations, such as file move operations, file copy operations, and so on. A wide variety of different operations may be considered in accordance with the claimed embodiments, further examples of which are discussed below. In at least some embodiments, estimating a time remaining for an operation can be based on a state of the operation. A state of an operation, for example, can be based on events related to the operation itself, such as the operation being initiated, paused, resumed, and so on. A state of an operation can also be based on events related to other operations.08-28-2014
20140244980METHOD AND SYSTEM FOR DYNAMIC CONTROL OF A MULTI-TIER PROCESSING SYSTEM - Method, system, and programs for dynamic control of a processing system having a plurality of tiers. Queue lengths of a plurality of nodes in one of the plurality of tiers are received. A control objective is received from a higher tier. One or more requests from the higher tier are processed by the plurality of nodes in the tier. A control model of the tier is computed based on the received queue lengths. One or more parameters of the control model are adjusted based on the received control objective. At least one control action is determined based on the control model and the control objective.08-28-2014
20140244981PROCESSOR AND CONTROL METHOD FOR PROCESSOR - A processor includes a programmable logic circuit provided with a plurality of processing units. The programmable logic circuit is capable of reconfiguring a first logic circuit corresponding to first circuit configuration information according to a first process and a second logic circuit corresponding to second circuit configuration information according to a second process. Each of the first and second logic circuits includes an information holding unit. A first control circuit stores the second circuit configuration information in the information holding unit of the first logic circuit and generates an execution control signal for executing the first process. A second control circuit obtains the second circuit configuration information from the information holding unit of the first logic circuit in response to completion of the first process and controls the programmable logic circuit so as to reconfigure the second logic circuit corresponding to the second circuit configuration information.08-28-2014
20140258688BENCHMARK GENERATION USING INSTRUCTION EXECUTION INFORMATION - Methods and systems are provided for generating a benchmark representative of a reference process. One method involves obtaining execution information for a subset of the plurality of instructions of the reference process from a pipeline of a processing module during execution of those instructions by the processing module, determining performance characteristics quantifying the execution behavior of the reference process based on the execution information, and generating the benchmark process that mimics the quantified execution behavior of the reference process based on the performance characteristics.09-11-2014
20140281417SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS PROVIDING A DATA UNIT SEQUENCING QUEUE - A system for passing data, the system including multiple data producers passing processed data, wherein the processed data include discrete data units that are each consecutively numbered, each of the data producers calculating insertion indices for ones of the data units passing therethrough; a circular buffer receiving the data units from the producers, the data units placed in slots that correspond to the respective insertion indices; and a consumer of the data units that receives the data units from the circular buffer in an order that preserves sequential numbering of the data units, wherein the multiple data producers follow a protocol so that a first one of the data producers, upon failing to place a first data unit in the circular buffer, does not lock other data producers from placing other data units in the circular buffer.09-18-2014
20140281418Multiple Data Element-To-Multiple Data Element Comparison Processors, Methods, Systems, and Instructions - An apparatus includes packed data registers and an execution unit. An instruction is to indicate a first source packed data that is to include a first packed data elements, a second source packed data that is to include a second packed data elements, and a destination storage location. The execution unit, in response to the instruction, is to store a packed data result that is to include packed result data elements in the destination storage location. Each of the result data elements is to correspond to a different one of the data elements of the second source packed data. Each of the result data elements is to include a multiple bit comparison mask that is to include a different comparison mask bit for each different corresponding data element of the first source packed data compared with the corresponding data element of the second source packed data.09-18-2014
20140325189QUERY SAMPLING INFORMATION INSTRUCTION - A measurement sampling facility takes snapshots of the central processing unit (CPU) on which it is executing at specified sampling intervals to collect data relating to tasks executing on the CPU. The collected data is stored in a buffer, and at selected times, an interrupt is provided to remove data from the buffer to enable reuse thereof. The interrupt is not taken after each sample, but in sufficient time to remove the data and minimize data loss.10-30-2014
20140331030STATICALLY SPECULATIVE COMPILATION AND EXECUTION - A system, for use with a compiler architecture framework, includes performing a statically speculative compilation process to extract and use speculative static information, encoding the speculative static information in an instruction set architecture of a processor, and executing a compiled computer program using the speculative static information, wherein executing supports static speculation driven mechanisms and controls.11-06-2014
20140365748METHOD, APPARATUS AND SYSTEM FOR DATA STREAM PROCESSING WITH A PROGRAMMABLE ACCELERATOR - Techniques and mechanisms for programming an accelerator device to enable performance of a data processing algorithm. In an embodiment, an accelerator of a computer platform is programmed based on programming information received from a host processor of the computer platform. In another embodiment, programming of the accelerator is to enable data driven execution of an instruction by a data stream processing engine of the accelerator.12-11-2014
20140365749USING A SINGLE TABLE TO STORE SPECULATIVE RESULTS AND ARCHITECTURAL RESULTS - Some implementations provide techniques and arrangements that include a physical register file to store a speculative result of executing a operation and to store an architectural result after the operation is retired and a rename alias table to store a speculative result pointer to the speculative result stored in the physical register file, an architectural result pointer to the architectural result stored in the physical register file, and a result selection field to indicate whether to select the speculative result pointer or the architectural result pointer.12-11-2014
20140380025MANAGEMENT OF HARDWARE ACCELERATOR CONFIGURATIONS IN A PROCESSOR CHIP - Techniques described herein generally include methods for the management of hardware accelerator images in a processor chip that includes one or more programmable logic circuits. Hardware accelerator images may be optimized by swapping out which hardware accelerator images are implemented in the one or more programmable logic circuits. The hardware accelerator images may be chosen from a library of accelerator programs downloaded to a device associated with the processor chip. Furthermore, the specific hardware accelerator images that are implemented in the one or more programmable logic circuits at a particular time may be selected based on which combination of accelerator images best enhances performance and power usage of the processor chip.12-25-2014
20150019844SYNTHETIC PROCESSING DIVERSITY WITHIN A HOMOGENEOUS PROCESSING ENVIRONMENT - A method of increasing processing diversity on a computer system includes: loading a plurality of instruction streams, each of the plurality of instruction streams being equivalent; executing, in a context, a first stream of the plurality of instruction streams; stopping execution of the first stream at a first location of the first stream; and executing, in the context, a second stream of the plurality of instruction streams at a second location of the second stream, the second location corresponding to the first location of the first stream.01-15-2015
20150019845Method to Extend the Number of Constant Bits Embedded in an Instruction Set - The invention allows a processor to maintain a fixed instruction width regardless of the width of the constants needed. The constant extension solves the problem of having variable length opcodes to accommodate longer constants. The invention allows the architecture to have a fixed width, regardless of the width of the constants specified, which simplify instruction decoding. Constant widths can be variable and extend beyond the fixed processor instruction width.01-15-2015
20150039863METHOD FOR COMPRESSING INSTRUCTION AND PROCESSOR FOR EXECUTING COMPRESSED INSTRUCTION - A method for compressing instruction is provided, which includes the following steps. Analyze a program code to be executed by a processor to find one or more instruction groups in the program code according to a preset condition. Each of the instruction groups includes one or more instructions in sequential order. Sort the one or more instruction groups according to a cost function of each of the one or more instruction groups. Put the first X of the sorted one or more instruction groups into an instruction table. X is a value determined according to the cost function. Replace each of the one or more instruction groups in the program code that are put into the instruction table with a corresponding execution-on-instruction-table (EIT) instruction.02-05-2015
20150039864SYSTEMS AND METHODS FOR DEFEATING MALWARE WITH RANDOMIZED OPCODE VALUES - A computer processor includes a first instruction set and a second instruction set. The computer processor further includes a translator. The translator translates the first instruction set into the second instruction set. The computer processor is configured to execute operations using only the second complete instruction set.02-05-2015
20150046684TECHNIQUE FOR GROUPING INSTRUCTIONS INTO INDEPENDENT STRANDS - A device compiler and linker is configured to group instructions into different strands for execution by different threads based on the dependence of those instructions on other, long-latency instructions. A thread may execute a strand that includes long-latency instructions, and then hardware resources previously allocated for the execution of that thread may be de-allocated from the thread and re-allocated to another thread. The other thread may then execute another strand while the long-latency instructions are in flight. With this approach, the other thread is not required to wait for the long-latency instructions to complete before acquiring hardware resources and initiating execution of the other strand, thereby eliminating at least a portion of the time that the other thread would otherwise spend waiting.02-12-2015
20150046685Intelligent Multicore Control For Optimal Performance Per Watt - The various aspects provide for a device and methods for intelligent multicore control of a plurality of processor cores of a multicore integrated circuit. The aspects may identify and activate an optimal set of processor cores to achieve the lowest level power consumption for a given workload or the highest performance for a given power budget. The optimal set of processor cores may be the number of active processor cores or a designation of specific active processor cores. When a temperature reading of the processor cores is below a threshold, a set of processor cores may be selected to provide the lowest power consumption for the given workload. When the temperature reading of the processor cores is above the threshold, a set processor cores may be selected to provide the best performance for a given power budget.02-12-2015
20150046686METHOD FOR EXECUTING BLOCKS OF INSTRUCTIONS USING A MICROPROCESSOR ARCHITECTURE HAVING A REGISTER VIEW, SOURCE VIEW, INSTRUCTION VIEW, AND A PLURALITY OF REGISTER TEMPLATES - A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks; using a plurality of register templates to track instruction destinations and instruction sources by populating the register template with block numbers corresponding to the instruction blocks, wherein the block numbers corresponding to the instruction blocks indicate interdependencies among the blocks of instructions; using a register view data structure, wherein the register view data structure stores destinations corresponding to the instruction blocks; using a source view data structure, wherein the source view data structure stores sources corresponding to the instruction blocks; and using an instruction view data structure, wherein the instruction view data structure stores instructions corresponding to the instruction blocks.02-12-2015
20150058602PROCESSOR WITH ADAPTIVE PIPELINE LENGTH - A system and method for reducing pipeline latency. In one embodiment, a processing system includes a processing pipeline. The processing pipeline includes a plurality of processing stages. Each stage is configured to further processing provided by a previous stage. A first of the stages is configured to perform a first function in a pipeline cycle. A second of the stages is disposed downstream of the first of the stages, and is configured to perform, in a pipeline cycle, a second function that is different from the first function. The first of the stages is further configured to selectably perform the first function and the second function in a pipeline cycle, and bypass the second of the stages.02-26-2015
20150067303INPUT DATA AGGREGATION PROCESSING APPARATUS, SYSTEM AND METHOD - To improve usability and a processing speed by waiting for a string of information associated with each other, and appropriately aggregating data satisfying a specific condition when outputting the data, and retaining and storing the aggregated data. In a case where a string of information originally associated with each other is arrival, separately, a data input/output unit determines a processing pattern on the basis of information on an input source. A data processing unit implements a processing condition for waiting for the associated information, individually, controls the storage of data in conformity with the processing condition, and aggregates and retains the string of data, and outputs an aggregation result. Also, the data processing unit can delete the string of unnecessary data in a lump.03-05-2015
20150074381PROCESSOR WITH MEMORY-EMBEDDED PIPELINE FOR TABLE-DRIVEN COMPUTATION - A processor and a method implemented by the processor to obtain computation results are described. The processor includes a unified reuse table embedded in a processor pipeline, the unified reuse table including a plurality of entries, each entry of the plurality of entries corresponding with a computation instruction or a set of computation instructions. The processor also includes a functional unit to perform a computation based on a corresponding instruction.03-12-2015
20150074382System and Method for an Asynchronous Processor with Scheduled Token Passing - Embodiments are provided for adding a token jump logic to an asynchronous processor with token passing. The token jump logic allows token forward jumps and token backward jumps over a cascade of token processing logics in the processor. An embodiment method includes determining, using a token jump logic coupled to a cascade of token processing logics, whether to administer a token forward jump or a token backward jump of a token signal passing through the token processing logics. The token forward jump and token backward jump allow the token signal to skip one or more token processing logics in the cascade. The method further includes monitoring, for each of the token processing logics, a polarity status of a token sense logic, and inverting the polarity status according to the determination at the token jump logic.03-12-2015
20150089203LATENCY REDUCTION IN DISTRIBUTED COMPUTING ENVIRONMENTS - Systems and methods that facilitate anticipatory execution and provision of possible instructions receivable by a computing environment are described herein. A state management component identifies the possible instructions, and defines a prediction space. A predictor component selects one or more possible instructions to be executed by a processor module. In one embodiment, a portion of a computing environment is hosted at a data center, and a user interacts with the computing environment via a terminal computing device. A possible instruction receivable by the terminal is identified and executed before such instruction is issued, and resulting information is preemptively delivered to the terminal computing device, which accesses the information when such instruction is issued.03-26-2015
20150095620ESTIMATING SCALABILITY OF A WORKLOAD - In an embodiment, a processor includes a first logic to calculate a scalability value for a processor domain based at least in part on an active state residency, a stall duration, and a memory bandwidth of the domain, and to determine an operating frequency update for the domain based at least in part on a current operating frequency of the domain and the scalability value. Other embodiments are described and claimed.04-02-2015
20150113253SELECTIVELY COMPRESSED MICROCODE - A microprocessor includes one or more memories configured to hold microcode instructions, wherein at least a portion of the microcode instructions are compressed. The microprocessor also includes a decompression unit configured to decompress the compressed microcode instructions after being fetched from the one or more memories and before being executed. A method includes receiving from a memory a first N-bit wide microcode word, determining whether or not a predetermined portion of the first N-bit wide microcode word is a predetermined value, if the predetermined portion is not the predetermined value, decompressing the first N-bit wide microcode word to generate an M-bit wide microcode word, and if the predetermined portion is the predetermined value, receiving from the memory a second N-bit wide microcode word and joining portions of the first and second N-bit wide microcode words to generate the M-bit wide microcode word.04-23-2015
20150134935Split Register File for Operands of Different Sizes - In an embodiment, a processor includes a register file having multiple widths corresponding to different operands sizes of a given data type implemented by the processor. For example, the integer register file may have 32 bit and 64 bit widths for 32 and 64 bit operand sizes. The register file may have a section of registers for each operand size, and the map unit may allocate registers from the appropriate section for each instruction operation based on the operand size of that instruction operation. The register file may consume less integrated circuit area than another register file having the same number of registers, all of which are implemented at the largest operand size. In some embodiments, only the register file and the map unit (specifically the free list management logic in the map unit) are changed to implement the multiple-width register file.05-14-2015
20150370565DATA PROCESSING DEVICE AND METHOD, AND PROCESSOR UNIT OF SAME - A processor unit (12-24-2015
20150378729COLLECTING TRANSACTIONAL EXECUTION CHARACTERISTICS DURING TRANSACTIONAL EXECUTION - Execution of a transaction may be initiated by a CPU in a transactional execution (TX) environment. A set of TX performance characteristics of the transaction during the transactional execution may be collected and stored in a location specified by an instruction of the transaction when the transactional execution ends or aborts.12-31-2015
20160092217Compare Break Instructions - In an embodiment, a processor may implement a vector instruction set including one or more compare break instructions. The compare break instruction may take a pair of operands which may be compared to determine loop termination conditions, and may output a predicate vector indicating which vector elements correspond to loop iterations that are executed and which vector elements correspond to loop iterations that are not executed. The predicate vector may serve as a predicate to vector instructions forming the body of the loop, correctly executing the specified number of iterations. The compare break instruction may be coded to check for a variety of conditions (e.g. equal, not equal, greater than, less than, etc.). In an embodiment, the compare break instruction may take a predicate operand as well, which may be combined with the predicate vector produced by the comparison operations to produce the output vector.03-31-2016
20160092235METHOD AND APPARATUS FOR IMPROVED THREAD SELECTION - An apparatus and method are described for improved thread selection. For example, one embodiment of a processor comprises: first logic to maintain a history table comprising a plurality of entries, each entry in the table associated with an instruction and including history data indicating prior hits and/or misses to a cache level and/or a translation lookaside buffer (TLB) for that instruction; and second logic to select a particular thread for execution at a particular processor pipeline stage based on the history data.03-31-2016
20160110203COMPUTER SYSTEM FOR NOTIFYING SIGNAL CHANGE EVENT THROUGH CACHE STASHING - A computer system includes a cache unit and a first processing unit. The first processing unit runs a first program thread, and performs an instruction to store information of a signal change event into the cache unit through a cache stashing operation, where the signal change event is initiated by the first program thread for alerting a second program thread.04-21-2016
20160139928TECHNIQUES FOR INSTRUCTION GROUP FORMATION FOR DECODE-TIME INSTRUCTION OPTIMIZATION BASED ON FEEDBACK - A technique of processing instructions for execution by a processor includes determining whether a first property of a first instruction and a second property of a second instruction are compatible. The first instruction and the second instruction are grouped in an instruction group in response to the first and second properties being compatible and a feedback value generated by a feedback function indicating the instruction group has been historically beneficial with respect to a benefit metric of the processor. Group formation for the first and second instructions is performed according to another criteria, in response to the first and second properties being incompatible or the feedback value indicating the grouping of the first and second instructions has not been historically beneficial.05-19-2016
20160162295METHOD AND APPARATUS FOR EXECUTING MULTI-THREAD - A method of executing, by a processor, a multi-thread including threads of the processor, includes setting a mask value indicating execution of one of the threads of the processor based on an instruction, setting an inverted mask value based on the set mask value; and executing the thread of the processor based on the set mask value and the set inverted mask value.06-09-2016
20160188326PROCESSOR WITH INSTRUCTION ITERATION - A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.06-30-2016
20160202978INSTRUCTION TO LOAD DATA UP TO A SPECIFIED MEMORY BOUNDARY INDICATED BY THE INSTRUCTION07-14-2016
20160378477INSTRUCTIONS TO COUNT CONTIGUOUS REGISTER ELEMENTS HAVING SPECIFIC VALUES - A machine instruction to find a condition location within registers, such as vector registers. The machine instruction has associated therewith a register to be examined and a result location. The register includes a plurality of elements. In execution, the machine instruction counts a number of contiguous elements of the plurality of elements of the register having a particular value in a selected location within the contiguous elements. Other locations within the contiguous elements are ignored for the counting. The counting provides a count placed in the result location.12-29-2016
20160378478INSTRUCTIONS TO COUNT CONTIGUOUS REGISTER ELEMENTS HAVING SPECIFIC VALUES - A machine instruction to find a condition location within registers, such as vector registers. The machine instruction has associated therewith a register to be examined and a result location. The register includes a plurality of elements. In execution, the machine instruction counts a number of contiguous elements of the plurality of elements of the register having a particular value in a selected location within the contiguous elements. Other locations within the contiguous elements are ignored for the counting. The counting provides a count placed in the result location.12-29-2016
20160378489REGISTER FILE MAPPING - An apparatus for processing instructions includes a mapping unit comprising a plurality of mappers wherein each mapper of the plurality of mappers maps a logical sub-register reference to a physical sub-register reference, a decoding unit configured to receive an instruction and determine a plurality of logical sub-register references therefrom, and an execution unit. The mapping unit may be configured to distribute the plurality of logical sub-register references amongst the plurality of mappers according to at least one bit in the instruction and provide a corresponding plurality of physical sub-register references. The execution unit may be configured to execute the instruction using the plurality of physical sub-register references. Corresponding methods are also disclosed herein.12-29-2016
20190146796UNIFORM REGISTER FILE FOR IMPROVED RESOURCE UTILIZATION05-16-2019
20220137978METHOD AND APPARATUS FOR STATELESS PARALLEL PROCESSING OF TASKS AND WORKFLOWS - In a method for parallel processing of a data stream, a processing task is received to process the data stream that includes a plurality of segments. A split operation is performed on the data stream to split the plurality of segments into N sub-streams. Each of the N sub-streams includes one or more segments of the plurality of segments. The N is a positive integer. N sub-processing tasks are performed on the N sub-streams to generate N processed sub-streams. A merge operation is performed on the N processed sub-streams based on a merge buffer to generate a merged output data stream. The merge buffer includes an output iFIFO buffer and N sub-output iFIFO buffers coupled to the output iFIFO buffer. The merged output data stream is identical to an output data stream that is generated when the processing task is applied directly to the data stream without the split operation.05-05-2022

Patent applications in class PROCESSING CONTROL

Patent applications in all subclasses PROCESSING CONTROL

Website © 2025 Advameg, Inc.