Entries |
Document | Title | Date |
20080201562 | DATA PROCESSING SYSTEM - The present invention provides a data processor or a data processing system which can be used in compatible modes among which the number of bits of an address specifying a logical address space varies at the time of referring to a branch address table by extension of displacement of a branch instruction. At the time of generating a branch address of a first branch instruction, the data processor or the data processing system optimizes a multiple with which a displacement is multiplied in accordance with the number of bits of an address specifying a logical address space, adds extended address information to the value of a register, and refers to a branch address table with address information obtained by the addition. The referred information is used as a branch address. To be adapted to a compatible mode using different number of bits of an address specifying a logical address space, it is sufficient to change a multiple with which the displacement is multiplied in accordance with the mode. | 08-21-2008 |
20080201563 | Apparatus for Improving Single Thread Performance through Speculative Processing - An apparatus is provided for using multiple thread contexts to improve processing performance of a single thread. When an exceptional instruction is encountered, the exceptional instruction and any predicted instructions are reloaded into a buffer of a first thread context. A state of the register file at the time of encountering the exceptional instruction is maintained in a register file of the first thread context. The instructions in the pipeline are executed speculatively using a second register file in a second thread context. During speculative execution, cache misses may cause data to be loaded into the cache. Results of the speculative execution are written to the second register file. When a stopping condition is met, contents of the first register file are copied to the second register file and the reloaded instructions are released to the execution pipeline. | 08-21-2008 |
20080209188 | PROCESSOR AND METHOD OF PERFORMING SPECULATIVE LOAD OPERATIONS OF THE PROCESSOR - Provided is a processor and method of performing speculative load instructions of the processor in which a load instruction is performed only in the case where the load instruction substantially accesses a memory. A load instruction for canceling operations is performed in all other cases, so that problems occurring by accessing an input/output (I/O) mapped memory area and the like at the time of performing speculative load instructions can be prevented using only a software-like method, thereby improving the performance of a processor. | 08-28-2008 |
20080209189 | Information processing apparatus - The present invention provides an information processing apparatus having a predecoder decoding an operation code in an input instruction, generating conditional branch instruction information indicating that the input instruction is a conditional branch instruction and instruction type information indicating a type of the conditional branch instruction when the input instruction is a conditional branch instruction, and writing the input instruction, from which the operation code is deleted, the conditional branch instruction information and the instruction type information to the instruction cache memory, and a history information writing unit writing history information indicating whether or not the conditional branch instruction was branched, as a result of executing the conditional branch instruction stored in the instruction cache memory, to an area in the instruction cache memory, where the operation code of the conditional branch instruction is deleted. | 08-28-2008 |
20080235499 | APPARATUS AND METHOD FOR INFORMATION PROCESSING ENABLING FAST ACCESS TO PROGRAM - A main memory stores cache blocks obtained by dividing a program. At a position in a cache block where a branch to another cache block is provided, there is embedded an instruction for activating a branch resolution routine for performing processing, such as loading of a cache block of the branch target. A program is loaded into a local memory in units of cache blocks, and the cache blocks are serially stored in first through nth banks, which are sections provided in the storage area. Management of addresses in the local memory or processing for discarding a copy of a cache block is performed with reference to an address translation table, an inter-bank reference table and a generation number table. | 09-25-2008 |
20080250234 | Microprocessor Output Ports and Control of Instructions Provided Therefrom - A method and apparatus are provided for controlling instructions provided by a microprocessor output port to other execution units. A microprocessor pipeline of instructions is provided for each execution unit, and these are scheduled via the microprocessor unit. For each execution unit, a determination is made as to whether or not the execution unit can receive further instructions. If it cannot, its associated pipeline is said to be stalled and instructions are deleted from the microprocessor pipeline. Its thread can then be restarted at a later time with the instruction corresponding to the instruction which was unable to execute. | 10-09-2008 |
20080276079 | MECHANISM TO MINIMIZE UNSCHEDULED D-CACHE MISS PIPELINE STALLS - A method and apparatus for minimizing unscheduled D-cache miss pipeline stalls is provided. In one embodiment, execution of an instruction in a processor is scheduled. The processor may have at least one cascaded delayed execution pipeline unit having two or more execution pipelines that execute instructions in a common issue group in a delayed manner relative to each other. The method includes receiving an issue group of instructions, determining if a first instruction in the issue group is a load instruction, and if so, scheduling the first instruction to be executed in a pipeline in which execution is not delayed with respect to another pipeline in the cascaded delayed execution pipeline unit. | 11-06-2008 |
20080313443 | PROCESSOR APPARATUS - There is disclosed a processing apparatus including, as an instruction set, a complex conditional branch instruction, and a condition setting instruction. The complex conditional branch instruction is an instruction for performing comparison operation for one or each of a plural number of conditions, and for performing branching to a branch target specified, based on comparison operation between the results of the comparison operations performed and the branching condition value specified. The condition setting instruction is an instruction for setting the condition. The processing apparatus includes a complex condition setting storage unit for storing the complex condition specified by the condition setting instruction, a condition comparison unit including a plurality of comparators for comparing the complex condition specified by the complex conditional branch instruction, in the complex condition setting storage unit, at the time of execution of the complex conditional branch instruction, and a complex condition branching decision unit for determining whether or not branching to the branch target is to be performed, using the results of comparisons performed in the comparators of the condition comparison unit and the branching condition value specified by the complex conditional branch instruction. | 12-18-2008 |
20080313444 | MICROCOMPUTER AND DIVIDING CIRCUIT - Herein disclosed is a microcomputer MCU adopting the general purpose register method. The microcomputer is enabled to have a small program capacity or a high program memory using efficiency and a low system cost, while enjoying the advantage of simplification of the instruction decoding as in the RISC machine having a fixed length instruction format of the prior art, by adopting a fixed length instruction format whose bit number is a power of 2 but smaller than that of the maximum data word length fed to instruction execution means. The control of the coded division is executed by noting the code bits. | 12-18-2008 |
20090019272 | STORE QUEUE ARCHITECTURE FOR A PROCESSOR THAT SUPPORTS SPECULATIVE EXECUTION - Embodiments of the present invention provide a system that buffers stores on a processor that supports speculative execution. The system starts by buffering a store into an entry in the store queue during a speculative execution mode. If an entry for the store does not already exist in the store queue, the system writes the store into an available entry in the store queue and updates a byte mask for the entry. Otherwise, if an entry for the store already exists in the store queue, the system merges the store into the existing entry in the store queue and updates the byte mask for the entry to include information about the newly merged store. The system then forwards the data from the store queue to subsequent dependent loads. | 01-15-2009 |
20090083526 | Program conversion apparatus, program conversion method, and computer product - A linker generates a simulator-use executable format program from a pre-conversion object program and a simulator-use object program. A simulator executes the simulator-use object program and acquires branch trace information. A binary program converting tool, based on the branch trace information and a branch penalty table, generates a post-conversion object program having a rewritten branching prediction bit of the pre-conversion object program. Another linker generates an actual-machine-use executable format program from the post-conversion object program and an actual-machine-use object program. | 03-26-2009 |
20090138689 | Partitioning Processor Resources Based on Memory Usage - Processor resources are partitioned based on memory usage. A compiler determines the extent to which a process is memory-bound and accordingly divides the process into a number of threads. When a first thread encounters a prolonged instruction, the compiler inserts a conditional branch to a second thread. When the second thread encounters a prolonged instruction, a conditional branch to a third thread is executed. This continues until the last thread conditionally branches back to the first thread. An indirect segmented register file is used so that the “return to” and “branch to” logical registers within each thread are the same (e.g., R | 05-28-2009 |
20090172370 | EAGER EXECUTION IN A PROCESSING PIPELINE HAVING MULTIPLE INTEGER EXECUTION UNITS - One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage. | 07-02-2009 |
20090177874 | Processor apparatus and conditional branch processing method - Disclosed is a processor apparatus including a branch condition storage unit having a plurality of storage regions in each of which a branch condition set by a condition setting instruction is stored, an instruction decoder that decodes an instruction code, an instruction memory that stores therein the instruction code, an operation register used by a processor for operation, a branch condition comparison unit that performs a comparison operation for each of branch conditions, a conditional branch determination unit that makes a determination whether or not to perform program branching in a conditional branch instruction, a selector that makes selection between a branch destination address and a next instruction address, based on an output value of the condition branch determination unit, and a program counter that indicates a processor instruction executing position. The branch condition specified by the condition setting instruction is stored in one of the storage regions in the branch condition storage unit 1 specified by the condition setting instruction. When the conditional branch instruction is executed, individual determinations on a plurality of the branch conditions stored in the branch condition storage unit are made. Among the branch conditions that simultaneously hold, the branch address corresponding to the branch condition stored in a predetermined one of the storage regions in the branch condition storage unit is selected from the branch address storage unit, and branching to the branch address is performed. | 07-09-2009 |
20090193239 | COUNTER CONTROL CIRCUIT, DYNAMIC RECONFIGURABLE CIRCUIT, AND LOOP PROCESSING CONTROL METHOD - A counter control circuit that controls the operation of a counter arranged in a dynamic reconfigurable circuit executing an arbitrary instruction by dynamically switching an aggregation of reconfigurable processing elements (hereinafter referred to as “PEs”) according to a context reciting a processing content of the PE and a connection content between the PEs, the counter control circuit including: keeping means for keeping an operation instruction signal when the PE executing a conditional branching computation outputs, in a context being adapted to the dynamic reconfigurable circuit, the operation instruction signal of the counter for a subsequent context; output means for outputting the operation instruction signal kept in the keeping means to the counter; and control means for causing the output means to output the operation instruction signal when the context being adapted to the dynamic reconfigurable circuit is switched to the subsequent context. | 07-30-2009 |
20090240931 | Indirect Function Call Instructions in a Synchronous Parallel Thread Processor - An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches. | 09-24-2009 |
20090300339 | LSI FOR IC CARD - To prevent exposure or tampering of data by an illegal access to a memory of an LSI, a ROM ( | 12-03-2009 |
20090327672 | SECURED PROCESSING UNIT - A method for executing by a processing unit a program stored in a memory, includes: detecting a piece of information during the execution of the program by the processing unit, and if the information is detected, triggering the execution of a hidden subprogram by the processing unit. The method may be applied to the securization of an integrated circuit. | 12-31-2009 |
20100082952 | PROCESSOR - When two threads (strands), for example, are executed in parallel in a processor in a simultaneous multi-thread (SMT) system, entries of a branch reservation station of an instruction control device are separately used in a strand | 04-01-2010 |
20100095103 | Instruction execution control device and instruction execution control method - An instruction execution control device operates a plurality of threads in a simultaneous multi-thread system. The device has architecture registers ( | 04-15-2010 |
20100115251 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR OPTIMIZING RUNTIME BRANCH SELECTION IN A FLOW PROCESS - A method, system, and computer program product for optimizing runtime branch selection in a flow process are provided. The method includes gathering performance metrics of flow branch behavior for executed flows in a runtime system over time and using aggregated performance metrics for the behavior to determine an optimal ordering of branches for a currently running flow. The optimal ordering is determined by identifying one or more branch points in the flow, generating ordering permutations for at least a portion of the branches in the branch point for the flow to identify any permutations that have not been executed, gathering metrics for permutation(s) of the branch point in the flow, comparing the metrics to performance metrics of executed flows having substantially similar flow branch behavior, and identifying optimal branch ordering for the permutation(s) based upon the comparison. The method also includes executing the flow according to the optimal branch ordering. | 05-06-2010 |
20100161949 | SYSTEM AND METHOD FOR FAST BRANCHING USING A PROGRAMMABLE BRANCH TABLE - Methods and systems consistent with the present invention provide a programmable table which allows software to define a plurality of branching functions, each of which maps a vector of condition codes to a branch offset. This technique allows for a flexible multi-way branching functionality, using a conditional branch outcome table that can be specified by a programmer. Any instruction can specify the evaluation of arbitrary conditional expressions to compute the values for the condition codes, and can choose a particular branching function. When the processor executes the instruction, the processor's arithmetic/logical functional units evaluate the conditional expressions and then the processor performs the branch operation, according to the specified branching function. | 06-24-2010 |
20100161950 | SEMI-ABSOLUTE BRANCH INSTRUCTIONS FOR EFFICIENT COMPUTERS - Apparatus and methods are disclosed for a computation processor that can execute a semi-absolute branch instruction, as well as methods of operation and of generating the semi-absolute branch instruction. | 06-24-2010 |
20100205415 | PIPELINED MICROPROCESSOR WITH FAST CONDITIONAL BRANCH INSTRUCTIONS BASED ON STATIC SERIALIZING INSTRUCTION STATE - A microprocessor includes a control register that stores a control value that affects operation of the microprocessor. An instruction set architecture includes a conditional branch instruction that specifies a branch condition based on the control value stored in the control register, and a serializing instruction that updates the control value in the control register. The microprocessor completes all modifications to flags, registers, and memory by instructions previous to the serializing instruction and drains all buffered writes to memory before it fetches and executes the next instruction after the serializing instruction. Execution units update the control value in the control register in response to the serializing instruction. A fetch unit fetches, decodes, and unconditionally correctly resolves and retires the conditional branch instruction based on the control value stored in the control register rather than dispatching the conditional branch instruction to the execution units to be resolved. | 08-12-2010 |
20100217961 | PROCESSOR SYSTEM EXECUTING PIPELINE PROCESSING AND PIPELINE PROCESSING METHOD - A processor system includes a plurality of pipeline stages, a controller, and a transfer path. The plurality of pipeline stages is subjected to processing. The controller determines whether or not each of the executable instructions to be processed in the pipeline stages requires processing in a succeeding pipeline stage. The transfer path, if the controller determines the executable instruction does not require the processing in the succeeding pipeline stage, skips the pipeline stage including the unnecessary processing. | 08-26-2010 |
20100250906 | Obfuscation - In an embodiment of a method of making a conditional jump in a computer running a program, an input is provided, conditional on which a substantive conditional branch is to be made. An obfuscatory unpredictable datum is provided. Code is executed that causes an obfuscatory branch conditional on the unpredictable datum. At a point in the computer program determined by the obfuscatory conditional branch, a substantive branch is made that is conditional on the input. | 09-30-2010 |
20100281240 | Program Code Simulator - A system and method for facilitating simulation of a computer program. A program representation is generated from a computer program. A simulation of the program is performed. Simulation may include applying heuristics to determine program flow for selected instructions, such as a branch instruction or a loop instruction. Simulation may also include creating imaginary objects as surrogates for real objects, when program code to create real objects is restricted, or fields of the objects are unavailable or uncertain, or for other reasons. Data descriptive of the simulation is inserted into the program representation. A visualizer may retrieve the program representation and generate a visualization that shows sequence flows resulting from the simulation. | 11-04-2010 |
20100281241 | METHOD AND SYSTEM FOR SYNCHRONIZING INCLUSIVE DECISION BRANCHES - A method and system for reconstructing an original process model that includes an inclusive decision into a reconstructed process model that includes synchronized branches. The inclusive decision is replaced with a fork. Each original path of the inclusive decision is reconstructed into a corresponding reconstructed path. Each reconstructed path includes a decision, an activity included in the corresponding original path, and a merge. The decision includes a first output connected to the activity, whose output is connected to a first input of the merge. The decision further includes a second output connected to a second input of the merge. A join is created to recombine output branches from the merge node and other merge nodes included in reconstructed paths. The output branches are synchronized so that all activated reconstructed paths must finish processing before a modeling element connected to an output of the join is executed. | 11-04-2010 |
20100299508 | DYNAMICALLY ALLOCATED STORE QUEUE FOR A MULTITHREADED PROCESSOR - Systems and methods for storage of writes to memory corresponding to multiple threads. A processor comprises a store queue, wherein the queue dynamically allocates a current entry for a committed store instruction in which entries of the array may be allocated out of program order. For a given thread, the store queue conveys store data to a memory in program order. The queue is further configured to identify an entry of the plurality of entries that corresponds to an oldest committed store instruction for a given thread and determine a next entry of the array that corresponds to a next committed store instruction in program order following the oldest committed store instruction of the given thread, wherein said next entry includes data identifying the entry. The queue marks an entry as unfilled upon successful conveying of store data to the memory. | 11-25-2010 |
20100318774 | PROCESSOR INSTRUCTION GRADUATION TIMEOUT - A multiprocessor computer system comprises a plurality of processors distributed across a plurality of node coupled by a processor interconnect network. One or more of the processors is operable to manage hung processor instructions by setting a graduation timeout counter after a first program instruction graduates, resetting the graduation timeout counter if a subsequent program instruction graduates before the graduation timeout counter expires, and resetting the processor if the graduation timeout counter expires before the subsequent program instruction graduates. | 12-16-2010 |
20100318775 | Methods and Apparatus for Adapting Pipeline Stage Latency Based on Instruction Type - Processor pipeline controlling techniques are described which take advantage of the variation in critical path lengths of different instructions to achieve increased performance. By examining a processor's instruction set and execution unit implementation's critical timing paths, instructions are classified into speed classes. Based on these speed classes, one pipeline is presented where hold signals are used to dynamically control the pipeline based on the instruction class in execution. An alternative pipeline supporting multiple classes of instructions is presented where the pipeline clocking is dynamically changed as a result of decoded instruction class signals. A single pass synthesis methodology for multi-class execution stage logic is also described. For dynamic class variable pipeline processors, the mix of instructions can have a great effect on processor performance and power utilization since both can vary by the program mix of instruction classes. Application code can be given new degrees of optimization freedom where instruction class and the mix of instructions can be chosen based on performance and power requirements. | 12-16-2010 |
20110072249 | UNANIMOUS BRANCH INSTRUCTIONS IN A PARALLEL THREAD PROCESSOR - One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing on a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch. | 03-24-2011 |
20110072250 | Methods and Apparatus for Scalable Array Processor Interrupt Detection and Response - Hardware and software techniques for interrupt detection and response in a scalable pipelined array processor environment are described. Utilizing these techniques, a sequential program execution model with interrupts can be maintained in a highly parallel scalable pipelined array processor containing multiple processing elements and distributed memories and register files. When an interrupt occurs, interface signals are provided to all PEs to support independent interrupt operations in each PE dependent upon the local PE instruction sequence prior to the interrupt. Processing element exception interrupts are supported and low latency interrupt processing is also provided for embedded systems where real time signal processing is required. Further, a hierarchical interrupt structure is used allowing a generalized debug approach using debug interrupts and a dynamic debug monitor mechanism. | 03-24-2011 |
20110078424 | OPTIMIZING PROGRAM CODE USING BRANCH ELIMINATION - A method for optimizing program code is provided. The method comprises detecting a branch instruction comprising a condition expression, wherein the branch instruction, when executed by a processor, causes the processor to execute either a first set of instructions or a second set of instructions according to a value of the condition expression; and replacing the branch instruction with a third set of instructions that are non-branching, wherein the third set of instructions, when executed by a processor, has a collective effect same as if either the first or second set of instructions were executed according to the value of the condition expression. The third set of instructions comprises a negation instruction to normalize the value of the condition expression. | 03-31-2011 |
20110154002 | Methods And Apparatuses For Efficient Load Processing Using Buffers - Various embodiments of the invention concern methods and apparatuses for power and time efficient load handling. A compiler may identify producer loads, consumer reuse loads, consumer forwarded loads, and producer/consumer hybrid loads. Based on this identification, performance of the load may be efficiently directed to a load value buffer, store buffer, data cache, or elsewhere. Consequently, accesses to cache are reduced, through direct loading from load value buffers and store buffers, thereby efficiently processing the loads. | 06-23-2011 |
20110161641 | SPE Software Instruction Cache - An application thread executes a direct branch instruction that is stored in an instruction cache line. Upon execution, the direct branch instruction branches to a branch descriptor that is also stored in the instruction cache line. The branch descriptor includes a trampoline branch instruction and a target instruction space address. Next, the trampoline branch instruction sends a branch descriptor pointer, which points to the branch descriptor, to an instruction cache manager. The instruction cache manager extracts the target instruction space address from the branch descriptor, and executes a target instruction corresponding to the target instruction space address. In one embodiment, the instruction cache manager generates a target local store address by masking off a portion of bits included in the target instruction space address. In turn, the application thread executes the target instruction located at the target local store address accordingly. | 06-30-2011 |
20110197049 | TWO PASS TEST CASE GENERATION USING SELF-MODIFYING INSTRUCTION REPLACEMENT - A test code generation technique that replaces instructions having a machine state dependent result with special redirection instructions provides generation of test code in which state dependent execution choices are made without a state model. Redirection instructions cause execution of a handler that examines the machine state and replaces the redirection instruction with a replacement instruction having a desired result resolved in accordance with the current machine state. The instructions that are replaced may be conditional branch instructions and the result a desired execution path. The examination of the machine state permits determination of a branch condition for the replacement instruction so that the next pass of the test code executes along the desired path. Alternatively, the handler can execute a jump to the branch instruction, causing immediate execution of the desired branch path. The re-direction instructions may be illegal instructions, which cause execution of an interrupt handler that performs the replacement. | 08-11-2011 |
20110197050 | Wait Instruction - A microprogrammable electronic device comprises a code memory storing a plurality of instructions. At least one instruction, when executed by the device, causes the device to enter into a wait state associated with a plurality of predefined wait state exit conditions. The device is configured to load into an electronic table each condition together with a corresponding code memory address of an instruction to be executed when the condition occurs; to execute, when it is in the wait state, a wait instruction stored in the code memory and which, when executed, is such as to cause the device to check simultaneously the conditions loaded into said electronic table to detect if a condition occurs; and, if a condition occurs, to exit from said wait state and to execute the instruction stored in the code memory at the code memory address loaded into the electronic table together with the condition that occurred. | 08-11-2011 |
20110238964 | DATA PROCESSOR - The data processor includes CPU operable to execute an instruction included in an instruction set. The instruction set includes a load instruction for reading data on a memory space. The data read according to the load instruction includes data of a format type having a data-read-branching-occurrence bit region. The CPU includes a data-read-branching control register; a data-read-branching address register; and a read-data-analyzing unit. On condition that a bit value showing the occurrence of data read branching has been set on the data-read-branching-occurrence bit region, and a value showing the data-read-branching-occurrence bit remaining valid has been set on the data-read-branching control register, the switching between processes is performed by branching to an address stored in the data-read-branching address register. | 09-29-2011 |
20110276792 | RESOURCE FLOW COMPUTER - A scalable processing system includes a memory device having a plurality of executable program instructions, wherein each of the executable program instructions includes a timetag data field indicative of the nominal sequential order of the associated executable program instructions. The system also includes a plurality of processing elements, which are configured and arranged to receive executable program instructions from the memory device, wherein each of the processing elements executes executable instructions having the highest priority as indicated by the state of the timetag data field. | 11-10-2011 |
20110307689 | PROCESSOR SUPPORT FOR HARDWARE TRANSACTIONAL MEMORY - A processing core of a plurality of processing cores is configured to execute a speculative region of code as a single atomic memory transaction with respect to one or more others of the plurality of processing cores. In response to determining an abort condition for an issued one of the plurality of program instructions and in response to determining that the issued program instruction is not part of a mispredicted execution path, the processing core is configured to abort an attempt to execute the speculative region of code. | 12-15-2011 |
20110314265 | PROCESSORS OPERABLE TO ALLOW FLEXIBLE INSTRUCTION ALIGNMENT - Methods and apparatus are provided for optimizing a processor core. Common processor subcircuitry is used to perform calculations for various types of instructions, including branch and non-branch instructions. Increasing the commonality of calculations across different instruction types allows branch instructions to jump to a byte-aligned memory address even if supported instructions are multi-byte or word-aligned. | 12-22-2011 |
20110320787 | Indirect Branch Hint - A processor implements an apparatus and a method for predicting an indirect branch address. A target address generated by an instruction is automatically identified. A predicted next program address is prepared based on the target address before an indirect branch instruction utilizing the target address is speculatively executed. The apparatus suitably employs a register for holding an instruction memory address that is specified by a program as a predicted indirect address of an indirect branch instruction. The apparatus also employs a next program address selector that selects the predicted indirect address from the register as the next program address for use in speculatively executing the indirect branch instruction. | 12-29-2011 |
20110320788 | METHOD AND APPARATUS FOR BRANCH REDUCTION IN A MULTITHREADED PACKET PROCESSOR - A method and apparatus for branch reduction in a multithreaded packet processor is presented. An instruction is executed which includes testing of a branch flag. The branch flag references a configuration bit vector wherein each bit in the configuration bit vector corresponds to a respective feature. When the branch flag returns a first result, processing continues at an instruction located at a first location relative to a Program Counter (PC), and when the branch flag returns a second result, processing continues at a second location relative to the PC. | 12-29-2011 |
20120072707 | Scaleable Status Tracking Of Multiple Assist Hardware Threads - A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating. | 03-22-2012 |
20120102302 | PROCESSOR TESTING - Processors may be tested according to various implementations. In one general implementation, a process for processor testing may include randomly generating a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may also include randomly generating a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set, the branching of the branch instructions to the respective instructions being arranged in a sequential manner. The process may additionally include generating a plurality of instructions to increment a counter when each branch instruction is encountered during execution. | 04-26-2012 |
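The cross-branching arrangement in this abstract can be modeled with a short Python sketch. This is an illustrative toy, not the patented generator: instruction encodings and the random generation step are abstracted away, and only the sequential "ping-pong" target arrangement between the two portions and the per-branch counter are modeled.

```python
def run_ping_pong_test(n):
    """Model a test program with two portions of n branch slots each.

    First-portion slot i branches to second-portion slot i; second-portion
    slot i branches back to first-portion slot i + 1, so control ping-pongs
    sequentially through both portions. A counter is incremented at every
    branch, as in the abstract.
    """
    branch_count = 0
    pc = 0                       # start at first-portion slot 0
    while True:
        branch_count += 1        # counter incremented at each branch
        if pc < n:
            pc = n + pc          # jump to counterpart in second portion
        else:
            nxt = pc - n + 1     # jump back to next first-portion slot
            if nxt == n:         # ran off the end of the first portion
                break
            pc = nxt
    return branch_count

# Every one of the 2*n branches is encountered exactly once.
print(run_ping_pong_test(4))  # → 8
```

A final counter value other than 2*n would indicate a branch that was mispredicted, mistargeted, or skipped, which is what makes the counter useful as a test oracle.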
20120311307 | Method And Apparatus For Obtaining A Call Stack To An Event Of Interest And Analyzing The Same - In one embodiment, a processor includes a performance monitor including a last branch record (LBR) stack to store a call stack to an event of interest, where the call stack is collected responsive to a trigger for the event. The processor further includes logic to control the LBR stack to operate in a call stack mode such that an entry to a call instruction for a leaf function is cleared on return from the leaf function. Other embodiments are described and claimed. | 12-06-2012 |
20120324209 | BRANCH TARGET BUFFER ADDRESSING IN A DATA PROCESSOR - A data processor includes a branch target buffer (BTB) having a plurality of BTB entries grouped in ways. The BTB entries in one of the ways include a short tag address and the BTB entries in another one of the ways include a full tag address. | 12-20-2012 |
20130054942 | TRACKING A PROGRAM'S CALLING CONTEXT USING A HYBRID CODE SIGNATURE - A method for a hybrid code signature including executing, via a processor, an application, the executing comprising executing a root instruction of the application; profiling, via the processor, the executing of the application, the profiling comprising storing a reference signature; determining, via the processor, a working signature of instructions executed subsequent to the executing of the root instruction, the determining comprising implementing a hashing function of the instructions in response to storing the reference signature; tracking the updating of the working signature by storing a value in a counter; and updating continuously, via the processor, the working signature with the hashing function while at least the working signature does not match the reference signature. | 02-28-2013 |
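A toy model of the working-signature idea follows. The patent does not specify the hashing function; the CRC-32 folding and the textual instruction encoding below are assumptions chosen purely for illustration of the reference/working comparison and the update counter.

```python
import zlib

def update_signature(signature, instruction):
    # Fold the next executed instruction into the running signature.
    # zlib.crc32 accepts a starting value, so it chains naturally.
    return zlib.crc32(instruction.encode(), signature)

def track_from_root(instructions, reference_signature):
    """Update the working signature per executed instruction after the
    root instruction, counting updates until it matches the reference."""
    working, count = 0, 0
    for instr in instructions:
        working = update_signature(working, instr)
        count += 1                     # counter tracks the updates
        if working == reference_signature:
            break
    return working, count

# Profile phase: record a reference signature over a known path.
path = ["load r1", "add r1, r2", "call f"]
reference = 0
for instr in path:
    reference = update_signature(reference, instr)

# Tracking phase: the working signature converges on the reference.
working, updates = track_from_root(path, reference)
print(working == reference, updates)  # → True 3
```

The counter value at the match point identifies how deep into the calling context the signature converged, which is the "tracking" part of the method.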
20130061027 | TECHNIQUES FOR HANDLING DIVERGENT THREADS IN A MULTI-THREADED PROCESSING SYSTEM - This disclosure describes techniques for handling divergent thread conditions in a multi-threaded processing system. In some examples, a control flow unit may obtain a control flow instruction identified by a program counter value stored in a program counter register. The control flow instruction may include a target value indicative of a target program counter value for the control flow instruction. The control flow unit may select one of the target program counter value and a minimum resume counter value as a value to load into the program counter register. The minimum resume counter value may be indicative of a smallest resume counter value from a set of one or more resume counter values associated with one or more inactive threads. Each of the one or more resume counter values may be indicative of a program counter value at which a respective inactive thread should be activated. | 03-07-2013 |
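The selection between the target program counter and the minimum resume counter can be sketched as follows. The abstract does not state the selection criterion, so the "smallest address first" policy below is an assumption (a common divergence-handling heuristic), not the patented rule.

```python
def select_next_pc(target_pc, resume_counters):
    """Pick the value to load into the program counter register.

    Assumed policy (for illustration): if any inactive thread has a resume
    counter that precedes the branch target, reactivate the earliest such
    thread before control jumps past it; otherwise take the branch target.
    """
    if resume_counters and min(resume_counters) < target_pc:
        return min(resume_counters)
    return target_pc

print(select_next_pc(100, [40, 70]))  # → 40 (reactivate earliest thread)
print(select_next_pc(100, []))        # → 100 (no inactive threads)
```

Resuming at the smallest pending address first keeps inactive threads from being stranded behind a forward branch, which is the failure mode a minimum resume counter guards against.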
20130091343 | OPERAND FETCHING CONTROL AS A FUNCTION OF BRANCH CONFIDENCE - Data operand fetching control includes calculating a summation weight value for each instruction in a pipeline, the summation weight value calculated as a function of branch uncertainty and a pendency in which the instruction resides in the pipeline relative to other instructions in the pipeline. The data operand fetching control also includes mapping the summation weight value of a selected instruction that is attempting to access system memory to a memory access control. Each memory access control specifies a manner of handling data fetching operations. The data operand fetching control further includes performing a memory access operation for the selected instruction based upon the mapping. | 04-11-2013 |
20130111195 | DATA PROCESSING SYSTEM WITH SAFE CALL AND RETURN | 05-02-2013 |
20130124838 | INSTRUCTION LEVEL EXECUTION PREEMPTION - One embodiment of the present invention sets forth a technique for instruction-level and compute thread array granularity execution preemption. Preempting at the instruction level does not require any draining of the processing pipeline. No new instructions are issued and the context state is unloaded from the processing pipeline. When preemption is performed at a compute thread array boundary, the amount of context state to be stored is reduced because execution units within the processing pipeline complete execution of in-flight instructions and become idle. If the amount of time needed to complete execution of the in-flight instructions exceeds a threshold, then the preemption may dynamically change to be performed at the instruction level instead of at compute thread array granularity. | 05-16-2013 |
20130138931 | MAINTAINING THE INTEGRITY OF AN EXECUTION RETURN ADDRESS STACK - A processor and method for maintaining the integrity of an execution return address stack (RAS). The execution RAS is maintained in an accurate state by storing information regarding branch instructions in a branch information table. The first time a branch instruction is executed, an entry is allocated and populated in the table. If the branch instruction is re-executed, a pointer address is retrieved from the corresponding table entry and the execution RAS pointer is repositioned to the retrieved pointer address. The execution RAS can also be used to restore a speculative RAS due to a mis-speculation. | 05-30-2013 |
20130151822 | Efficient Enqueuing of Values in SIMD Engines with Permute Unit - Mechanisms, in a data processing system having a processor, for generating enqueued data for performing computations of a conditional branch of code are provided. Mask generation logic of the processor operates to generate a mask representing a subset of iterations of a loop of the code that results in a condition of the conditional branch being satisfied. The mask is used to select data elements from an input data element vector register corresponding to the subset of iterations of the loop of the code that result in the condition of the conditional branch being satisfied. Furthermore, the selected data elements are used to perform computations of the conditional branch of code. Iterations of the loop of the code that do not result in the condition of the conditional branch being satisfied are not used as a basis for performing computations of the conditional branch of code. | 06-13-2013 |
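In scalar Python pseudo-form, the mask-and-compact flow reads as below. The SIMD permute-unit compaction is modeled with list operations, and the condition and data are invented for illustration; only the shape of the technique (mask generation, selection, then computing solely on selected elements) follows the abstract.

```python
def generate_mask(inputs, condition):
    # One mask bit per loop iteration: set when the branch condition holds.
    return [condition(x) for x in inputs]

def enqueue_selected(inputs, mask):
    # Permute/compact step: keep only elements whose iteration satisfied
    # the conditional branch's test.
    return [x for x, keep in zip(inputs, mask) if keep]

data = [3, -1, 4, -1, 5]
mask = generate_mask(data, lambda x: x > 0)   # [True, False, True, False, True]
queued = enqueue_selected(data, mask)         # [3, 4, 5]
results = [x * x for x in queued]             # conditional-branch computation
print(results)  # → [9, 16, 25]
```

The payoff is that the conditional-branch computation runs over a dense vector of qualifying elements only, instead of wasting lanes on iterations that failed the test.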
20130179668 | LAST BRANCH RECORD INDICATORS FOR TRANSACTIONAL MEMORY - In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed. | 07-11-2013 |
20130198498 | COMPILING METHOD, PROGRAM, AND INFORMATION PROCESSING APPARATUS - A method, program, and apparatus for optimizing compiled code using a dynamic compiler. The method includes the steps of: generating intermediate code from a trace, which is an instruction sequence described in machine language; computing an offset between an address value, which is a base point of an indirect branch instruction, and a start address of a memory page, which includes a virtual address referred to by the information processing apparatus immediately after processing a first instruction; determining whether an indirect branch instruction that is subsequent to the first instruction causes processing to jump to another memory page, by using a value obtained from adding the offset to a displacement made by the indirect branch instruction; and optimizing the intermediate code by using the result of the determining step. | 08-01-2013 |
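The page-crossing determination described above reduces to simple address arithmetic. The sketch below assumes a 4 KiB page size for illustration; the patent's method applies to whatever page size the information processing apparatus uses.

```python
PAGE_SIZE = 4096  # assumed page size for illustration

def page_offset(base_address):
    # Offset of the indirect branch's base point from the start of its page.
    return base_address & (PAGE_SIZE - 1)

def jumps_to_other_page(base_address, displacement):
    # The branch leaves the current page iff the offset plus the
    # displacement falls outside [0, PAGE_SIZE).
    total = page_offset(base_address) + displacement
    return total < 0 or total >= PAGE_SIZE

print(jumps_to_other_page(0x1FF0, 0x20))  # → True  (crosses into next page)
print(jumps_to_other_page(0x1000, 0x10))  # → False (stays within the page)
```

Knowing at compile time that an indirect branch cannot leave the page lets the dynamic compiler skip cross-page bookkeeping when optimizing the intermediate code.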
20130212364 | PRE-SCHEDULED REPLAYS OF DIVERGENT OPERATIONS - One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into the pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced. One advantage of the disclosed technique is that divergent operations requiring one or more replay operations execute with reduced latency. | 08-15-2013 |
20130326204 | Configuration-Preserving Preprocessor and Configuration-Preserving Parser - Methods, systems, and apparatuses, including computer programs encoded on computer readable media, for generating a plurality of tokens from one or more source files that include source code in a first programming language. The source code includes one or more static conditionals that include a conditional expression and branch code that is operative when the conditional expression is true. Various configurations are possible based upon the conditionals. A first static conditional includes one or more nested static conditionals within the branch code associated with the first static conditional. Each of the one or more nested static conditionals is hoisted to a beginning of the branch code associated with the first static conditional. Each innermost branch code does not contain a static conditional and each possible configuration is preserved. | 12-05-2013 |
20130339690 | TRANSACTIONAL EXECUTION BRANCH INDICATIONS - Transactional execution branch indications are placed into one or more transaction diagnostic blocks when a transaction is aborted. Each branch indication specifies whether a branch was taken, as a result of executing a branch instruction within the transaction. As the transaction executes and a branch instruction is encountered, a branch indication is set in a vector indicating whether the branch was taken. Then, if the transaction aborts, the indicators are stored in one or more transaction diagnostic blocks providing a branch history usable in diagnosing the failure. | 12-19-2013 |
20140006759 | RECORDING MEDIUM STORING ADDRESS MANAGEMENT PROGRAM, ADDRESS MANAGEMENT METHOD, AND APPARATUS | 01-02-2014 |
20140047223 | SELECTIVELY ACTIVATING A RESUME CHECK OPERATION IN A MULTI-THREADED PROCESSING SYSTEM - This disclosure describes techniques for selectively activating a resume check operation in a single instruction, multiple data (SIMD) processing system. A processor is described that is configured to selectively enable or disable a resume check operation for a particular instruction based on information included in the instruction that indicates whether a resume check operation is to be performed for the instruction. A compiler is also described that is configured to generate compiled code which, when executed, causes a resume check operation to be selectively enabled or disabled for particular instructions. The compiled code may include one or more instructions that each specify whether a resume check operation is to be performed for the respective instruction. The techniques of this disclosure may be used to reduce the power consumption of and/or improve the performance of a SIMD system that utilizes a resume check operation to manage the reactivation of deactivated threads. | 02-13-2014 |
20140089646 | PROCESSOR WITH INTERRUPTABLE INSTRUCTION EXECUTION - A processor includes a plurality of execution units. Each of the execution units includes a status register configured to store a value indicative of state of the execution unit. At least one of the execution units is configured to execute a complex instruction that requires multiple instruction cycles to execute. The at least one of the execution units is also configured to interrupt execution of the complex instruction to execute a different instruction, and resume, based on the value stored in the status register, execution of the complex instruction at the point interrupted after execution of the different instruction. | 03-27-2014 |
20140129812 | SYSTEM AND METHOD FOR EXECUTING SEQUENTIAL CODE USING A GROUP OF THREADS AND SINGLE-INSTRUCTION, MULTIPLE-THREAD PROCESSOR INCORPORATING THE SAME - A system and method for executing sequential code in the context of a single-instruction, multiple-thread (SIMT) processor. In one embodiment, the system includes: (1) a pipeline control unit operable to create a group of counterpart threads of the sequential code, one of the counterpart threads being a master thread, remaining ones of the counterpart threads being slave threads and (2) lanes operable to: (2 | 05-08-2014 |
20140164747 | Branch-Free Condition Evaluation - A compare instruction of an instruction set architecture (ISA), when executed tests one or more operands for an instruction defined condition. The result of the test is stored as an operand, with leading zeros, in a general register of the ISA. The general register is identified (explicitly or implicitly) by the compare instruction. Thus, the result of the test can be manipulated by standard register operations of the computer system. In a superscalar processor, no special “condition code” renaming is required, as the standard register renaming takes care of out-of-order processing of the conditions. | 06-12-2014 |
20140173260 | COMPUTER PROCESSOR WITH INSTRUCTION FOR EXECUTION BASED ON AVAILABLE INSTRUCTION SETS - A system and method for testing whether a computer processor is capable of executing a requested instruction set. The system includes a computer processor configured to receive an encoded conditional branch instruction in a form of machine code executable directly by the computer processor, and implement the encoded conditional branch instruction unconditionally, based on underlying hardware architecture of the computer processor. The method for testing whether a computer processor is capable of executing a requested instruction set includes receiving an encoded conditional branch instruction in a form of machine code executable directly by the computer processor, and implementing the encoded conditional branch instruction unconditionally, based on underlying hardware architecture of the computer processor. | 06-19-2014 |
20140173261 | COMPUTER PROCESSOR WITH INSTRUCTION FOR EXECUTION BASED ON AVAILABLE INSTRUCTION SETS - A system and method for testing whether a computer processor is capable of executing a requested instruction set. The system includes a computer processor configured to receive an encoded conditional branch instruction in a form of machine code executable directly by the computer processor, and implement the encoded conditional branch instruction unconditionally, based on underlying hardware architecture of the computer processor. The method for testing whether a computer processor is capable of executing a requested instruction set includes receiving an encoded conditional branch instruction in a form of machine code executable directly by the computer processor, and implementing the encoded conditional branch instruction unconditionally, based on underlying hardware architecture of the computer processor. | 06-19-2014 |
20140195787 | TRACKING SPECULATIVE EXECUTION OF INSTRUCTIONS FOR A REGISTER RENAMING DATA STORE - First processing circuitry processes at least part of a stream of program instructions. The first processing circuitry has registers for storing data and register renaming circuitry for mapping architectural register specifiers to physical register specifiers. A renaming data store stores renaming entries for identifying a register mapping between the architectural and physical register specifiers. At least some renaming entries have a count value indicating a number of speculation points occurring between generation of a previous count value and generation of the count value. The speculation points may for example be branch operation or load/store operations. | 07-10-2014 |
20140208084 | COMPOUND COMPLEX INSTRUCTION SET COMPUTER (CCISC) PROCESSOR ARCHITECTURE - A processor system includes a multichannel memory operable to store data values and a program memory operable to store Compound CISC (CCISC) instructions. The processor system also includes a processor operable to execute a computer program assembled with at least a portion of the CCISC instructions, to retrieve a CCISC instruction from the program memory, to access at least two data values in the multichannel memory based on the executed computer program, and to operate on the at least two data values in the multichannel memory based on the CCISC instruction. The processor retrieves the CCISC instruction, accesses the at least two data values, and operates on the at least two data values during a same clock cycle. | 07-24-2014 |
20140215193 | PROCESSOR CAPABLE OF SUPPORTING MULTIMODE AND MULTIMODE SUPPORTING METHOD THEREOF - Embodiments include a processor capable of supporting multiple modes, and corresponding methods. The processor includes front end units, a number of processing elements greater than the number of front end units, and a controller configured to determine if thread divergence occurs due to conditional branching. If there is thread divergence, the processor may set control information to control processing elements using currently activated front end units. If there is not, the processor may set control information to control processing elements using a currently activated front end unit. | 07-31-2014 |
20140258694 | Apparatus and Method for Branch Instruction Bonding - A processor is configured to identify a branch instruction immediately followed by an architectural delay slot. A single bonded instruction comprising the branch instruction immediately followed by the architectural delay slot is created. The single bonded instruction is loaded into an instruction buffer. | 09-11-2014 |
20140258695 | Last Branch Record Indicators For Transactional Memory - In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed. | 09-11-2014 |
20150026443 | Branching To Alternate Code Based on Runahead Determination - The description covers a system and method for operating a micro-processing system having a runahead mode of operation. In one implementation, the method includes providing, for a first portion of code, a runahead correlate. When the first portion of code is encountered by the micro-processing system, a determination is made as to whether the system is operating in the runahead mode. If so, the system branches to the runahead correlate, which is specifically configured to identify and resolve latency events likely to occur when the first portion of code is encountered outside of runahead. Branching out of the first portion of code may also be performed based on a determination that a register is poisoned. | 01-22-2015 |
20150026444 | Compiler-control Method for Load Speculation In a Statically Scheduled Microprocessor - A statically scheduled processor compiler schedules a speculative load in the program before the data is needed. The compiler inserts a conditional instruction confirming or disaffirming the speculative load before the program behavior changes due to the speculative load. The condition is not based solely upon whether the speculative load address is correct but preferably includes dependence according to the original source code. The compiler may statically schedule two or more branches in parallel with orthogonal conditions. | 01-22-2015 |
20150026445 | PROCESSOR TESTING - Processors may be tested according to various implementations. In one general implementation, a process for processor testing may include randomly generating a first plurality of branch instructions for a first portion of an instruction set, each branch instruction in the first portion branching to a respective instruction in a second portion of the instruction set. The process may also include randomly generating a second plurality of branch instructions for the second portion of the instruction set, each branch instruction in the second portion branching to a respective instruction in the first portion of the instruction set. The process may additionally include generating a plurality of instructions to increment a counter when each branch instruction is encountered during execution. | 01-22-2015 |
20150046689 | ARITHMETIC PROCESSING UNIT AND METHOD FOR CONTROLLING ARITHMETIC PROCESSING UNIT - An arithmetic processing unit including a branch instruction execution management unit configured to accumulate a branch instruction waiting to be executed and to manage completion of a branch instruction, a completion processing waiting storage unit configured to accumulate an identifier of an instruction waiting for completion processing according to an execution sequence of a program, a completion processing unit configured to activate resource update processing due to execution of a branch instruction when the completion processing unit receives an execution completion report for the branch instruction from the branch instruction execution management unit and identified by the identifier, and a promotion unit configured to, when an identifier accumulated at the top of the completion processing waiting storage unit indicates a branch instruction, cause the completion processing unit to activate the resource update processing without waiting for the execution completion report for the branch instruction. | 02-12-2015 |
20150058605 | BRANCH TRACE COMPRESSION - Exemplary methods, apparatuses, and systems assign a plurality of branch instructions within a computer program to a plurality of prime numbers. Each branch instruction is assigned a unique prime number within the plurality of prime numbers. A run-time branch trace value is determined to be divisible, without a remainder, by a first prime number of the plurality of prime numbers. The run-time branch trace value was generated during execution of the computer program. An output is generated indicating that a first branch instruction assigned to the first prime number was executed. | 02-26-2015 |
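A minimal sketch of the divisibility test described above follows. The prime assignments and the traced branch names are invented for illustration; the mechanism (one unique prime per branch, multiply at run time, test divisibility afterwards) is as the abstract describes.

```python
FIRST_PRIMES = [2, 3, 5, 7, 11, 13]

def assign_primes(branch_ids):
    # Each branch instruction gets a unique prime number.
    return dict(zip(branch_ids, FIRST_PRIMES))

def record_branch(trace_value, prime):
    # Run-time instrumentation: multiply the trace value when the branch runs.
    return trace_value * prime

def branches_executed(trace_value, assignment):
    # A branch executed iff the trace value is divisible by its prime,
    # without a remainder.
    return sorted(b for b, p in assignment.items() if trace_value % p == 0)

primes = assign_primes(["br_a", "br_b", "br_c"])
trace = 1
trace = record_branch(trace, primes["br_a"])
trace = record_branch(trace, primes["br_c"])
print(branches_executed(trace, primes))  # → ['br_a', 'br_c']
```

Because the primes are coprime, a single integer compresses the whole execution history: unique factorization guarantees the divisibility test cannot report a branch that never ran.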
20150095628 | TECHNIQUES FOR DETECTING RETURN-ORIENTED PROGRAMMING - Various embodiments are generally directed to techniques to detect a return-oriented programming (ROP) attack by verifying target addresses of branch instructions during execution. An apparatus includes a processor component, and a comparison component for execution by the processor component to determine whether there is a matching valid target address for a target address of a branch instruction associated with a translated portion of a routine in a table comprising valid target addresses. Other embodiments are described and claimed. | 04-02-2015 |
20150106602 | RANDOMLY BRANCHING USING HARDWARE WATCHPOINTS - A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating “random branches” in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor both accesses a first location in the memory region and may determine a condition is satisfied. In response, the processor generates an interrupt. The corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may include that a pointer storing an address pointing to locations within the memory region equals a given address after the pointer is updated. Alternatively, the condition may include that an updated data value stored in a location pointed to by the given address equals a threshold value. | 04-16-2015 |
20150127929 | Data Processing Device and Method of Controlling the Same - A data processing device includes an instruction executing part executing a normal task and a management task that schedules an execution order of the normal task while switching between the normal task and the management task, a counter measuring an execution state of the normal task being executed in the instruction executing part, and a state controller controlling the counter based on the normal task being executed in the instruction executing part. The instruction executing part determines whether the normal task to be executed next of a plurality of normal tasks scheduled by the management task is a measurement object or not, and outputs an operation signal notifying the state controller of the determination result. The state controller operates the counter in accordance with the branch operation. | 05-07-2015 |
20150134939 | INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD AND MEMORY SYSTEM - An information processing system is provided. The information processing system includes a processor used to obtain information, a memory used to store the information and output an information block based on a received address, and a scanner used to generate an address based on the current information block and to provide the address to the memory, where the current information block is the information block currently outputted from the memory. Thus, the speed for obtaining the information block by the processor (information block requesting device) is further improved, and the execution speed of the processor and the information processing system is improved. | 05-14-2015 |
20150347139 | DEVICE AND METHOD FOR APPROXIMATE MEMOIZATION - An exemplary embodiment relates generally to methods and apparatus for operating a computing device to perform approximate memoizations. Computer code analysis methods, special hardware units, and run-time apparatus that allow limited errors to occur are disclosed. A computer code generation process, part of a compiler or interpreter of a computing system, aimed at inserting special instructions in the software code of a computer program is also disclosed, wherein the special instructions may embed information to manage the approximation of value memoizations. The presented technology may reduce the electric power consumption of a computing system by reusing the results or part of the results of previous arithmetic or memory operations. Run-time hardware apparatus to manage the elimination of the operations and control the error introduced by approximate value memoizations are also disclosed. | 12-03-2015 |
20160026469 | DATA CACHE SYSTEM AND METHOD - A data cache system is provided. The system includes a central processing unit (CPU), a memory system, an instruction track table, a tracker and a data engine. The CPU is configured to execute instructions and read data. The memory system is configured to store the instructions and the data. The instruction track table is configured to store corresponding information of branch instructions stored in the memory system. The tracker is configured to point to the first data read instruction after the instruction currently being executed by the CPU. The data engine is configured to calculate a data address in advance, before the CPU executes the data read instruction pointed to by the tracker, and to control the memory system to provide the corresponding data to the CPU based on that address. | 01-28-2016 |
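The tracker/data-engine interaction in 20160026469 amounts to scanning ahead for the next load and resolving its address early. The sketch below is a simplified software model under assumed names; the instruction encoding (`("load", base_register, offset)`) and the `staged` buffer standing in for the cache fill are illustrative, not the application's actual structures.

```python
# Illustrative model: a "tracker" finds the next data read instruction
# beyond the current program counter, and a "data engine" computes its
# address ahead of time so the memory system can stage the data early.

def find_next_load(program, pc):
    # Scan forward from the instruction after pc for the next load.
    for i in range(pc + 1, len(program)):
        if program[i][0] == "load":
            return i
    return None

def prefetch(program, pc, regs, memory, staged):
    # Compute the load's address in advance and stage its data,
    # modeling the cache being filled before the CPU reaches the load.
    idx = find_next_load(program, pc)
    if idx is not None:
        _, base_reg, offset = program[idx]
        addr = regs[base_reg] + offset     # address resolved early
        staged[addr] = memory.get(addr)
    return staged
```

By the time the CPU actually executes the load, its data is already in `staged`, so the access hits instead of waiting on memory.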
20160092240 | METHOD AND APPARATUS FOR SIMD STRUCTURED BRANCHING - An apparatus and method for SIMD structured branching. For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute instructions; and a branch unit to process control flow instructions and to maintain a per channel count for each channel and a control instruction count for the control flow instructions, the branch unit to enable and disable the channels based at least on the per channel count. | 03-31-2016 |
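One common way a per-channel count enables and disables SIMD channels, which may be what 20160092240's branch unit does, is a nesting counter: a channel is active only while its count is zero, a diverging IF increments the count of channels that fail (or are already off), and the matching ENDIF decrements it. The sketch below models that assumed semantics in Python; the class and method names are hypothetical.

```python
# Sketch (assumed semantics): each SIMD channel carries a nesting count.
# A channel executes only when its count is zero; structured IF/ENDIF
# pairs increment and decrement the counts to handle divergence.

class BranchUnit:
    def __init__(self, num_channels):
        self.count = [0] * num_channels

    def active(self):
        # A channel is enabled exactly when its per-channel count is zero.
        return [c == 0 for c in self.count]

    def if_begin(self, condition):
        # condition: per-channel booleans for this IF. Channels that fail
        # the test, or are already disabled by an outer IF, increment.
        for i, cond in enumerate(condition):
            if self.count[i] > 0 or not cond:
                self.count[i] += 1

    def end_if(self):
        # ENDIF decrements every nonzero count, re-enabling channels
        # that were turned off by the matching IF.
        self.count = [max(c - 1, 0) for c in self.count]
```

Nested IFs compose naturally: a channel disabled two levels deep needs two ENDIFs before its count returns to zero and it executes again.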
20160196140 | DATA PROCESSING DEVICE, METHOD OF REPORTING PREDICATE VALUES, AND DATA CARRIER | 07-07-2016 |
20160196142 | GENERATING AND EXECUTING A CONTROL FLOW | 07-07-2016 |
20160202981 | INDIRECT INSTRUCTION PREDICATION | 07-14-2016 |
20160378499 | VERIFYING BRANCH TARGETS - Apparatus and methods are disclosed for implementing bad jump detection in block-based processor architectures. In one example of the disclosed technology, a block-based processor includes one or more block-based processing cores configured to fetch and execute atomic blocks of instructions, and a control unit configured to verify, based at least in part on receiving a branch signal from one of the instruction blocks indicating a target location, that the target location is a valid branch target. | 12-29-2016 |
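A minimal reading of 20160378499 is that, in a block-based architecture, a branch target is valid only if it lands on the entry of a known instruction block; anything else is a bad jump. The Python sketch below illustrates that check under assumed names; the set-of-block-starts representation is an illustration, not the application's mechanism.

```python
# Hypothetical model of bad-jump detection: a branch target is accepted
# only if it is the entry address of a known atomic instruction block.

def verify_branch_target(target, block_starts):
    # block_starts: set of addresses where instruction blocks begin.
    return target in block_starts

def execute_branch(target, block_starts):
    # The control unit rejects jumps into the middle of a block
    # (or to unmapped addresses) before redirecting fetch.
    if not verify_branch_target(target, block_starts):
        raise ValueError(f"bad jump: {hex(target)} is not a block entry")
    return target
```

A jump to `0x200` with blocks starting at `0x100` and `0x200` is accepted, while a jump into the interior of a block, say `0x104`, raises an error, modeling the bad-jump signal.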
20180024833 | PROCESSOR TESTING | 01-25-2018 |