Branching (e.g., delayed branch, loop control, branch predict, interrupt)

Subclass of:

712 - Electrical computers and digital processing systems: processing architectures and instruction processing (e.g., processors)

712220000 - PROCESSING CONTROL

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number | Description | Number of patent applications / Date published
712234000 | Conditional branching | 359
712244000 | Exception processing (e.g., interrupts and traps) | 110
712241000 | Loop execution | 60
712242000 | To macro-instruction routine | 1
20110167248 - EFFICIENT RESUMPTION OF CO-ROUTINES ON A LINEAR STACK - Unsuspended co-routines are handled by the machine call stack mechanism in which the stack grows and shrinks as recursive calls are made and returned from. When a co-routine is suspended, however, additional call stack processing is performed. A suspension message is issued, and the entire resume-able part of the call stack is removed and copied to the heap. A frame that returns control to a driver method (a resumer) is copied to the call stack so that resumption of the co-routine does not recursively reactivate the whole call stack. Instead the resumer reactivates only the topmost or most current frame, called the leaf frame. When a co-routine is suspended, it does not return to its caller, but instead returns to the resumer that has reactivated it. (07-07-2011)
712243000 | To microinstruction subroutine | 1
20140122847 - MICROPROCESSOR THAT TRANSLATES CONDITIONAL LOAD/STORE INSTRUCTIONS INTO VARIABLE NUMBER OF MICROINSTRUCTIONS - An instruction translator receives a conditional load/store instruction that specifies a condition, destination/data register, base register, offset source, and memory addressing mode. The instruction instructs the microprocessor to load data from a memory location into the destination register (conditional load) or store data to the memory location from the data register (conditional store) only if the condition flags satisfy the condition. The offset source specifies whether the offset is an immediate value or a value in an offset register. The addressing mode specifies whether the base register is updated when the condition flags satisfy the condition. The instruction translator translates the conditional load instruction into a number of microinstructions, which varies as a function of the offset source, addressing mode, and whether the conditional instruction is a conditional load or store instruction. An out-of-order execution pipeline executes the microinstructions to generate results specified by the instruction. (05-01-2014)
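As a rough illustration of how such a translation might vary the micro-op count, here is a Python sketch; the micro-op names and the expansion rules are invented for illustration, not taken from the application:

```python
def translate_cond_ldst(is_load, offset_immediate, update_base):
    """Hypothetical translator: the micro-op count varies with the offset
    source, the addressing mode, and load vs. store."""
    uops = []
    if not offset_immediate:
        uops.append("read_offset_register")    # offset comes from a register
    uops.append("compute_effective_address")   # base + offset
    uops.append("conditional_load" if is_load else "conditional_store")
    if update_base:
        uops.append("conditional_base_writeback")  # only if condition satisfied
    return uops
```

An immediate-offset load without base update would expand to two micro-ops, while a register-offset store with base update would expand to four.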
Entries
Document | Title | Date
20080301419 - Process model control flow with multiple synchronizations - Activations of a plurality of incoming branches may be detected at a synchronization point having a plurality of outgoing branches. A first synchronization may be executed after a first number of activations is detected, and at least one of a plurality of outgoing branches from the synchronization point may be activated, based on the first synchronization. A second synchronization may be executed after a second number of activations is detected, and at least a second one of the plurality of outgoing branches from the synchronization point may be activated, based on the second synchronization. (12-04-2008)
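The threshold-driven behaviour described above can be modelled in a few lines of Python; the class and branch names are invented for illustration:

```python
class SyncPoint:
    """Toy synchronization point: each threshold, once reached by the count
    of incoming-branch activations, activates one outgoing branch."""
    def __init__(self, thresholds):
        self.thresholds = sorted(thresholds)  # e.g. first sync at 2, second at 4
        self.count = 0
        self.fired = []                       # outgoing branches activated so far

    def incoming(self):
        self.count += 1
        for i, t in enumerate(self.thresholds):
            if self.count == t:
                self.fired.append("out_%d" % i)

sp = SyncPoint([2, 4])
for _ in range(4):      # four incoming-branch activations arrive
    sp.incoming()
```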
20090077360 - Software constructed strands for execution on a multi-core architecture - In one embodiment, the present invention includes a software-controlled method of forming instruction strands. The software may include instructions to obtain code of a superblock including a plurality of basic blocks, build a dependency directed acyclic graph (DAG) for the code, sort nodes coupled by edges of the dependency DAG into a topological order, form strands from the nodes based on hardware constraints, rule constraints, and scheduling constraints, and generate executable code for the strands and store the executable code in a storage. Other embodiments are described and claimed. (03-19-2009)
20090094445 - PROCESS RETEXT FOR DYNAMICALLY LOADED MODULES - A computer implemented method, apparatus, and computer program product for dynamically loading a module into an application address space. In response to receiving a checkpoint signal by a plurality of threads associated with an application running in a software partition, the plurality of threads rendezvous to a point outside an application text associated with the application. Rendezvousing the plurality of threads suspends execution of application text by the plurality of threads. The application text is moved out of an application address space for the application to form an available application address space. The available application address space is an address space that was occupied by the application text. A software module is moved into the available application address space. (04-09-2009)
20090119492 - Data Processing Apparatus and Method for Handling Procedure Call Instructions - A data processing apparatus and method are provided for handling procedure call instructions. The data processing apparatus has processing logic for performing data processing operations specified by program instructions fetched from a sequence of addresses, at least one of the program instructions being a procedure call instruction specifying a branch operation to be performed. Further, a control value is stored within control storage, and the processing logic is operable in response to a control value modifying instruction to modify that control value. If the control value is clear, the processing logic is operable in response to the procedure call instruction to generate a return address value in addition to performing the branch operation, whereas if the control value is set, the processing logic is operable in response to the procedure call instruction to suppress generation of the return address value and to cause the control value to be clear in addition to performing the branch operation. This provides significant flexibility in how procedure call instructions are used within the data processing apparatus. (05-07-2009)
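A minimal Python model of the control-value behaviour; the pc + 4 return address assumes fixed 4-byte instructions, and all names here are illustrative:

```python
def procedure_call(control_clear, pc, link_register):
    """If the control value is clear, the call writes a return address;
    if set, the write is suppressed and the control value is cleared.
    Returns (link_register, control_clear) after the call executes."""
    if control_clear:
        return pc + 4, True        # generate return address; control stays clear
    return link_register, True     # suppress generation; clear the control value
```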
20090138688 - Multi-Die Processor - Disclosed are a multi-die processor apparatus and system. Processor logic to execute one or more instructions is allocated among two or more face-to-face stacked dice. The processor includes a conductive interface between the stacked dice to facilitate die-to-die communication. (05-28-2009)
20090177873 - INSTRUCTION GENERATION APPARATUS - A tampering-prevention-process generation apparatus … (07-09-2009)
20090198980 - FACILITATING PROCESSING IN A COMPUTING ENVIRONMENT USING AN EXTENDED DRAIN INSTRUCTION - An extended DRAIN instruction is used to stall processing within a computing environment. The instruction includes an indication of the one or more processing stages at which processing is to be stalled. It also includes a control that allows processing to be stalled for additional cycles, as desired. (08-06-2009)
20090254740 - INFORMATION PROCESSING DEVICE, ENCRYPTION METHOD OF INSTRUCTION CODE, AND DECRYPTION METHOD OF ENCRYPTED INSTRUCTION CODE - It is possible to achieve the protection of software with reduced overhead. For example, a memory for storing an encrypted code prepared in advance and a decryptor module for decrypting the code are provided. The decryptor module includes, for example, a three-stage pipeline and a selector for selecting one output from the outputs of each stage of the pipeline. When a branch instruction is issued and subsequent inputs of the pipeline are in the order of CD′ … (10-08-2009)
20090319762 - DYNAMIC RECONFIGURABLE CIRCUIT AND DATA TRANSMISSION CONTROL METHOD - A dynamic reconfigurable circuit includes multiple clusters each including a group of reconfigurable processing elements. The dynamic reconfigurable circuit is capable of dynamically changing a configuration of the clusters according to a context including a description of processing of the processing elements and of connection between the processing elements. A first cluster among the clusters includes a signal generating circuit that, when an instruction to change the context is received, generates a report signal indicative of the instruction to change the context; a signal adding circuit that adds the report signal generated by the signal generating circuit to output data that is to be transmitted from the first cluster to a second cluster; and a data clearing circuit that, when output data to which a report signal generated by the second cluster is added is received, performs a clearing process of clearing the output data received. (12-24-2009)
20100011195 - PROCESSOR - A processor includes a plurality of executing sections configured to simultaneously execute instructions for a plurality of threads, an instruction issuing section configured to issue instructions to the plurality of executing sections, and an instruction sync monitoring section configured to, when an instruction-synchronizing instruction is issued to one or more of the plurality of executing sections from the instruction issuing section, monitor completion of execution of the instruction-synchronizing instruction for each of the executing sections, to which the instruction-synchronizing instruction has been issued, thus detecting completion of execution of preceding instructions for the thread to which the instruction-synchronizing instruction belongs. After issuing the instruction-synchronizing instruction, the instruction issuing section stops issuance of succeeding instructions for the thread to which the instruction-synchronizing instruction belongs, until the completion of execution of the preceding instructions for the thread to which the instruction-synchronizing instruction belongs is detected by the instruction sync monitoring section. (01-14-2010)
20100042819 - COMPUTER READABLE MEDIUM, OPERATION CONTROLLING METHOD, AND OPERATION CONTROL SYSTEM - A computer readable medium storing a program causing a computer to execute a process for controlling a plurality of operations, the process including: accepting a change request to change an operation result of an operation executed prior to a current execution-permitted operation for which execution is permitted based on an operation procedure for the operations; assuming an operation for which the change request is accepted in the accepting step as a starting point; and identifying an operation permitted to be executed with reference to the starting point based on the operation procedure. (02-18-2010)
20100095102 - INDIRECT BRANCH PROCESSING PROGRAM AND INDIRECT BRANCH PROCESSING METHOD - A processor reads an interpreter program to start an interpreter. The interpreter makes branch prediction by executing, in place of an indirect branch instruction that is necessary for execution of a source program, storing branch destination addresses in the indirect branch instruction in a link register (the processor internally stacks the branch destination addresses also in a return address stack) in inverse order and reading the addresses from the return address stack in one-at-a-time manner. (04-15-2010)
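The inverse-order staging trick can be mimicked in Python; a list stands in for the hardware return address stack, so this is a sketch of the idea rather than the patented mechanism itself:

```python
def predictable_dispatch(branch_targets):
    """Stage all indirect-branch destinations on a stack in inverse order,
    then pop one per dispatch, so a hardware return-address stack would
    predict each destination correctly."""
    return_stack = []
    for target in reversed(branch_targets):   # store destinations in inverse order
        return_stack.append(target)
    taken = []
    while return_stack:
        taken.append(return_stack.pop())      # read one at a time, in program order
    return taken
```

Because each pop matches the order the targets were staged, the dispatch sequence comes back in the original program order.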
20100106950 - METHOD AND SYSTEM FOR LOADING STATUS CONTROL OF DLL - Apparatus and methods are provided for controlling the loading status of DLLs. Specifically, a streaming program compiler is provided. The compiler includes operation modules for calling DLLs during streaming program execution; association table generating units for generating association tables according to user-defined rules, where the association table includes entries indicating (i) stream branches of the streaming program and (ii) an operation module corresponding to the stream branches; and a trigger generating unit for generating a trigger based on user-defined rules, where the trigger generating unit (i) determines which conditions for loading and unloading DLLs fit the streaming program, (ii) matches these conditions to a particular stream branch to identify a matched stream branch, and (iii) sends out triggering signals indicating the matched stream branch. This invention also provides a corresponding method and controller. (04-29-2010)
20100250905 - System and Method of Routing Instructions - Disclosed are a method and system for reducing complexity of routing of instructions from an instruction issue queue to appropriate execution pipelines in a superscalar processor. In one or more embodiments, an instruction steering unit of the superscalar processor receives ordered instructions. The steering unit determines that a first instruction and a subsequent second instruction of the ordered instructions are non-branching instructions, and the steering unit stores the first and second instructions in two non-branching instruction issue queue entries of a shadow queue. The steering unit determines whether or not a third instruction of the ordered instructions is a branch instruction, where the third instruction is subsequent to the second instruction. If the third instruction is a branch instruction, the steering unit stores the third instruction in a branch entry of the shadow queue; otherwise, the steering unit stores a no operation instruction in the branch entry of the shadow queue. (09-30-2010)
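A toy Python version of the shadow-queue layout; the opcode test and the three-entry queue shape are invented for illustration:

```python
def fill_shadow_queue(i1, i2, i3):
    """Two non-branching entries plus a branch entry that holds either the
    third instruction (if it is a branch) or a no-op placeholder."""
    def is_branch(instr):
        return instr.startswith("b")   # toy opcode test for this sketch
    assert not is_branch(i1) and not is_branch(i2)  # first two are non-branching
    branch_entry = i3 if is_branch(i3) else "nop"
    return [i1, i2, branch_entry]
```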
20100287361 - Root Cause Analysis for Complex Event Processing - Root cause analysis for complex event processing is described. In embodiments, root cause analysis at a complex event processor is automatically performed by selecting an output event from an operator and correlating the output event to an input event using event type and lifetime data for the input event and the output event stored in a data store. Embodiments describe how the lifetime data can comprise a start time and an end time for the event, and the correlation can be based on a comparison of the start and end times between the input and output events. Embodiments describe how the correlation algorithm used is selected in dependence on the event type. In embodiments, a complex event processing engine comprises a logging unit arranged to store in the data store an indicator of an event type and lifetime data for each output event from an operator. (11-11-2010)
20110066833 - CHURCH-TURING THESIS: THE TURING IMMORTALITY PROBLEM SOLVED WITH A DYNAMIC REGISTER MACHINE - A new computing machine and new methods of executing and solving heretofore unknown computational problems are presented here. The computing system demonstrated here can be implemented with a program composed of instructions such that instructions may be added or removed while the instructions are being executed. The computing machine is called a Dynamic Register Machine. The methods demonstrated apply to new hardware and software technology. The new machine and methods enable advances in machine learning, new and more powerful programming languages, and more powerful and flexible compilers and interpreters. (03-17-2011)
20110072248 - UNANIMOUS BRANCH INSTRUCTIONS IN A PARALLEL THREAD PROCESSOR - One embodiment of the present invention sets forth a mechanism for managing thread divergence in a thread group executing in a multithreaded processor. A unanimous branch instruction, when executed, causes all the active threads in the thread group to branch only when each thread in the thread group agrees to take the branch. In such a manner, thread divergence is eliminated. A branch-any instruction, when executed, causes all the active threads in the thread group to branch when at least one thread in the thread group agrees to take the branch. (03-24-2011)
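The two branch semantics reduce to all/any over the active threads' branch predicates; a minimal sketch (function names are illustrative):

```python
def unanimous_branch(votes):
    """The branch is taken only when every active thread agrees,
    so the thread group never diverges."""
    return all(votes)

def branch_any(votes):
    """The branch is taken when at least one active thread agrees."""
    return any(votes)
```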
20110154001 - SYSTEM AND METHOD FOR PROCESSING INTERRUPTS IN A COMPUTING SYSTEM - A system, processor and method are provided for digital signal processing. A processor may initiate processing a sequence of instructions followed by an interrupt. Each instruction may be processed in respective sequential pipeline slots. A branch detector may detect or determine if an instruction is a branch instruction, for example, in turn, for each sequential instruction. In one embodiment, the branch detector may detect if an instruction is a branch instruction until at least a first branch instruction is detected. A processor may annul instructions which are determined to be branch instructions when the interrupt occupies a delay slot associated with the branch instruction. An execution unit may execute at least the sequence of instructions to run a program. The branch detector and/or execution unit may be integral or separate from each other and from the processor. (06-23-2011)
20110167246 - Systems and Methods for Data Detection Including Dynamic Scaling - Various embodiments of the present invention provide systems and methods for data processing. For example, a data processing system is disclosed that includes a channel detector circuit. The channel detector circuit includes a branch metric calculator circuit that is operable to receive a number of violated checks from a preceding stage, and to scale an intrinsic branch metric using a scalar selected based at least in part on the number of violated checks to yield a scaled intrinsic branch metric. (07-07-2011)
20110219220 - Link Stack Repair of Erroneous Speculative Update - Whenever a link address is written to the link stack, the prior value of the link stack entry is saved, and is restored to the link stack after a link stack push operation is speculatively executed following a mispredicted branch. This condition is detected by maintaining a count of the total number of uncommitted link stack write instructions in the pipeline, and a count of the number of uncommitted link stack write instructions ahead of each branch instruction. When a branch is evaluated and determined to have been mispredicted, the count associated with it is compared to the total count. A discrepancy indicates a link stack write instruction was speculatively issued into the pipeline after the mispredicted branch instruction, and pushed a link address onto the link stack. The prior link address is restored to the link stack from the link stack restore buffer. (09-08-2011)
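A toy Python model of the count-comparison repair; the fixed-depth stack and single-entry restore buffer are simplifications, and all names are invented:

```python
class LinkStack:
    """Toy link stack: every push saves the overwritten entry to a restore
    buffer and bumps a global write count. A branch snapshots the count;
    a mismatch at resolution means a speculative push must be undone."""
    def __init__(self, depth=4):
        self.entries = [None] * depth
        self.top = -1
        self.saved = None   # (index, prior value) of the last overwritten slot
        self.writes = 0     # total uncommitted link-stack writes in the pipeline

    def push(self, addr):
        self.top = (self.top + 1) % len(self.entries)
        self.saved = (self.top, self.entries[self.top])  # save prior entry value
        self.entries[self.top] = addr
        self.writes += 1

    def resolve_mispredict(self, count_at_branch):
        if count_at_branch != self.writes:    # discrepancy: speculative push happened
            idx, prior = self.saved
            self.entries[idx] = prior         # restore from the restore buffer
            self.top = (idx - 1) % len(self.entries)

ls = LinkStack()
ls.push(0x100)
count = ls.writes          # count of link-stack writes ahead of the branch
ls.push(0x200)             # speculative push issued after the mispredicted branch
ls.resolve_mispredict(count)
```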
20110320785 - Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache. (12-29-2011)
20110320786 - Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs. (12-29-2011)
20120030453 - INFORMATION PROCESSING APPARATUS, CACHE APPARATUS, AND DATA PROCESSING METHOD - A more efficient technique is provided in an information processing apparatus which executes processing using pipelines. An information processing apparatus according to this invention includes a first pipeline, second pipeline, processing unit, and reorder unit. The first pipeline has a plurality of first nodes, and shifts first data held in one first node to the next first node. The second pipeline has a plurality of second nodes respectively corresponding to the first nodes of the first pipeline, and shifts second data held in one second node to the next second node. The processing unit executes data processing using the first data and the second data. The reorder unit holds one of the output second data based on attribute information of the second data output from the second pipeline, and outputs the held second data to the second pipeline. (02-02-2012)
20120036342 - METHOD AND APPARATUS FOR HANDLING LANE CROSSING INSTRUCTIONS IN AN EXECUTION PIPELINE - The present invention provides a method and apparatus for handling lane-crossing instructions in an execution pipeline. One embodiment of the method includes conveying bits of an instruction from a register to an execution stage in a pipeline along a first data path that includes a lane crossing stage configured to change a first mapping of the register to the execution stage to a second mapping. The method also includes concurrently conveying the bits along a second data path from the register to the execution stage that bypasses the lane crossing stage. The method further includes selecting the first or second data path to provide the bits to the execution stage. (02-09-2012)
20120066483 - Computing Device with Asynchronous Auxiliary Execution Unit - A computing device includes: an instruction cache storing primary execution unit instructions and auxiliary execution unit instructions in a sequential order; a primary execution unit configured to receive and execute the primary execution unit instructions from the instruction cache; an auxiliary execution unit configured to receive and execute only the auxiliary execution unit instructions from the instruction cache in a manner independent from and asynchronous to the primary execution unit; and completion circuitry configured to coordinate completion of the primary execution unit instructions by the primary execution unit and the auxiliary execution unit instructions according to the sequential order. (03-15-2012)
20120072705 - Obtaining And Releasing Hardware Threads Without Hypervisor Involvement - A first hardware thread executes a software program instruction, which instructs the first hardware thread to initiate a second hardware thread. As such, the first hardware thread identifies one or more register values accessible by the first hardware thread. Next, the first hardware thread copies the identified register values to one or more registers accessible by the second hardware thread. In turn, the second hardware thread accesses the copied register values included in the accessible registers and executes software code accordingly. (03-22-2012)
20120072706 - Microcode for Transport Triggered Architecture Central Processing Units - The different advantageous embodiments provide an apparatus comprising a central processing unit, a microcode store, and a number of functional units. The central processing unit utilizes transport triggered architecture and is configured to execute microcoded instructions that allow a single instruction to be executed as multiple instructions. The microcode store includes a number of microcoded instruction implementations. The number of functional units includes a number of useful entry points into the microcode store. (03-22-2012)
20120089822 - INFORMATION PROCESSING DEVICE AND EMULATION PROCESSING PROGRAM AND METHOD - An emulation processing method causing a computer including a first and a second processor to execute emulation processing, the emulation processing method includes: calculate a next instruction address next to a received instruction address, and transmit, to the second processor, the calculated instruction address and instruction information read out on the basis of the calculated instruction address; transmit, to the first processor, a first instruction address that is an instruction address included in an execution result of executed processing; execute processing based on the instruction information received from the first processor, when a second instruction address that is the instruction address received from the first processor is identical to the first instruction address; and read out instruction information on the basis of the first instruction address and execute processing based on the instruction information read out, when the second instruction address is not identical to the first instruction address. (04-12-2012)
20120096246 - NONVOLATILE STORAGE USING LOW LATENCY AND HIGH LATENCY MEMORY - Nonvolatile storage includes first and second memory types with different read latencies. FLASH memory and phase change memory are examples. A first portion of a data block is stored in the phase change memory and a second portion of the data block is stored in the FLASH memory. The first portion of the data block is accessed prior to the second portion of the data block during a read operation. (04-19-2012)
20120210106 - METHOD OF PROCESSING INSTRUCTIONS IN PIPELINE-BASED PROCESSOR - The present invention discloses a method of processing instructions in a pipeline-based central processing unit, wherein the pipeline is partitioned into base pipeline stages and enhanced pipeline stages according to functions, the base pipeline stages being activated all the while, and the enhanced pipeline stages being activated or shutdown according to requirements for performance of a workload. The present invention further discloses a method of processing instructions in a pipeline-based central processing unit, wherein the pipeline is partitioned into base pipeline stages and enhanced pipeline stages according to functions, each pipeline stage being partitioned into a base module and at least one enhanced module, the base module being activated all the while, and the enhanced module being activated or shutdown according to requirements for performance of a workload. (08-16-2012)
20120254597 - BRANCH-AND-BOUND ON DISTRIBUTED DATA-PARALLEL EXECUTION ENGINES - A distributed data-parallel execution (DDPE) system splits a computational problem into a plurality of sub-problems using a branch-and-bound algorithm, designates a synchronous stop time for a plurality of processors (for example, a cluster) for each round of execution, processes the search tree by recursively using a branch-and-bound algorithm in multiple rounds (without inter-processor communications), determines if further processing is required based on the processing round state data, and terminates processing on the processors when processing is completed. (10-04-2012)
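A single-machine Python sketch of one synchronous branch-and-bound round, on a tiny take/skip maximization problem; the bound function is illustrative, and a real DDPE round would run across a cluster with no communication until the synchronous stop:

```python
def bb_round(frontier, best):
    """One synchronous round: branch every subproblem into take/skip
    children and prune children whose optimistic bound cannot beat the
    incumbent. Frontier entries are (value_so_far, remaining_items)."""
    next_frontier = []
    for value, items in frontier:
        if not items:
            best = max(best, value)               # leaf: update the incumbent
            continue
        head, rest = items[0], items[1:]
        for child in ((value + head, rest), (value, rest)):  # take / skip
            if child[0] + sum(child[1]) > best:   # optimistic upper bound
                next_frontier.append(child)
    return next_frontier, best

frontier, best = [(0, (3, 1))], 0
while frontier:            # repeat rounds until the search tree is exhausted
    frontier, best = bb_round(frontier, best)
```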
20120290820 - SUPPRESSION OF CONTROL TRANSFER INSTRUCTIONS ON INCORRECT SPECULATIVE EXECUTION PATHS - Techniques are disclosed relating to a processor that is configured to execute control transfer instructions (CTIs). In some embodiments, the processor includes a mechanism that suppresses results of mispredicted younger CTIs on a speculative execution path. This mechanism permits the branch predictor to maintain its fidelity, and eliminates spurious flushes of the pipeline. In one embodiment, a misprediction bit may be used to indicate that a misprediction has occurred, and CTIs younger than the CTI that was mispredicted are suppressed. In some embodiments, the processor may be configured to execute instruction streams from multiple threads. Each thread may include a misprediction indication. CTIs in each thread may execute in program order with respect to other CTIs of the thread, while instructions other than CTIs may execute out of program order. (11-15-2012)
20130019085 - Efficient Recombining for Dual Path Execution (Inventors: Harold W. Cain, III, Hartsdale NY; David M. Daly, Croton on Hudson NY; Michael C. Huang, Rochester NY; Jose E. Moreira, Irvington NY; IL Park, Seoul KR) - A mechanism is provided for reducing a penalty for executing a correct branch of a branch instruction. An execution unit in a processor of a data processing system executes a first branch of the branch instruction from a main thread of a processor and executes a second branch of the branch instruction from an assist thread of the processor. The execution unit determines whether the main thread is a correct branch of the branch instruction or the assist thread is the correct branch of the branch instruction. Responsive to the assist thread being the correct branch of the branch instruction, the execution unit pauses execution of the branch instruction on both the main thread and the assist thread. The execution unit then properly inherits a context of the main thread in order that execution of the second branch may continue. (01-17-2013)
20130024675 - RETURN ADDRESS OPTIMISATION FOR A DYNAMIC CODE TRANSLATOR - A dynamic code translator with isoblocking uses a return trampoline having branch instructions conditioned on different isostates to optimize return address translation, by allowing the hardware to predict that the address of a future return will be the address of the trampoline. An IP relative call is inserted into translated code to write the trampoline address to a target link register and a target return address stack used by the native machine to predict return addresses. If a computed subject return address matches a subject return address register value, the current isostate of the isoblock is written to an isostate register. The isostate value in the isostate register is then used to select the branch instruction in the trampoline for the true subject return address. Sufficient code area in the trampoline instruction set can be reserved for a number of compare/branch pairs which is equal to the number of available isostates. (01-24-2013)
20130159684 - BATCHED REPLAYS OF DIVERGENT OPERATIONS - One embodiment of the present invention sets forth an optimized way to execute replay operations for divergent operations in a parallel processing subsystem. Specifically, the streaming multiprocessor (SM) includes a multistage pipeline configured to batch two or more replay operations for processing via a replay loop. A logic element within the multistage pipeline detects whether the current pipeline stage is accessing a shared resource, such as loading data from a shared memory. If the threads are accessing data which are distributed across multiple cache lines, then the multistage pipeline batches two or more replay operations, where the replay operations are inserted into the pipeline back-to-back. Advantageously, divergent operations requiring two or more replay operations operate with reduced latency. Where memory access operations require transfer of more than two cache lines to service all threads, the number of clock cycles required to complete all replay operations is reduced. (06-20-2013)
20130198496 - MAJOR BRANCH INSTRUCTIONS - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing. (08-01-2013)
20130198497 - MAJOR BRANCH INSTRUCTIONS WITH TRANSACTIONAL MEMORY - Major branch instructions are provided that enable execution of a computer program to branch from one segment of code to another segment of code. These instructions also create a new stream of processing at the other segment of code enabling execution of the other segment of code to be performed in parallel with the segment of code from which the branch was taken. In one example, the other stream of processing starts a transaction for processing instructions of the other stream of processing. (08-01-2013)
20130232323OBFUSCATION OF CONTROL FLOW OF SOFTWARE - Methods, media and systems that obfuscate control flow in software programs. The obfuscation can impede or prevent static flow analysis of a software program's control flow. In one embodiment, a method, performed by a data processing system, identifies each branch point in a set of branch points in a first version of software and replaces, in each branch point in the set, a representation of a target of the branch point with a computed value that depends upon at least one prior computed value in a stream of instructions in the first version of software. Other embodiments are also described.09-05-2013
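The abstract above describes replacing literal branch targets with values computed from earlier instructions, so a static analyzer cannot read the control-flow graph directly. The sketch below is a hypothetical illustration of that idea (the instruction format, the hash, and the XOR encoding are all invented for this example), not the patented method.

```python
# Hypothetical sketch: branch targets are stored XOR-encoded against a
# "running value" that accumulates over the instructions executed so far,
# so the real target only exists at run time.

def make_obfuscated_target(plain_target: int, running_value: int) -> int:
    """Encode a branch target relative to the expected running value."""
    return plain_target ^ running_value

def resolve_target(encoded: int, running_value: int) -> int:
    """Recover the real target from the current running value."""
    return encoded ^ running_value

def run(program, entry=0):
    """Execute a tiny instruction list; return the trace of visited indices."""
    running, pc, trace = 0, entry, []
    while pc is not None:
        op = program[pc]
        trace.append(pc)
        if op[0] == "op":            # ordinary instruction: fold into running value
            running = (running * 31 + op[1]) & 0xFFFF
            pc += 1
        elif op[0] == "jmp":         # encoded branch: target depends on history
            pc = resolve_target(op[1], running)
        elif op[0] == "halt":
            pc = None
    return trace

# Build a program whose jump at index 2 really goes to index 4.
running_at_branch = 0
for opcode in (7, 9):                # the two "op" instructions before the jump
    running_at_branch = (running_at_branch * 31 + opcode) & 0xFFFF
prog = [("op", 7), ("op", 9),
        ("jmp", make_obfuscated_target(4, running_at_branch)),
        ("op", 99),                  # unreachable; never appears in the trace
        ("halt",)]
assert run(prog) == [0, 1, 2, 4]
```

Because the stored jump operand is meaningless without the accumulated running value, a purely static reading of the program cannot tell that index 2 branches to index 4.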
20130311758HARDWARE PROFILING MECHANISM TO ENABLE PAGE LEVEL AUTOMATIC BINARY TRANSLATION - A hardware profiling mechanism implemented by performance monitoring hardware enables page level automatic binary translation. The hardware during runtime identifies a code page in memory containing potentially optimizable instructions. The hardware requests allocation of a new page in memory associated with the code page, where the new page contains a collection of counters and each of the counters corresponds to one of the instructions in the code page. When the hardware detects a branch instruction having a branch target within the code page, it increments one of the counters that has the same position in the new page as the branch target in the code page. The execution of the code page is repeated and the counters are incremented when branch targets fall within the code page. The hardware then provides the counter values in the new page to a binary translator for binary translation.11-21-2013
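The counter scheme above pairs each instruction slot of a code page with a counter at the same position in a shadow page, bumped when a taken branch lands at that offset. Here is a minimal software model of that bookkeeping; the page size and class shape are assumptions for illustration, not the patented hardware.

```python
# Illustrative model of the per-page branch-target counters.
PAGE_SLOTS = 16  # assumed number of instruction slots per page

class PageProfiler:
    def __init__(self, page_base: int):
        self.page_base = page_base
        self.counters = [0] * PAGE_SLOTS   # the allocated "new page" of counters

    def on_branch(self, target: int) -> None:
        """Increment the counter at the same position as the branch target,
        but only when the target falls within the profiled code page."""
        offset = target - self.page_base
        if 0 <= offset < PAGE_SLOTS:
            self.counters[offset] += 1

prof = PageProfiler(page_base=0x1000)
for target in (0x1003, 0x1003, 0x1007, 0x2000):  # last target is off-page
    prof.on_branch(target)
assert prof.counters[3] == 2 and prof.counters[7] == 1
assert sum(prof.counters) == 3   # the off-page branch was ignored
```

A binary translator reading the counter page can then see which targets inside the page are hot without any software instrumentation of the original code.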
20130339688MANAGEMENT OF MULTIPLE NESTED TRANSACTIONS - Embodiments relate to implementing processor management of transactions. An aspect includes receiving an instruction from a thread. The instruction includes an instruction type, and executes within a transaction. The transaction effectively delays committing stores to memory until the transaction has completed. A processor manages transaction nesting for the instruction based on the instruction type of the instruction. The transaction nesting includes a maximum processor capacity. The transaction nesting management enables executing a sequence of nested transactions within a transaction, supports multiple nested transactions in a processor pipeline, or generates and maintains a set of effective controls for controlling a pipeline. The processor prevents the transaction nesting from exceeding the maximum processor capacity.12-19-2013
20130339689LATER STAGE READ PORT REDUCTION - In some implementations, a register file has a plurality of read ports for providing data to a micro-operation during execution of the micro-operation. For example, the micro-operation may utilize at least two data sources, with at least one first data source being utilized at least one pipeline stage earlier than at least one second data source. A number of register file read ports may be allocated for executing the micro-operation. A bypass calculation is performed during a first pipeline stage to detect whether the at least one second data source is available from a bypass network. During a subsequent second pipeline stage, when the at least one second data source is detected to be available from the bypass network, the number of the read ports allocated to the micro-operation may be reduced.12-19-2013
20140075165EXECUTING SUBROUTINES IN A MULTI-THREADED PROCESSING SYSTEM - This disclosure is directed to techniques for executing subroutines in a single instruction, multiple data (SIMD) processing system that is subject to divergent thread conditions. In particular, a resume counter-based approach for managing divergent thread state is described that utilizes program module-specific minimum resume counters (MINRCs) for the efficient processing of control flow instructions. In some examples, the techniques of this disclosure may include using a main program MINRC to control the execution of a main program module and subroutine-specific MINRCs to control the execution of subroutine program modules. Techniques are also described for managing the main program MINRC and subroutine-specific MINRCs when subroutine call and return instructions are executed. Techniques are also described for updating a subroutine-specific MINRC to ensure that the updated MINRC value for the subroutine-specific MINRC is within the program space allocated for the subroutine.03-13-2014
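A simplified model of the resume-counter idea above: each thread carries a resume counter (RC), a thread is active only when the program counter has reached its RC, and the scheduler jumps to the minimum RC (MINRC) so the furthest-behind threads run first. The data layout and scheduling loop here are assumptions for illustration, not the disclosed design.

```python
# Toy resume-counter divergence model.
def active_set(pc, resume_counters):
    """Threads whose RC equals the current pc are active; others wait."""
    return [i for i, rc in enumerate(resume_counters) if rc == pc]

# Four threads diverge at a branch: threads 0 and 2 take the path at pc=10,
# threads 1 and 3 are deferred to the reconvergence point at pc=20.
rcs = [10, 20, 10, 20]
minrc = min(rcs)                   # scheduler resumes at the MINRC first
assert minrc == 10
assert active_set(minrc, rcs) == [0, 2]

# When the taken path reaches the reconvergence point, its threads' RCs are
# set to 20 and all threads execute together again.
rcs = [20, 20, 20, 20]
assert active_set(min(rcs), rcs) == [0, 1, 2, 3]
```

The subroutine-specific MINRCs described in the abstract extend this bookkeeping so that a MINRC computed inside a subroutine can never point outside the program space allocated to that subroutine.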
20140181485STICKY BIT UPDATE WITHIN A SPECULATIVE EXECUTION PROCESSING ENVIRONMENT - A data processing apparatus 06-26-2014
20140244986SYSTEM AND METHOD TO SELECT A PACKET FORMAT BASED ON A NUMBER OF EXECUTED THREADS - A system and method to select a packet format based on a number of executed threads is disclosed. In a particular embodiment, a method includes determining, at a multi-threaded processor, a number of threads of a plurality of threads executing during a time period. A packet format is determined from a plurality of formats based at least in part on the determined number of threads. Data associated with execution of an instruction by a particular thread is stored in accordance with the selected format in a memory (e.g., a buffer).08-28-2014
20140258693SYSTEM AND METHOD FOR HARDWARE SCHEDULING OF CONDITIONAL BARRIERS AND IMPATIENT BARRIERS - A method and a system are provided for hardware scheduling of barrier instructions. Execution of a plurality of threads to process instructions of a program that includes a barrier instruction is initiated, and when each thread reaches the barrier instruction during execution of the program, it is determined whether the thread participates in the barrier instruction. The threads that participate in the barrier instruction are then serially executed to process one or more instructions of the program that follow the barrier instruction. A method and system are also provided for impatient scheduling of barrier instructions. When a portion of the threads that is greater than a minimum number of threads and less than all of the threads in the plurality of threads reaches the barrier instruction, each of the threads in the portion is serially executed to process one or more instructions of the program that follow the barrier instruction.09-11-2014
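A toy model of the "impatient" release policy described above: once at least a minimum number of participating threads have reached the barrier, the arrived portion is released to execute serially without waiting for stragglers. The function shape and the `min_arrivals` parameter name are illustrative assumptions.

```python
# Toy impatient-barrier release decision (not the patented scheduler).
def impatient_barrier(arrived, participating, min_arrivals):
    """Return the list of threads released to run past the barrier, or []
    if the arrived participating portion is still below the threshold."""
    waiting = [t for t in arrived if t in participating]
    if len(waiting) >= min_arrivals:
        return waiting            # serially execute the arrived portion
    return []                     # keep waiting for more participants

participating = {0, 1, 2, 3}
assert impatient_barrier([0, 2], participating, min_arrivals=3) == []
assert impatient_barrier([0, 2, 3], participating, min_arrivals=3) == [0, 2, 3]
# Non-participating threads neither block nor satisfy the barrier:
assert impatient_barrier([0, 2, 7], participating, min_arrivals=2) == [0, 2]
```

This captures the two conditional aspects in the abstract: only participating threads count toward the barrier, and release can happen before all participants arrive.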
20140365752Method and System for Yield Operation Supporting Thread-Like Behavior - A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.12-11-2014
20150026442SYSTEM, METHOD, AND COMPUTER PROGRAM PRODUCT FOR MANAGING OUT-OF-ORDER EXECUTION OF PROGRAM INSTRUCTIONS - A method, system and computer program product embodied on a computer-readable medium are provided for managing the execution of out-of-order instructions. The method includes the steps of receiving a plurality of instructions and identifying a subset of instructions in the plurality of instructions to be executed out-of-order.01-22-2015
20150100768SCHEDULING PROGRAM INSTRUCTIONS WITH A RUNNER-UP EXECUTION POSITION - A single instruction multiple thread (SIMT) processor 04-09-2015
20150324198CONTROL FLOW WITHOUT BRANCHING - A method for computing in a thread-based environment provides manipulating an execution mask to enable and disable threads when executing multiple conditional function clauses for process instructions. Execution lanes are controlled based on execution participation for the process instructions for reducing resource consumption. Execution of particular one or more schedulable structures that include multiple process instructions are skipped based on the execution mask and activating instructions.11-12-2015
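The mask-based scheme above is the classic way SIMT hardware executes if/else without per-lane branches: every lane walks both clauses, and the execution mask selects which lanes commit results from each clause; a clause whose mask is all-zero can be skipped entirely. The sketch below is a minimal software illustration of that technique, not the patented design.

```python
# Minimal execution-mask model of branch-free if/else over SIMD lanes.
def run_if_else(cond_mask, values, then_fn, else_fn):
    """Apply then_fn where the mask is set and else_fn elsewhere, using only
    mask manipulation; also report whether either clause could be skipped."""
    out = list(values)
    for lane, v in enumerate(values):
        if cond_mask[lane]:                  # lane enabled for 'then' clause
            out[lane] = then_fn(v)
    inverted = [not m for m in cond_mask]    # flip mask for the 'else' clause
    for lane, v in enumerate(values):
        if inverted[lane]:
            out[lane] = else_fn(v)
    skipped_then = not any(cond_mask)        # all-zero mask: clause skippable
    skipped_else = not any(inverted)
    return out, skipped_then, skipped_else

vals = [1, -2, 3, -4]
mask = [v > 0 for v in vals]                 # [True, False, True, False]
out, s_then, s_else = run_if_else(mask, vals, lambda v: v * 10, lambda v: -v)
assert out == [10, 2, 30, 4]                 # positives scaled, negatives negated
assert (s_then, s_else) == (False, False)    # both clauses had active lanes
```

The `skipped_then`/`skipped_else` flags correspond to the abstract's point about skipping schedulable structures based on the execution mask, which is where the resource savings come from.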
20150347140PROCESSOR THAT LEAPFROGS MOV INSTRUCTIONS - A processor performs out-of-order execution of a first instruction and a second instruction after the first instruction in program order, the first instruction includes source and destination indicators, the source indicator specifies a source of data, the destination indicator specifies a destination of the data, the first instruction instructs the processor to move the data from the source to the destination, the second instruction specifies a source indicator that specifies a source of data. A rename unit updates the second instruction source indicator with the first instruction source indicator if there are no intervening instructions that write to the source or to the destination of the first instruction and the second instruction source indicator matches the first instruction destination indicator.12-03-2015
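The rename-time rewrite above resembles move elimination: a later instruction that reads the destination of an earlier MOV is redirected to read the MOV's source, so it no longer waits on the MOV, provided no intervening instruction wrote to either register. The function below is an illustrative model of just that check; the register-name encoding is an assumption.

```python
# Illustrative "leapfrog" check performed at rename (not the patented unit).
def try_leapfrog(mov, later_src, between_writes):
    """mov = (src, dst) of an earlier MOV; later_src is a source register of
    a later instruction; between_writes lists registers written in between.
    Return the register the later instruction should actually read."""
    src, dst = mov
    if later_src == dst and src not in between_writes and dst not in between_writes:
        return src          # safe: read the MOV's source directly
    return later_src        # otherwise leave the source indicator alone

# mov r1 -> r2, then an add reads r2: with no intervening writes the add
# can read r1 directly and need not wait for the MOV to execute.
assert try_leapfrog(("r1", "r2"), "r2", between_writes=[]) == "r1"
# An intervening write to r1 would make the rewrite incorrect, so it is blocked.
assert try_leapfrog(("r1", "r2"), "r2", between_writes=["r1"]) == "r2"
# A source that doesn't match the MOV's destination is untouched.
assert try_leapfrog(("r1", "r2"), "r3", between_writes=[]) == "r3"
```

The payoff in an out-of-order core is that the dependent instruction's source operand becomes ready as soon as `r1` is ready, independent of when the MOV issues.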
20160092239METHOD AND APPARATUS FOR UNSTRUCTURED CONTROL FLOW FOR SIMD EXECUTION ENGINE - An apparatus and method for SIMD unstructured branching. For example, one embodiment of a processor comprises: an execution unit having a plurality of channels to execute instructions; and a branch unit to process unstructured control flow instructions and to maintain a per channel count value for each channel, the branch unit to store instruction pointer tags for the unstructured control flow instructions in a memory and identify the instruction pointer tags using tag addresses, the branch unit to further enable and disable the channels based at least on the per channel count value.03-31-2016
20160202985Variable Length Instruction Processor System and Method07-14-2016
20160378498Systems, Methods, and Apparatuses for Last Branch Record Support - Systems, methods, and apparatuses for last branch record support are described. In an embodiment, a hardware processor core comprises a hardware execution unit to execute a branch instruction, at least two last branch record (LBR) registers to store a source and destination information of a branch taken during program execution, wherein an entry in a LBR register to include an encoding of the branch, a write bit array to indicate which LBR register is architecturally correct, an architectural bit array to indicate when an LBR register has been written, and a plurality of top of stack pointers to indicate which LBR register in a LBR register stack is to be written.12-29-2016
20190146799METHOD AND SYSTEM FOR YIELD OPERATION SUPPORTING THREAD-LIKE BEHAVIOR05-16-2019
20220137968Hardware Verification of Dynamically Generated Code - In an embodiment, dynamically-generated code may be supported in the system by ensuring that the code either remains executing within a predefined region of memory or exits to one of a set of valid exit addresses. Software embodiments are described in which the dynamically-generated code is scanned prior to permitting execution of the dynamically-generated code to ensure that various criteria are met, including exclusion of certain disallowed instructions and control of branch target addresses. Hardware embodiments are described in which the dynamically-generated code is permitted to execute but is monitored to ensure that the execution criteria are met.05-05-2022
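A rough software model of the scan described in the abstract above: generated code must stay inside its region or branch only to an approved exit address, and certain opcodes are banned outright. The instruction encoding, the banned-opcode set, and the address ranges here are all invented for illustration.

```python
# Toy pre-execution scan of a dynamically generated code region.
REGION = range(0x4000, 0x4100)           # assumed region for generated code
VALID_EXITS = {0x9000}                   # assumed approved exit address
DISALLOWED = {"syscall", "wrmsr"}        # assumed banned opcodes

def scan(code):
    """code: list of (addr, opcode, operand) tuples.
    Return True iff the code meets the execution criteria."""
    for addr, opcode, operand in code:
        if opcode in DISALLOWED:
            return False                 # banned instruction present
        if opcode in ("jmp", "call"):
            if operand not in REGION and operand not in VALID_EXITS:
                return False             # escapes to an unapproved address
    return True

good = [(0x4000, "mov", None), (0x4001, "jmp", 0x4010),
        (0x4010, "jmp", 0x9000)]         # exits only via the approved address
bad  = [(0x4000, "jmp", 0x5000)]         # branches outside the region
assert scan(good) is True
assert scan(bad) is False
```

The hardware embodiments in the abstract enforce the same two invariants dynamically, trapping at the first violating branch instead of rejecting the code up front.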


Website © 2025 Advameg, Inc.