Patent application number | Description | Published |
20110145551 | TWO-STAGE COMMIT (TSC) REGION FOR DYNAMIC BINARY OPTIMIZATION IN X86 - Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically. | 06-16-2011 |
20120016853 | EFFICIENT AND CONSISTENT SOFTWARE TRANSACTIONAL MEMORY - A method and apparatus for efficient and consistent validation/conflict detection in a Software Transactional Memory (STM) system is herein described. A version check barrier is inserted after a load to compare versions of loaded values before and after the load. In addition, a global timestamp (GTS) is utilized to track a latest committed transaction. Each transaction is associated with a local timestamp (LTS) initialized to the GTS value at the start of a transaction. As a transaction commits it updates the GTS to a new value and sets versions of modified locations to the new value. Pending transactions compare versions determined in read barriers to their LTS. If the version is greater than their LTS indicating another transaction has committed after the pending transaction started and initialized the LTS, then the pending transaction validates its read set to maintain efficient and consistent transactional execution. | 01-19-2012 |
20120079245 | DYNAMIC OPTIMIZATION FOR CONDITIONAL COMMIT - An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions. | 03-29-2012 |
20120079246 | APPARATUS, METHOD, AND SYSTEM FOR PROVIDING A DECISION MECHANISM FOR CONDITIONAL COMMITS IN AN ATOMIC REGION - An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions. | 03-29-2012 |
20120185714 | METHOD, APPARATUS, AND SYSTEM FOR ENERGY EFFICIENCY AND ENERGY CONSERVATION INCLUDING CODE RECIRCULATION TECHNIQUES - An apparatus, method and system is described herein for enabling intelligent recirculation of hot code sections. A hot code section is determined and marked with a begin and end instruction. When the begin instruction is decoded, recirculation logic in a back-end of a processor enters a detection mode and loads decoded loop instructions. When the end instruction is decoded, the recirculation logic enters a recirculation mode. And during the recirculation mode, the loop instructions are dispatched directly from the recirculation logic to execution stages for execution. Since the loop is being directly serviced out of the back-end, the front-end may be powered down into a standby state to save power and increase energy efficiency. Upon finishing the loop, the front-end is powered back on and continues normal operation, which potentially includes propagating next instructions after the loop that were prefetched before the front-end entered the standby mode. | 07-19-2012 |
20120233477 | DYNAMIC CORE SELECTION FOR HETEROGENEOUS MULTI-CORE SYSTEMS - Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core. | 09-13-2012 |
20120260072 | REGISTER ALLOCATION IN ROTATION BASED ALIAS PROTECTION REGISTER - A system may comprise an optimizer/scheduler to schedule a set of instructions, compute a data dependence, a checking constraint and/or an anti-checking constraint for the set of scheduled instructions, and allocate alias registers for the set of scheduled instructions based on the data dependence, the checking constraint and/or the anti-checking constraint. In one embodiment, the optimizer is to release unused registers to reduce the alias registers used to protect the scheduled instructions. The optimizer is further to insert a dummy instruction after a fused instruction to break cycles in the checking and anti-checking constraints. | 10-11-2012 |
20130275700 | BI-DIRECTIONAL COPYING OF REGISTER CONTENT INTO SHADOW REGISTERS - Embodiments of the present disclosure describe a processor, which may include copy circuitry coupled to a shadow register file and a control register. The copy circuitry may be configured to copy content from a range of a number of registers to a shadow range of the shadow register file in a forward or backward direction. The forward or backward direction may be based at least in part on a value stored in the control register. | 10-17-2013 |
20130283014 | EXPEDITING EXECUTION TIME MEMORY ALIASING CHECKING - Embodiments of apparatus, computer-implemented methods, systems, and computer-readable media are described herein for expediting execution time memory alias checking. A sequence of instructions targeted for execution on an execution processor may be received or retrieved. The execution processor may include a plurality of alias registers and circuitry configured to check entries in the alias register for memory aliasing. One or more optimizations may be performed on the received or retrieved sequence of instructions to optimize execution performance of the received or retrieved sequence of instructions. This may include a reorder of a plurality of memory instructions in the received or retrieved sequence of instructions. After the optimization, one or more move instructions may be inserted in the optimized sequence of instructions to move one or more entries among the alias registers during execution, to expedite alias checking at execution time. Other embodiments may be described and/or claimed. | 10-24-2013 |
20130318507 | APPARATUS, METHOD, AND SYSTEM FOR PROVIDING A DECISION MECHANISM FOR CONDITIONAL COMMITS IN AN ATOMIC REGION - An apparatus and method is described herein for conditionally committing and/or speculative checkpointing transactions, which potentially results in dynamic resizing of transactions. During dynamic optimization of binary code, transactions are inserted to provide memory ordering safeguards, which enables a dynamic optimizer to more aggressively optimize code. And the conditional commit enables efficient execution of the dynamic optimization code, while attempting to prevent transactions from running out of hardware resources. While the speculative checkpoints enable quick and efficient recovery upon abort of a transaction. Processor hardware is adapted to support dynamic resizing of the transactions, such as including decoders that recognize a conditional commit instruction, a speculative checkpoint instruction, or both. And processor hardware is further adapted to perform operations to support conditional commit or speculative checkpointing in response to decoding such instructions. | 11-28-2013 |
20140095778 | METHODS, SYSTEMS AND APPARATUS TO CACHE CODE IN NON-VOLATILE MEMORY - Methods and apparatus are disclosed to cache code in non-volatile memory. A disclosed example method includes identifying an instance of a code request for first code, identifying whether the first code is stored on non-volatile (NV) random access memory (RAM) cache, and when the first code is absent from the NV RAM cache, adding the first code to the NV RAM cache when a first condition associated with the first code is met and preventing storage of the first code to the NV RAM cache when the first condition is not met. | 04-03-2014 |
20140096132 | FLEXIBLE ACCELERATION OF CODE EXECUTION - Technologies for performing flexible code acceleration on a computing device include initializing an accelerator virtual device on the computing device. The computing device allocates memory-mapped input and output (I/O) for the accelerator virtual device and also allocates an accelerator virtual device context for a code to be accelerated. The computing device accesses a bytecode of the code to be accelerated and determines whether the bytecode is an operating system-dependent bytecode. If not, the computing device performs hardware acceleration of the bytecode via the memory-mapped I/O using an internal binary translation module. However, if the bytecode is operating system-dependent, the computing device performs software acceleration of the bytecode. | 04-03-2014 |
20140122845 | OVERLAPPING ATOMIC REGIONS IN A PROCESSOR - In one embodiment, the present invention includes a processor having a core to execute instructions. This core can include various structures and logic that enable instructions of different atomic regions to be executed in an overlapping manner. To this end, the core can include a register file having registers to store data for use in execution of the instructions, and multiple shadow register files each to store a register checkpoint on initiation of a given atomic region. In this way, overlapping execution of atomic regions identified by a programmer or compiler can occur. Other embodiments are described and claimed. | 05-01-2014 |
20140208085 | INSTRUCTION AND LOGIC TO EFFICIENTLY MONITOR LOOP TRIP COUNT - Logic and instruction to efficiently monitor loop trip count. Loop trip count information of a loop may be stored in a dedicated hardware buffer. Average loop trip count of the loop may be calculated based on the stored loop trip count information. Based on the average trip count, loop optimizations may be applied or removed from the loop. The stored loop trip count information may include an identifier identifying the loop, a total loop trip count of the loop, and an exit count of the loop. | 07-24-2014 |
20140223166 | DYNAMIC CORE SELECTION FOR HETEROGENEOUS MULTI-CORE SYSTEMS - Dynamically switching cores on a heterogeneous multi-core processing system may be performed by executing program code on a first processing core. Power up of a second processing core may be signaled. A first performance metric of the first processing core executing the program code may be collected. When the first performance metric is better than a previously determined core performance metric, power down of the second processing core may be signaled and execution of the program code may be continued on the first processing core. When the first performance metric is not better than the previously determined core performance metric, execution of the program code may be switched from the first processing core to the second processing core. | 08-07-2014 |
20150039861 | ALLOCATION OF ALIAS REGISTERS IN A PIPELINED SCHEDULE - In an embodiment, a system includes a processor including one or more cores and a plurality of alias registers to store memory range information associated with a plurality of operations of a loop. The memory range information references one or more memory locations within a memory. The system also includes register assignment means for assigning each of the alias registers to a corresponding operation of the loop, where the assignments are made according to a rotation schedule, and one of the alias registers is assigned to a first operation in a first iteration of the loop and to a second operation in a subsequent iteration of the loop. The system also includes the memory coupled to the processor. Other embodiments are described and claimed. | 02-05-2015 |
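
The rotation schedule described in the last entry (20150039861) can be illustrated with a minimal sketch. The abstract does not give the assignment formula, so the modular rule below, the function names `rotated_register` and `schedule`, the operation names, and the base offsets are all assumptions chosen for illustration. They are not the patented method.

```python
# Hypothetical sketch: each loop operation has a fixed base offset, and the
# alias register it uses advances by one per iteration, wrapping around the
# register file. The formula and all names here are illustrative assumptions.

def rotated_register(base: int, iteration: int, num_registers: int) -> int:
    """Return the alias register index used by an operation in one iteration."""
    return (base + iteration) % num_registers


def schedule(bases: dict[str, int], iterations: int, num_registers: int) -> list[dict[str, int]]:
    """Build one {operation: register} mapping per loop iteration."""
    return [
        {op: rotated_register(base, it, num_registers) for op, base in bases.items()}
        for it in range(iterations)
    ]


if __name__ == "__main__":
    # Two hypothetical memory operations sharing four alias registers.
    table = schedule({"load_a": 0, "store_b": 3}, iterations=2, num_registers=4)
    for it, row in enumerate(table):
        print(it, row)
```

Under these assumptions, register 0 serves `load_a` in iteration 0 and `store_b` in iteration 1, which mirrors the abstract's statement that one alias register is assigned to a first operation in a first iteration and to a second operation in a subsequent iteration.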