Entries |
Document | Title | Date |
20080209173 | Branch predictor directed prefetch - An apparatus for executing branch predictor directed prefetch operations. During operation, a branch prediction unit may provide an address of a first instruction to the fetch unit. The fetch unit may send a fetch request for the first instruction to the instruction cache to perform a fetch operation. In response to detecting a cache miss corresponding to the first instruction, the fetch unit may execute one or more prefetch operation while the cache miss corresponding to the first instruction is being serviced. The branch prediction unit may provide an address of a predicted next instruction in the instruction stream to the fetch unit. The fetch unit may send a prefetch request for the predicted next instruction to the instruction cache to execute the prefetch operation. The fetch unit may store prefetched instruction data obtained from a next level of memory in the instruction cache or in a prefetch buffer. | 08-28-2008 |
20080229069 | System, Method And Software To Preload Instructions From An Instruction Set Other Than One Currently Executing - An instruction preload instruction executed in a first processor instruction set operating mode is operative to correctly preload instructions in a different, second instruction set. The instructions are pre-decoded according to the second instruction set encoding in response to an instruction set preload indicator (ISPI). In various embodiments, the ISPI may be set prior to executing the preload instruction, or may comprise part of the preload instruction or the preload target address. | 09-18-2008 |
20080229070 | Cache circuitry, data processing apparatus and method for prefetching data - Cache circuitry, a data processing apparatus including such cache circuitry, and a method for prefetching data into such cache circuitry, are provided. The cache circuitry has a cache storage comprising a plurality of cache lines for storing data values, and control circuitry which is responsive to an access racquet issued by a device of the data processing apparatus identifying a memory address of a data value to be accessed, to cause a lookup operation to be performed to determine whether the data value for that memory address is stored within the cache storage. If not, a linefill operation is initiated to retrieve the data value from memory. Further, prefetch circuitry is provided which is responsive to a determination that the memory address specified by a current access request is the same as a predicted memory address, to perform either a first prefetch linefill operation or a second prefetch linefill operation to retrieve from memory at least one further data value in anticipation of that data value being the subject of a subsequent access request. The selection of either the first prefetch linefill operation or the second prefetch linefill operation is performed in dependence on an attribute of the current access request. The first prefetch linefill operation involves issuing a sequence of memory addresses to memory, and allocating into a corresponding sequence of cache lines the data values returned from the memory in response to that sequence of addresses. The second prefetch linefill operation comprises issuing a selected memory address to memory, and storing in a linefill buffer the at least one data value returned from the memory in response to that memory address, with that at least one data value only being allocated into the cache when a subsequent access request specifies the selected memory address. By such an approach, the operation of the prefetch circuitry can be altered to take into account the type of access request being issued. | 09-18-2008 |
20080229071 | PREFETCH CONTROL APPARATUS, STORAGE DEVICE SYSTEM AND PREFETCH CONTROL METHOD - A prefetch control apparatus includes a prefetch controller for controlling prefetch of read data into a cache memory caching data to be transferred between a computer apparatus and a storage device, and which enhances a read efficiency of the read data from the storage device, a sequentiality decider for deciding whether the read data that are read from the storage device toward the computer apparatus are sequential access data, a locality decider for deciding whether the read data have locality of data arrangement in the predetermined storage area, in a case where the read data that are read from the storage device toward the computer apparatus have been decided not to be sequential access data, and a prefetcher for prefetching the read data in a case where the read data has the locality of the data arrangement. | 09-18-2008 |
20080229072 | PREFETCH PROCESSING APPARATUS, PREFETCH PROCESSING METHOD, STORAGE MEDIUM STORING PREFETCH PROCESSING PROGRAM - A prefetch processing apparatus includes a central-processing-unit monitor unit that monitors processing states of the central processing unit in association with time elapsed from start time of executing a program. A cache-miss-data address obtaining unit obtains cache-miss-data addresses in association with the time elapsed from the start time of executing the program, and a cycle determining unit determines a cycle of time required for executing the program. An identifying unit identifies a prefetch position in a cycle in which a prefetch-target address is to be prefetched by associating the cycle determined by the cycle determining unit with the cache-miss data addresses obtained by the cache-miss-data address obtaining unit. The prefetch-target address is an address of data on which prefetch processing is to be performed. | 09-18-2008 |
20080244231 | Method and apparatus for speculative prefetching in a multi-processor/multi-core message-passing machine - In some embodiments, the invention involves a novel combination of techniques for prefetching data and passing messages between and among cores in a multi-processor/multi-core platform. In an embodiment, a receiving core has a message queue and a message prefetcher. Incoming messages are simultaneously written to the message queue and the message prefetcher. The prefetcher speculatively fetches data referenced in the received message so that the data is available when the message is executed in the execution pipeline, or shortly thereafter. Other embodiments are described and claimed. | 10-02-2008 |
20080244232 | Pre-fetch apparatus - Apparatus and computing systems associated with data pre-fetching are described. One embodiment includes a processor that includes a first unit to store data corresponding to a load instruction and an instruction pointer (IP) value associated with the load instruction. The processor also includes a second unit to produce a predicted demand address for a next load instruction, the predicted demand address being based on a constant stride value. The processor also includes a third unit to generate an instruction pointer pre-fetch (IPP) request for the predicted demand address. The processor may also include units to arbitrate between generated IP pre-fetch requests and alternative pre-fetch requests. | 10-02-2008 |
20080276073 | Apparatus for and method of distributing instructions - An apparatus is provided for buffering instructions. An instruction store has memory locations for storing instructions. Each instruction can be associated with a timer such that an instruction dispatcher causes the instruction to be sent when the timer indicates that the instruction should be sent. | 11-06-2008 |
20080288751 | TECHNIQUE FOR PREFETCHING DATA BASED ON A STRIDE PATTERN - A processor system ( | 11-20-2008 |
20080307202 | LOADING TEST DATA INTO EXECUTION UNITS IN A GRAPHICS CARD TO TEST THE EXECUTION UNITS - Provided are a method and system for loading test data into execution units in a graphics card to test the execution units. Test instructions are loaded into a cache in a graphics module comprising multiple execution units coupled to the cache on a bus during a design test mode. The cache instructions are concurrently transferred to an instruction queue of each execution unit to concurrently load the cache instructions into the instruction queues of the execution units. The execution units concurrently execute the cache instructions to fetch test instructions from the cache to load into memories of the execution units and execute during the design test mode. | 12-11-2008 |
20090006813 | DATA FORWARDING FROM SYSTEM MEMORY-SIDE PREFETCHER - An apparatus, system, and method are disclosed. In one embodiment, the apparatus includes a system memory-side prefetcher that is coupled to a memory controller. The system memory-side prefetcher includes a stride detection unit to identify one or more patterns in a stream. The system memory-side prefetcher also includes a prefetch injection unit to insert prefetches into the memory controller based on the detected one or more patterns. The system memory-side prefetcher also includes a prefetch data forwarding unit to forward the prefetched data to a cache memory coupled to a processor. | 01-01-2009 |
20090019260 | MASS PREFETCHING METHOD FOR DISK ARRAY - Disclosed herein is a mass prefetching method for disk arrays. In order to improve disk read performance for a non-sequential with having spatial locality as well as a sequential read, when a host requests a block to be read, all the blocks of the strip to which the block belongs are read. This is designated as strip prefetching (SP). Throttled Strip Prefetching (TSP), proposed in the present invention, investigates whether SP is beneficial by an online disk simulation, and does not perform SP if it is determined that SP is not beneficial. Since all prefetching operations of TSP are aligned in the strip of the disk array, the disk independence loss is resolved, and thus the performance of disk arrays is improved for concurrent sequential reads of multiple processes. TSP may however suffer from the loss of disk parallelism due to the disk independence of SP for a single sequential read. In order to solve this problem, this invention proposes Massive Stripe Prefetching (MSP). MSP includes an algorithm that detects a single sequential read at the block level. When a single sequential read is detected, prefetching is aligned in a stripe, and the prefetching size is set to a multiple of stripe size. Accordingly, the parallelism of disks is maximized. | 01-15-2009 |
20090019261 | High-Performance, Superscalar-Based Computer System with Out-of-Order Instruction Execution - A high-performance, superscalar-based computer system with out-of-order instruction execution for enhanced resource utilization and performance throughput. The computer system fetches a plurality of fixed length instructions with a specified, sequential program order (in-order). The computer system includes an instruction execution unit including a register file, a plurality of functional units, and an instruction control unit for examining the instructions and scheduling the instructions for out-of-order execution by the functional units. The register file includes a set of temporary data registers that are utilized by the instruction execution control unit to receive data results generated by the functional units. The data results of each executed instruction are stored in the temporary data registers until all prior instructions have been executed, thereby retiring the executed instruction in-order. | 01-15-2009 |
20090024835 | SPECULATIVE MEMORY PREFETCH - A system and method for pre-fetching data from system memory. A multi-core processor accesses a cache hit predictor concurrently with sending a memory request to a cache subsystem. The predictor has two tables. The first table is indexed by a portion of a memory address and provides a hit prediction based on a first counter value. The second table is indexed by a core number and provides a hit prediction based on a second counter value. If neither table predicts a hit, a pre-fetch request is sent to memory. In response to detecting said hit prediction is incorrect, the pre-fetch is cancelled. | 01-22-2009 |
20090031111 | METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR DYNAMICALLY SELECTING COMPILED INSTRUCTIONS - A method, apparatus, and computer program product dynamically select compiled instructions for execution. Static instructions for execution on a first execution and dynamic instructions for execution on a second execution unit are received. The throughput performance of the static instructions and the dynamic instructions is evaluated based on current states of the execution units. The static instructions or the dynamic instructions are selected for execution at runtime on the first execution unit or the second execution unit, respectively, based on the throughput performance of the instructions. | 01-29-2009 |
20090031112 | SYSTEM AND METHOD FOR PROVIDING GLOBAL VARIABLES WITHIN A HUMAN CAPITAL MANAGEMENT SYSTEM - A system and method are provided for stacking global variables associated with a plurality of tools. The method includes loading a first tool global variable into a memory and executing a first tool of a computer application, the computer application configured to automate human resource processes. The method includes responsive to a call to execute a second tool of the computer application, pushing the first tool global variable onto a stack. The method includes loading a second tool global variable into the memory and executing the second tool. The method includes responsive to completing execution of the second tool, popping the first tool global variable off the stack and loading the first tool global variable back into the memory. | 01-29-2009 |
20090049277 | SEMICONDUCTOR INTEGRATED CIRCUIT DEVICE - A semiconductor integrated circuit device is provided. The operating frequency generating component generates an operating frequency that is a timing that becomes a reference for synchronizing processing between each circuit when the semiconductor integrated circuit operates. The extracting component extracts a critical path that is the slowest path when a data signal propagates between predetermined terminals inside the semiconductor integrated circuit. The instruction prefetch executing component prefetches an instruction relating to the critical path that has been extracted by the extracting component. The processing configuration changing component changes the processing configuration so as to realize transmission of the data signal within a predetermined cycle of the operating frequency using the instruction prefetch executing component when the data signal passes through the path of the critical path. | 02-19-2009 |
20090055628 | METHODS AND COMPUTER PROGRAM PRODUCTS FOR REDUCING LOAD-HIT-STORE DELAYS BY ASSIGNING MEMORY FETCH UNITS TO CANDIDATE VARIABLES - Assigning each of a plurality of memory fetch units to any of a plurality of candidate variables to reduce load-hit-store delays, wherein a total number of required memory fetch units is minimized. A plurality of store/load pairs are identified. A dependency graph is generated by creating a node Nx for each store to variable X and a node Ny for each load of variable Y and, unless X=Y, for each store/load pair, creating an edge between a respective node Nx and a corresponding node Ny; for each created edge, labeling the edge with a heuristic weight; labeling each node Nx with a node weight Wx that combines a plurality of respective edge weights of a plurality of corresponding nodes Nx such that Wx=Σω | 02-26-2009 |
20090070556 | STORE STREAM PREFETCHING IN A MICROPROCESSOR - In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses encompassing multiple cache blocks, where the window of addresses is derived from the received address. The prefetch engine compares entries in the prefetch queue to a window of 2 | 03-12-2009 |
20090077350 | DATA PROCESSING SYSTEM AND METHOD - A method, system and computer program for modifying an executing application, comprising monitoring the executing application to identify at least one of a hot load instruction, a hot store instruction and an active prefetch instruction that contributes to cache congestion; where the monitoring identifies a hot load instruction, enabling at least one prefetch associated with the hot load instruction; where the monitoring identifies a hot store instruction, enabling at least one prefetch associated with the hot store instruction; and where the monitoring identifies an active prefetch instruction that contributes to cache congestion, one of disabling the active prefetch instruction and reducing the effectiveness of the active prefetch instructions. | 03-19-2009 |
20090089548 | METHOD FOR PRELOADING DATA IN A CPU PIPELINE - A method for preloading data in a CPU pipeline is provided, which includes the following steps. When a hint instruction is executed, allocate and initiate an entry in a preload table. When a load instruction is fetched, load a piece of data from a memory into the entry according to the entry. When a use instruction which uses the data loaded by the load instruction is executed, forward the data for the use instruction from the entry instead of from the memory. When the load instruction is executed, update the entry according to the load instruction. | 04-02-2009 |
20090094440 | PRE-FETCH CIRCUIT OF SEMICONDUCTOR MEMORY APPARATUS AND CONTROL METHOD OF THE SAME - A pre-fetch circuit of a semiconductor memory apparatus can carry out a high-frequency operating test through a low-frequency channel of a test equipment. The pre-fetch circuit of a semiconductor memory apparatus can includes: a pre-fetch unit for pre-fetching data bits in a first predetermined number; a plurality of registers provided in the first predetermined number, each of which latches a data in order or a data out of order of the pre-fetched data in response to different control signals; and a control unit for selectively activating the different control signals in response to a test mode signal, whereby some of the registers latch the data out of order. | 04-09-2009 |
20090119488 | Prefetch Unit - In one embodiment, a processor comprises a prefetch unit coupled to a data cache. The prefetch unit is configured to concurrently maintain a plurality of separate, active prefetch streams. Each prefetch stream is either software initiated via execution by the processor of a dedicated prefetch instruction or hardware initiated via detection of a data cache miss by one or more load/store memory operations. The prefetch unit is further configured to generate prefetch requests responsive to the plurality of prefetch streams to prefetch data in to the data cache. | 05-07-2009 |
20090198964 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR OUT OF ORDER INSTRUCTION ADDRESS STRIDE PREFETCH PERFORMANCE VERIFICATION - A method, system, and computer program product are provided for verifying out of order instruction address (IA) stride prefetch performance in a processor design having more than one level of cache hierarchies. Multiple instruction streams are generated and the instructions loop back to corresponding instruction addresses. The multiple instruction streams are dispatched to a processor and simulation application to process. When a particular instruction is being dispatched, the particular instruction's instruction address and operand address are recorded in the queue. The processor is monitored to determine if the processor executes fetch and prefetch commands in accordance with the simulation application. It is checked to determine if prefetch commands are issued for instructions having three or more strides. | 08-06-2009 |
20090198965 | METHOD AND SYSTEM FOR SOURCING DIFFERING AMOUNTS OF PREFETCH DATA IN RESPONSE TO DATA PREFETCH REQUESTS - According to a method of data processing, a memory controller receives a prefetch load request from a processor core of a data processing system. The prefetch load request specifies a requested line of data. In response to receipt of the prefetch load request, the memory controller determines by reference to a stream of demand requests how much data is to be supplied to the processor core in response to the prefetch load request. In response to the memory controller determining to provide less than all of the requested line of data, the memory controller provides less than all of the requested line of data to the processor core. | 08-06-2009 |
20090204791 | Compound Instruction Group Formation and Execution - A method and apparatus for forming compound issue groups containing instructions from multiple cache lines of instructions are provided. By pre-fetching instruction lines containing instructions targeted by a conditional branch statement, if it is predicted that the conditional branch will be taken, a compound issue group may be formed with instructions from the I-line containing the branch statement and the I-line containing instructions targeted by the branch. | 08-13-2009 |
20090210662 | MICROPROCESSOR, METHOD AND COMPUTER PROGRAM PRODUCT FOR DIRECT PAGE PREFETCH IN MILLICODE CAPABLE COMPUTER SYSTEM - A microprocessor equipped to provide hardware initiated prefetching, includes at least one architecture for performing: issuance of a prefetch instruction; writing of a prefetch address into a prefetch fetch address register (PFAR); attempting a prefetch according to the address; detecting one of a cache miss and a cache hit; and if there is a cache miss, then sending a miss request to a next cache level and attempting cache access in a non-busy cycle; and if there is a cache hit, then incrementing the address in the PFAR and completing the prefetch. A method and a computer program product are provided. | 08-20-2009 |
20090210663 | Power Efficient Instruction Prefetch Mechanism - A processor includes a conditional branch instruction prediction mechanism that generates weighted branch prediction values. For weakly weighted predictions, which tend to be less accurate than strongly weighted predictions, the power associating with speculatively filling and subsequently flushing the cache is saved by halting instruction prefetching. Instruction fetching continues when the branch condition is evaluated in the pipeline and the actual next address is known. Alternatively, prefetching may continue out of a cache. To avoid displacing good cache data with instructions prefetched based on a mispredicted branch, prefetching may be halted in response to a weakly weighted prediction in the event of a cache miss. | 08-20-2009 |
20090217004 | CACHE WITH PREFETCH - A prefetch bit ( | 08-27-2009 |
20090235053 | System and Method for Register Renaming - A system and method for performing register renaming of source registers in a processor having a variable advance instruction window for storing a group of instructions to be executed by the processor, wherein a new instruction is added to the variable advance instruction window when a location becomes available. A tag is assigned to each instruction in the variable advance instruction window. The tag of each instruction to leave the window is assigned to the next new instruction to be added to it. The results of instructions executed by the processor are stored in a temp buffer according to their corresponding tags to avoid output and anti-dependencies. The temp buffer therefore permits the processor to execute instructions out of order and in parallel. Data dependency checks for input dependencies are performed only for each new instruction added to the variable advance instruction window and register renaming is performed to avoid input dependencies. | 09-17-2009 |
20090313455 | Instruction issue control wtihin a multithreaded processor - A multithreaded processor is provided with a saturating counter which serves to generate a thread preference signal to steer selection of which program thread operations are taken from for issue into the multiple processor pipelines. The counter is updated based upon the selections made for issue. The counter is a saturating counter and its sign bit may be used as a thread preference signal when discriminating between two threads. The update made to the count value can be weighted depending upon programmable priorities associated with the respective threads as well as a weighting based upon the time taken to execute the type of operation selected. | 12-17-2009 |
20090313456 | METHODS AND APPARATUS FOR DYNAMIC PREDICTION BY SOFTWARE - A method, storage medium, processor instruction and processor to for specifying a value in a first portion of a conditional pre-fetch instruction associated with a branch instruction used for effectuating a branch operation, specifying a target instruction address in a second portion of the instruction, evaluating the value to determine whether a condition is met, and pre-fetching one or more instructions starting at the target instruction address into an instruction buffer of the processor when the condition is met, is provided. | 12-17-2009 |
20100049947 | PROCESSOR AND EARLY-LOAD METHOD THEREOF - A processor and an early-load method thereof are provided. In the early-load method, an instruction is fetched and determined in an instruction fetch stage to obtain a determination result. Whether to early-load an early-loaded data corresponding to the instruction is determined according to the determination result. A target data is fetched according to the instruction in an instruction execution stage if the early-loaded data is not loaded correctly. The early-loaded data is served as the target data if the early-loaded data is loaded correctly. | 02-25-2010 |
20100082948 | CHANNEL COMMAND WORD PRE-FETCHING APPARATUS - In a CCW fetching section, for each input/output device being a control objective, a result prediction table in which prediction values of status values to be returned from an input/output device as execution results of CCW commands, is referred to. Then, based on the prediction values, commands being pre-fetching objectives are pre-fetched from a CCW program stored in a memory, and transmitted to a CCW executing section. On the other hand, in the CCW executing section, the pre-fetched commands are sequentially executed, and the actual status values as the execution results are received from the input/output device. Then, when the received actual status values are not same as the predicted status values, success or failure in prediction is notified to the CCW fetching section, and also, the result prediction table is updated in the CCW fetching section. | 04-01-2010 |
20100153689 | Processor instruction used to determine whether to perform a memory-related trap - Instruction execution includes fetching an instruction that comprises a first set of one or more bits identifying the instruction, and a second set of one or more bits associated with a first address value. It further includes executing the instruction to determine whether to perform a trap, wherein executing the instruction includes selecting from a plurality of tests at least one test for determining whether to perform a trap and carrying out the at least one test. The second set of one or more bits is used in the determination of whether to perform the trap; and the plurality of tests includes a matrix test that determines whether a data value being stored as pointed to by the first address value is escaping from one of a plurality of managed memory types to another one of the plurality of managed memory types and generates a trap in the event that the data value is determined to be escaping from one of the plurality of managed memory types to another one of the plurality of managed memory types, wherein the matrix test is based on a matrix associated with garbage collection and a matrix entry located using at least some of the first set of one or more bits and at least some of the second set of one or more bits. | 06-17-2010 |
20100287357 | Computer Memory Architecture for Hybrid Serial and Parallel Computing Systems - In one embodiment, a serial processor is configured to execute software instructions in a software program in serial. A serial memory is configured to store data for use by the serial processor in executing the software instructions in serial. A plurality of parallel processors are configured to execute software instructions in the software program in parallel. A plurality of partitioned memory modules are provided and configured to store data for use by the plurality of parallel processors in executing software instructions in parallel. Accordingly, a processor/memory structure is provided that allows serial programs to use quick local serial memories and parallel programs to use partitioned parallel memories. The system may switch between a serial mode and a parallel mode. The system may incorporate pre-fetching commands of several varieties. For example, towards switching between the serial mode and the parallel mode, the serial processor is configured to send a signal to start pre-fetching of data from the shared memory. | 11-11-2010 |
20100306503 | GUARANTEED PREFETCH INSTRUCTION - A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions. | 12-02-2010 |
20100332800 | Instruction control device, instruction control method, and processor - An instruction control device connects to a cache memory that stores data frequently used among data stored in a main memory. The instruction control device includes: a first free-space determining unit that determines whether there is free space in an instruction buffer; a second free-space determining unit that manages an instruction fetch request queue that stores an instruction fetch data to be sent from the cache memory to the main memory, and determines whether a move-in buffer in the cache memory has free space for at least two entries if the first free-space determining unit determines that there is free space; and an instruction control unit that outputs an instruction prefetch request to the cache memory in accordance with an address boundary corresponding to a line size of the cache line, if the second free-space determining unit determines that the move-in buffer has free space. | 12-30-2010 |
20110078413 | ARITHMETIC PROCESSING UNIT, SEMICONDUCTOR INTEGRATED CIRCUIT, AND ARITHMETIC PROCESSING METHOD - An arithmetic processing apparatus includes an arithmetic circuit; a first memory configured to store data to be processed in the arithmetic circuit; a second memory configured to be accessed through a first path by the arithmetic circuit; a preloader configured to preload the data from the second memory into the first memory through a second path; a memory controller configured to arbitrate between a first access by the arithmetic circuit using the first path and a second access by the preloader using the second path; and a scheduler configured to control the memory controller. | 03-31-2011 |
20110153987 | REVERSE SIMULTANEOUS MULTI-THREADING - A multi-core processor system supporting simultaneous thread sharing across execution resources of multiple processor cores is provided. The multi-core processor system includes a first processor core with a first instruction queue and dispatch logic in communication with a first execution resource of the first processor core. The multi-core processor system also includes a second processor core with a second instruction queue and dispatch logic in communication with a second execution resource of the second processor core. A high-speed execution resource bus couples the first and second processor cores. The first instruction queue and dispatch logic is configured to issue a first instruction of a thread to the first execution resource and issue a second instruction of the thread over the high-speed execution resource bus to the second execution resource for simultaneous execution of the first and second instruction of the thread on the first and second processor cores. | 06-23-2011 |
20110153988 | METHODS AND APPARATUS TO PERFORM ADAPTIVE PRE-FETCH OPERATIONS IN MANAGED RUNTIME ENVIRONMENTS - Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments are disclosed herein. An example pre-fetch unit for use with a pre-fetch operation includes a first size function executed by a processor to determine a size of an object associated with a pre-fetch operation; a comparator to compare the size of the object associated with the pre-fetch operation to a first one of a plurality of thresholds; and a fetcher to pre-fetch an amount of memory defined by a first one of a plurality of size definitions corresponding to the first threshold when the comparator determines that the size of the object is less than the first threshold. | 06-23-2011 |
20110161631 | Arithmetic processing unit, information processing device, and control method - According to an aspect of an embodiment of the invention, an arithmetic processing unit includes a first cache memory unit that holds a part of data stored in a storage device; an address register that holds an address; a flag register that stores flag information; and a decoder that decodes a prefetch instruction for acquiring data stored at the address in the storage device. The arithmetic processing unit further includes an instruction execution unit that executes a cache hit check instruction instead of the prefetch instruction on the basis of a decoded result when the flag information is held, the cache hit check instruction allowing for searching the first cache memory unit with the address to thereby make a first cache hit determination that the first cache memory unit holds the data stored at the address in the storage device. | 06-30-2011 |
20110167243 | SPACE-EFFICIENT MECHANISM TO SUPPORT ADDITIONAL SCOUTING IN A PROCESSOR USING CHECKPOINTS - Techniques and structures are disclosed for a processor supporting checkpointing to operate effectively in scouting mode while a maximum number of supported checkpoints are active. Operation in scouting mode may include using bypass logic and a set of register storage locations to store and/or forward in-flight instruction results that were calculated during scouting mode. These forwarded results may be used during scouting mode to calculate memory load addresses for yet other in-flight instructions, and the processor may accordingly cause data to be prefetched from these calculated memory load addresses. The set of register storage locations may comprise a working register file or an active portion of a multiported register file. | 07-07-2011 |
20110173417 | Programming Idiom Accelerators - A wake-and-go mechanism may be a programming idiom accelerator. As a processor fetches instructions, the programming idiom accelerator may look ahead to determine whether a programming idiom is coming up in the instruction stream. If the programming idiom accelerator recognizes a programming idiom, the programming idiom accelerator may perform an action to accelerate execution of the programming idiom. In the case of a wake-and-go programming idiom, the programming idiom accelerator may record an entry in a wake-and-go array, for example. | 07-14-2011 |
20110179255 | Data processing reset operations - A processor | 07-21-2011 |
20110213951 | Storing Branch Information in an Address Table of a Processor - Methods for storing branch information in an address table of a processor are disclosed. A processor of the disclosed embodiments may generally include an instruction fetch unit connected to an instruction cache, a branch execution unit, and an address table being connected to the instruction fetch unit and the branch execution unit. The address table may generally be adapted to store a plurality of entries with each entry of the address table being adapted to store a base address and a base instruction tag. In a further embodiment, the branch execution unit may be adapted to determine the address of a branch instruction having an instruction tag based on the base address and the base instruction tag of an entry of the address table associated with the instruction tag. In some embodiments, the address table may further be adapted to store branch information. | 09-01-2011 |
20110238953 | INSTRUCTION FETCH APPARATUS AND PROCESSOR - An instruction fetch apparatus is disclosed which includes: a detection state setting section configured to set the execution state of a program of which an instruction prefetch timing is to be detected; a program execution state generation section configured to generate the current execution state of the program; an instruction prefetch timing detection section configured to detect the instruction prefetch timing in the case of a match between the current execution state of the program and the set execution state thereof upon comparison therebetween; and an instruction prefetch section configured to prefetch the next instruction upon detection of the instruction prefetch timing. | 09-29-2011 |
20110264894 | BRANCHING PROCESSING METHOD AND SYSTEM - A method is provided for controlling a pipeline operation of a processor. The processor is coupled to a memory containing executable computer instructions. The method includes determining a branch instruction to be executed by the processor, and providing both an address of a branch target instruction of the branch instruction and an address of a next instruction following the branch instruction in a program sequence. The method also includes determining a branch decision with respect to the branch instruction based on at least the address of the branch target instruction provided, and selecting at least one of the branch target instruction and the next instruction as a proper instruction to be executed by an execution unit of the processor, based on the branch decision and before the branch instruction is executed by the execution unit, such that the pipeline operation is not stalled whether or not a branch is taken with respect to the branch instruction. | 10-27-2011 |
20110276786 | Shared Prefetching to Reduce Execution Skew in Multi-Threaded Systems - Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated based on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads. | 11-10-2011 |
20110314261 | Prefetch of Attributes in Evaluating Access Control Requests - In an embodiment, a method is provided for prefetching attributes used in access control evaluation. In this method, an access control policy that comprises rules is retrieved. These rules further comprise parameters. At least one of the rules is categorized into a class from multiple classes based on at least one of the parameters. Here, the class is a grouping based on at least one of these parameters. An attribute associated with the at least one of these parameters is identified and this attribute is mapped to the class. | 12-22-2011 |
20110314262 | PREFETCH REQUEST CIRCUIT - A prefetch request circuit is provided in a processor device. The processor device has hierarchized storage areas and can prefetch data of address to be used between appropriate storage areas among the storage areas, when executing respective instruction flows obtained by multi-flow expansion for one instruction at a time of decoding of the instruction. The prefetch request circuit includes a latch unit to hold, when a state in which the respective instruction flows to access the storage area are executed with a maximum specifiable data transfer volume is specified, the state during a time period of the multi-flow expansion; and a prefetch request signal output unit to output a prefetch request signal to request the prefetch every time when the instruction flow is executed, based on an output signal of the latch unit and a signal indicating an execution timing of the respective instruction flows. | 12-22-2011 |
20120005457 | USING SOFTWARE-CONTROLLED SMT PRIORITY TO OPTIMIZE DATA PREFETCH WITH ASSIST THREAD - A method for optimizing data prefetch using assist threads is disclosed herein. In one embodiment, such a method includes executing a main application thread substantially simultaneously with an assist thread. The assist thread is configured to prefetch data for the main application thread. The method further includes monitoring, at runtime, the progress of the main application thread and the assist thread. Depending on the progress of the main application thread and the assist thread, the method dynamically adjusts, at runtime, the priority of the main application thread, the priority of the assist thread, or both. This will help to ensure that the progress of the main application thread and the assist thread are substantially synchronized while executing so that the assist thread increases the performance of the main application thread as initially intended. A corresponding computer program product, apparatus, and system and also disclosed herein. | 01-05-2012 |
20120072702 | PREFETCHER WITH ARBITRARY DOWNSTREAM PREFETCH CANCELATION - A prefetch cancelation arbiter improves access to a shared memory resource by arbitrarily canceling speculative prefetches. The prefetch cancelation arbiter applies a set of arbitrary policies to speculative prefetches to select one or more of the received speculative prefetches to cancel. The selected speculative prefetches are canceled and a cancelation notification of each canceled speculative prefetch is sent to a higher-level memory component such as a prefetch unit or a local memory arbiter that is local to the processor associated with the canceled speculative prefetch. The set of arbitrary policies is used to reduce memory accesses to the shared memory resource. | 03-22-2012 |
20120079242 | PROCESSOR POWER MANAGEMENT BASED ON CLASS AND CONTENT OF INSTRUCTIONS - A processor and method are disclosed. In one embodiment the processor includes a prefetch buffer that stores macro instructions. The processor also includes a clock circuit that can provide a clock signal for at least some of the functional units within the processor. The processor additionally includes macro instruction decode logic that can determine a class of each macro instruction. The processor also includes a clock management unit that can cause the clock signal to remain in a steady state entering at least one of the units in the processor that do not operate on a current macro instruction being decoded. Finally, the processor also includes at least one instruction decoder unit that can decode the first macro instruction into one or more opcodes. | 03-29-2012 |
20120079243 | Next-instruction-type-field - A graphics processing unit core | 03-29-2012 |
20120096240 | Application Performance with Support for Re-Initiating Unconfirmed Software-Initiated Threads in Hardware - A method, system and computer-usable medium are disclosed for managing prefetch streams in a virtual machine environment. Compiled application code in a first core, which comprises a Special Purpose Register (SPR) and a plurality of first prefetch engines, initiates a prefetch stream request. If the prefetch stream request cannot be initiated due to unavailability of a first prefetch engine, then an indicator bit indicating a Prefetch Stream Dispatch Fault is set in the SPR, causing a Hypervisor to interrupt the execution of the prefetch stream request. The Hypervisor then calls its associated operating system (OS), which determines prefetch engine availability for a second core comprising a plurality of second prefetch engines. If a second prefetch engine is available, then the OS migrates the prefetch stream request from the first core to the second core, where it is initiated on an available second prefetch engine. | 04-19-2012 |
20120096241 | Performance of Emerging Applications in a Virtualized Environment Using Transient Instruction Streams - A method, system and computer-usable medium are disclosed for managing transient instruction streams. Transient flags are defined in Branch-and-Link (BRL) instructions that are known to be infrequently executed. A bit is likewise set in a Special Purpose Register (SPR) of the hardware (e.g., a core) that is executing an instruction request thread. Subsequent fetches or prefetches in the request thread are treated as transient and are not written to lower-level caches. If an instruction is non-transient, and if a lower-level cache is non-inclusive of the L1 instruction cache, a fetch or prefetch miss that is obtained from memory may be written in both the L1 and the lower-level cache. If it is not inclusive, a cast-out from the L1 instruction cache may be written in the lower-level cache. | 04-19-2012 |
20120124336 | SIGNAL PROCESSING SYSTEM AND INTEGRATED CIRCUIT COMPRISING A PREFETCH MODULE AND METHOD THEREFOR - A signal processing system comprising at least one master device at least one memory element and prefetch module arranged to perform prefetching from at least one memory element upon a memory access request to the at least one memory element from the at least one master device. Upon receiving a memory access request from the at least one master device, the prefetch module is arranged to configure the enabling of prefetching of at least one of instruction information and data information in relation to that memory access request based at least partly on an address to which the memory access request relates. | 05-17-2012 |
20120131311 | CORRELATION-BASED INSTRUCTION PREFETCHING - The disclosed embodiments provide a system that facilitates prefetching an instruction cache line in a processor. During execution of the processor, the system performs a current instruction cache access which is directed to a current cache line. If the current instruction cache access causes a cache miss or is a first demand fetch for a previously prefetched cache line, the system determines whether the current instruction cache access is discontinuous with a preceding instruction cache access. If so, the system completes the current instruction cache access by performing a cache access to service the cache miss or the first demand fetch, and also prefetching a predicted cache line associated with a discontinuous instruction cache access which is predicted to follow the current instruction cache access. | 05-24-2012 |
20120159126 | Programming Language Exposing Idiom Calls - A programming language may include hint instructions that may notify a programming idiom accelerator that a programming idiom is coming. An idiom begin hint exposes the programming idiom to the programming idiom accelerator. Thus, the programming idiom accelerator need not perform pattern matching or other forms of analysis to recognize a sequence of instructions. Rather, the programmer may insert idiom hint instructions, such as an idiom begin hint, to expose the idiom to the programming idiom accelerator. Similarly, an idiom end hint may mark the end of the programming idiom. | 06-21-2012 |
20120173850 | INFORMATION PROCESSING APPARATUS - A high-performance information processing technique permitting updating of an instruction buffer ready for effective prefetching to branch instructions and returning to the subroutine with a small volume of hardware is to be provided at low cost. It is an information processing apparatus equipped with a CPU, a memory, prefetch means and the like, wherein a prefetch address generator unit in the prefetch means decodes a branching series of instructions including at least one branched address calculating instruction and branching instruction to a branched address out of a current instruction buffer storing the series of instructions currently accessed by the CPU, and thereby looks ahead to the branching destination address. The information processing apparatus further comprises a RTS instruction buffer for storing a series of instructions of the return destinations of RTS instructions, and series of instructions stored in the current instruction buffer are saved into the RTS instruction buffer. | 07-05-2012 |
20120226892 | Method and apparatus for generating efficient code for scout thread to prefetch data values for a main thread - One embodiment of the present invention provides a system that generates code for a scout thread to prefetch data values for a main thread. During operation, the system compiles source code for a program to produce executable code for the program. This compilation process involves performing reuse analysis to identify prefetch candidates which are likely to be touched during execution of the program. Additionally, this compilation process produces executable code for the scout thread which contains prefetch instructions to prefetch the identified prefetch candidates for the main thread. In this way, the scout thread can subsequently be executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread. | 09-06-2012 |
20120233442 | RETURN ADDRESS PREDICTION IN MULTITHREADED PROCESSORS - Techniques and structures are disclosed relating to predicting return addresses in multithreaded processors. In one embodiment, a processor is disclosed that includes a return address prediction unit. The return address prediction unit is configured to store return addresses for different ones of a plurality of threads executable on the processor. The return address prediction unit is configured to receive a request for a predicted return address for one of the plurality of threads. The first request includes an identification of the requesting thread. The return address prediction unit is configured to provide the predicted return address to the requesting thread. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has a plurality of dedicated portions. In some embodiments, the return address prediction unit is configured to store the return addresses in a memory that has dynamically allocable entries. | 09-13-2012 |
20120297169 | DATA PROCESSING APPARATUS, CONTROL METHOD THEREFOR, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM - A data processing apparatus which sequentially executes a verification process so as to recognize a target object, comprising: an obtaining unit configured to obtain dictionary data to be referred to in the verification process; a holding unit configured to hold a plurality of dictionary data; a verification unit configured to execute the verification process for the input data by referring to one dictionary data; a history holding unit configured to hold a verification result; and a prefetch determination unit configured to determine based on the verification result whether to execute prefetch processing in which the obtaining unit obtains in advance dictionary data to be referred to by the verification unit in a succeeding verification process, and holds the dictionary data in the holding unit before the succeeding verification process. | 11-22-2012 |
20130124828 | REDUCING HARDWARE COSTS FOR SUPPORTING MISS LOOKAHEAD - The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. When an instruction retires during the lookahead mode, a working register which serves as a destination register for the instruction is not copied to a corresponding architectural register. Instead the architectural register is marked as invalid. Note that by not updating architectural registers during lookahead mode, the system eliminates the need to checkpoint the architectural registers prior to entering lookahead mode. | 05-16-2013 |
20130124829 | REDUCING POWER CONSUMPTION AND RESOURCE UTILIZATION DURING MISS LOOKAHEAD - The disclosed embodiments relate to a system that executes program instructions on a processor. During a normal-execution mode, the system issues instructions for execution in program order. Upon encountering an unresolved data dependency during execution of an instruction, the system speculatively executes subsequent instructions in a lookahead mode to prefetch future loads. While executing in the lookahead mode, if the processor determines that the lookahead mode is unlikely to uncover any additional outer-level cache misses, the system terminates the lookahead mode. Then, after the unresolved data dependency is resolved, the system recommences execution in the normal-execution mode from the instruction that triggered the lookahead mode. | 05-16-2013 |
20130179663 | INFORMATION HANDLING SYSTEM INCLUDING HARDWARE AND SOFTWARE PREFETCH - A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time. | 07-11-2013 |
20130262826 | APPARATUS AND METHOD FOR DYNAMICALLY MANAGING MEMORY ACCESS BANDWIDTH IN MULTI-CORE PROCESSOR - An apparatus and method are described for performing history-based prefetching. For example a method according to one embodiment comprises: determining if a previous access signature exists in memory for a memory page associated with a current stream; if the previous access signature exists, reading the previous access signature from memory; and issuing prefetch operations using the previous access signature. | 10-03-2013 |
20130290678 | INSTRUCTION AND LOGIC TO LENGTH DECODE X86 INSTRUCTIONS - Techniques to increase the consumption rate of raw instruction bytes within an instruction fetch unit. An instruction fetch unit according to embodiments of the present invention may include a prefetch buffer, a set of bypass multiplexers, an array of bypass latches, a byte-block multiplexer, an instruction alignment multiplexer, a predecode cache, and an instruction length decoder. Raw instruction bytes may be steered from the bypass latches into macro-instructions for consumption by the instruction length decoder, which may generate micro-instructions from the macro-instructions. Embodiments of the present invention may de-couple a latency for reading raw instruction bytes from the prefetch buffer from consuming raw instruction bytes by the instruction length decoder. | 10-31-2013 |
20130326193 | PROCESSOR RESOURCE AND EXECUTION PROTECTION METHODS AND APPARATUS - Embodiments include processing systems that determine, based on an instruction address range indicator stored in a first register, whether a next instruction fetch address corresponds to a location within a first memory region associated with a current privilege state or within a second memory region associated with a different privilege state. When the next instruction fetch address is not within the first memory region, the next instruction is allowed to be fetched only when a transition to the different privilege state is legal. In a further embodiment, when a data access address is generated for an instruction, a determination is made, based on a data address range indicator stored in a second register, whether access to a memory location corresponding to the data access address is allowed. The access is allowed when the current privilege state is a privilege state in which access to the memory location is allowed. | 12-05-2013 |
20130339665 | COLLISION-BASED ALTERNATE HASHING - Embodiments relate to collision-based alternate hashing. An aspect includes receiving an incoming instruction address. Another aspect includes determining whether an entry for the incoming instruction address exists in a history table based on a hash of the incoming instruction address. Another aspect includes based on determining that the entry for the incoming instruction address exists in the history table, determining whether the incoming instruction address matches an address tag in the determined entry. Another aspect includes based on determining that the incoming instruction address does not match the address tag in the determined entry, determining whether a collision exists for the incoming instruction address. Another aspect includes based on determining that the collision exists for the incoming instruction address, activating alternate hashing for the incoming instruction address using an alternate hash buffer. | 12-19-2013 |
20140019721 | MANAGED INSTRUCTION CACHE PREFETCHING - Disclosed is an apparatus and method to manage instruction cache prefetching from an instruction cache. A processor may comprise: a prefetch engine; a branch prediction engine to predict the outcome of a branch; and dynamic optimizer. The dynamic optimizer may be used to control: indentifying common instruction cache misses and inserting a prefetch instruction from the prefetch engine to the instruction cache. | 01-16-2014 |
20140019722 | PROCESSOR AND INSTRUCTION PROCESSING METHOD OF PROCESSOR - Provided are a processor and an instruction processing method of the processor, with which it is possible to increase an instruction execution rate. A processor 1 includes a BTAC 12 that stores branch target information of a branch instruction and boundary information indicating that the branch instruction is on a fetch line boundary, a branch prediction unit 13 that performs branch prediction of a variable-length instruction set including the branch instruction by referring to the BTAC 12, and a fetch unit 14 that fetches an instruction based on the branch prediction result. The branch prediction unit 13 refers to the BTAC 12, and when the boundary information is present in the instruction which the branch prediction unit 13 makes the fetch unit 14 fetch, the branch prediction unit 13 makes the fetch unit 14 fetch the following next fetch line as well and then makes the fetch unit 14 fetch a branch prediction target instruction according to the branch target information. | 01-16-2014 |
20140025931 | METHOD AND APPARATUS FOR CONTROLLING FETCH-AHEAD IN A VLES PROCESSOR ARCHITECTURE - There is provided a method for controlling fetch-ahead of Fetch Sets into a decoupling First In First Out (FIFO) buffer of a Variable Length Execution Set (VLES) processor architecture, wherein a Fetch Set comprises at least a portion of a VLES group available for dispatch to processing resources within the VLES processor architecture, comprising, for each cycle, determining a number of VLES groups available for dispatch from previously pre-fetched Fetch Sets, and only requesting a fetch-ahead of a next Fetch Set in the next cycle if one of a select set of criteria related to the number of VLES groups available for dispatch is true. | 01-23-2014 |
20140025932 | PROCESSOR, INFORMATION PROCESSING DEVICE, AND CONTROL METHOD OF PROCESSOR - A processor includes: a first GHR that indicates, in time series, results which have predicted validity or invalidity of branches when instructions have been fetched; a second GHR that indicates, in time series, results which have decided validity or invalidity of branches when computation has been completed; a branch prediction unit that, when the instructions are fetched, executes branch prediction by using a branch validity accuracy which are decided based on not only a branch history (BRHIS) but also the instruction fetch address and the first GHR and indicates whether the instruction is a branch direction as expected; an update unit that updates the first GHR with the value of the second GHR when it is decided that the branch prediction has failed based on the result of the branch computation; wherein an execution unit re-executes the instruction fetch. | 01-23-2014 |
20140032881 | INSTRUCTION AND LOGIC FOR PERFORMING A DOT-PRODUCT OPERATION - Method, apparatus, and program means for performing a dot-product operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources store to a storage location a result value equal to a dot-product of at least two operands. | 01-30-2014 |
20140101413 | INFORMATION HANDLING SYSTEM INCLUDING HARDWARE AND SOFTWARE PREFETCH - A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time. | 04-10-2014 |
20140143522 | PREFETCHING BASED UPON RETURN ADDRESSES - An apparatus for processing data includes signature generation circuitry | 05-22-2014 |
20140173254 | CACHE PREFETCH FOR DETERMINISTIC FINITE AUTOMATON INSTRUCTIONS - In a DFA scanning engine used to match regular expressions or similar rules, instructions to execute DFA state transitions are accessed through an instruction cache. Each DFA instruction may indicate varying numbers of transitions or branches from a current state. The cache pre-fetches a requested number of additional instructions consecutively following an accessed instruction. The DFA engine accesses an instruction from the cache corresponding to a state within a small number of transitions from the root state. When a low-branching instruction is executed to access a next instruction from the root state, or when a low-branching instruction is executed to access a next instruction from the cache, a fixed or configurable pre-fetch length is requested. Some instructions such as low-branching instructions may contain a pre-fetch hint. | 06-19-2014 |
20140181475 | Parallel Processing of a Sequential Program Using Hardware Generated Threads and Their Instruction Groups Executing on Plural Execution Units and Accessing Register File Segments Using Dependency Inheritance Vectors Across Multiple Engines - A unified architecture for dynamic generation, execution, synchronization and parallelization of complex instruction formats includes a virtual register file, register cache and register file hierarchy. A self-generating and synchronizing dynamic and static threading architecture provides efficient context switching. | 06-26-2014 |
20140208075 | SYSTEMS AND METHOD FOR UNBLOCKING A PIPELINE WITH SPONTANEOUS LOAD DEFERRAL AND CONVERSION TO PREFETCH - Apparatuses, systems, and a method for providing a processor architecture with a control speculative load are described. In one embodiment, a computer-implemented method includes determining whether a speculative load instruction encounters a long latency condition, spontaneously deferring the speculative load instruction if the speculative load instruction encounters the long latency condition, and initiating a prefetch of a translation or of data that requires long latency access when the speculative load instruction encounters the long latency condition. The method further includes reaching a check instruction, which resteers to recovery code that executes a non-speculative version of the load. | 07-24-2014 |
20140223141 | SHARING TLB MAPPINGS BETWEEN CONTEXTS - In some implementations, a processor may include a data structure, such as a translation lookaside buffer, that includes an entry containing first mapping information having a virtual address and a first context associated with a first thread. Control logic may receive a request for second mapping information having the virtual address and a second context associated with a second thread. The control logic may determine whether the second mapping information associated with the second context is equivalent to the first mapping information in the entry of the data structure. If the second mapping information is equivalent to the first mapping information, the control logic may associate the second thread with the first mapping information contained in the entry of the data structure to share the entry between the first thread and the second thread. | 08-07-2014 |
20140258681 | ANTICIPATED PREFETCHING FOR A PARENT CORE IN A MULTI-CORE CHIP - Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. The method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing a fetch table that is indicated by the fetch table address by the scout core. The fetch table indicates how many of pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core. | 09-11-2014 |
20140258682 | PIPELINED PROCESSOR - Provided is a processor with a multi-pipeline fetch structure or a multi-cycle cache structure, including: an integer core which reads instruction transmitted from a lower block, executes an operation corresponding to the instruction, and transmits an instruction address to the lower block; an instruction buffer which stores instruction data which are requested by the integer core by using the instruction address and transmits the instruction data in response to the request of the integer core; and an instruction cache which stores a portion of data of a program memory and transmit the data to the instruction buffer in response to the request of the instruction buffer. | 09-11-2014 |
20140281390 | SYSTEM AND METHOD FOR ORDERING PACKET TRANSFERS IN A DATA PROCESSOR - A data processor includes a packet selector. The packet selector creates an ordered list of packets, each packet corresponding to a respective communication flow, determines whether each packet in the ordered list of packets is eligible for transfer to a prefetch unit based on whether a preceding packet in the same communication flow has been transferred to the prefetch unit, and sets a selection priority for each packet based on start time constraints for the respective communication flow, and based on a processing status of a preceding packet in the communication flow. | 09-18-2014 |
20140344551 | DUAL-MODE INSTRUCTION FETCHING APPARATUS AND METHOD - The dual-mode instruction fetching apparatus includes a mode register, a branch prediction unit, a Program Counter (PC) calculator, an Instruction Queue (IQ), and a fetch multiplexer. The mode register is set to one of normal mode and line mode. The PC calculator accesses a tag in which the address indices of instructions have been stored or a line in which the instructions have been grouped and then outputs an instruction, or accesses only the line and then outputs an instruction depending on the type of set mode. The IQ stores instructions selected by an instruction selector from among the instructions grouped in the line. The fetch multiplexer fetches the instructions stored in the IQ if the normal mode has been set, and fetches instructions read from the line of an instruction cache if the line mode has been set. | 11-20-2014 |
20140372730 | SOFTWARE CONTROLLED DATA PREFETCH BUFFERING - The invention relates to the method of prefetching data in micro-processor buffer under software controls. | 12-18-2014 |
20140372731 | DATA PROCESSING SYSTEMS - A data processing system includes an execution pipeline that includes one or more programmable execution stages which execute execution threads to execute instructions to perform data processing operations. Instructions to be executed by a group of execution threads are first fetched into an instruction cache and then read from the instruction cache for execution by the thread group. When an instruction to be executed by a thread group is present in a cache line in the instruction cache, or is to be fetched into an allocated cache line in the instruction cache, a pointer to the location of the instruction in the instruction cache is stored for the thread group. This stored pointer is then used to retrieve the instruction for execution by the thread group from the instruction cache. | 12-18-2014 |
20150019841 | ANTICIPATED PREFETCHING FOR A PARENT CORE IN A MULTI-CORE CHIP - Embodiments relate to prefetching data on a chip having a scout core and a parent core coupled to the scout core. A method includes determining that a program executed by the parent core requires content stored in a location remote from the parent core. The method includes sending a fetch table address determined by the parent core to the scout core. The method includes accessing a fetch table that is indicated by the fetch table address by the scout core. The fetch table indicates how many of pieces of content are to be fetched by the scout core and a location of the pieces of content. The method includes based on the fetch table indicating, fetching the pieces of content by the scout core. The method includes returning the fetched pieces of content to the parent core. | 01-15-2015 |
20150058600 | DATA PROCESSOR - A data processor of an embodiment includes a memory, an instruction cache, a processing unit (CPU), and a fetch process control unit. The memory stores a program in which a plurality of instructions are written. The instruction cache operates only when a branch instruction included in the program is executed, and data of a greater capacity than a width of a bus of the memory is read from the memory and stored in the instruction cache in advance. The processing unit accesses both the memory and the instruction cache and executes, in a pipelined manner, instructions read from the memory or the instruction cache. The fetch process control unit generates, in response to a branch instruction executed by the processing unit, a stop signal for stopping a fetch process of reading an instruction from the memory, and outputs the stop signal to the memory. | 02-26-2015 |
20150089195 | METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value. | 03-26-2015 |
20150089196 | METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value. | 03-26-2015 |
20150089197 | METHOD AND APPARATUS FOR PERFORMING A SHIFT AND EXCLUSIVE OR OPERATION IN A SINGLE INSTRUCTION - Method and apparatus for performing a shift and XOR operation. In one embodiment, an apparatus includes execution resources to execute a first instruction. In response to the first instruction, said execution resources perform a shift and XOR on at least one value. | 03-26-2015 |
20150095616 | DATA PROCESSOR - The present invention realizes an efficient superscalar instruction issue and low power consumption at an instruction set including instructions with prefixes. An instruction fetch unit is adopted which determines whether an instruction code is of a prefix code or an instruction code other than it, and outputs the result of determination and the 16-bit instruction code. Along with it, decoders each of which decodes the instruction code, based on the result of determination, and decoders each of which decodes the prefix code, are disposed separately. Further, a prefix is supplied to each decoder prior to a fixed-length instruction code like 16 bits modified with it. A fixed-length instruction code following the prefix code is supplied to each decoder of the same pipeline as the decoder for the prefix code. | 04-02-2015 |
20150100762 | INSTRUCTION CACHE WITH WAY PREDICTION - A processor includes an instruction fetch unit and an execution unit. The instruction fetch unit retrieves instructions from memory to be executed by the execution unit. The instruction fetch unit includes a branch prediction unit which is configured to predict whether a branch instruction is likely to be executed. The memory includes an instruction cache comprising a portion of the fetch blocks available in the memory. The instruction fetch unit may use a combination of way prediction and serialized access to retrieve instructions from the instruction cache. The instruction fetch unit initially accesses the instruction cache to retrieve the predicted fetch block associated with a way prediction. The instruction fetch unit compares a cache tag associated with the way prediction with the address of the cache line that includes the predicted fetch block. If the tag matches, then the way prediction is correct and the retrieved fetch block is valid. | 04-09-2015 |
20150106590 | FILTERING OUT REDUNDANT SOFTWARE PREFETCH INSTRUCTIONS - The disclosed embodiments relate to a system that selectively filters out redundant software prefetch instructions during execution of a program on a processor. During execution of the program, the system collects information associated with hit rates for individual software prefetch instructions as the individual software prefetch instructions are executed, wherein a software prefetch instruction is redundant if the software prefetch instruction accesses a cache line that has already been fetched from memory. As software prefetch instructions are encountered during execution of the program, the system selectively filters out individual software prefetch instructions that are likely to be redundant based on the collected information, so that likely redundant software prefetch instructions are not executed by the processor. | 04-16-2015 |
20150113249 | METHODS AND APPARATUS TO PERFORM ADAPTIVE PRE-FETCH OPERATIONS IN MANAGED RUNTIME ENVIRONMENTS - Methods and apparatus to perform adaptive pre-fetch operations in managed runtime environments are disclosed herein. An example disclosed method includes determining an object size associated with a pre-fetch operation; comparing the object size to a first one of a series of thresholds having increasing respective values; when the object size is less than the first one of the series of thresholds, pre-fetching a first amount of stored data assigned to the first one of the series of thresholds; and when the object size is greater than the first one of the plurality of thresholds, comparing the object size to a next one of the series of thresholds. | 04-23-2015 |
20150121038 | PREFETCH STRATEGY CONTROL - A single instruction multiple thread (SIMT) processor | 04-30-2015 |
20150121039 | METHOD AND APPARATUS FOR SHUFFLING DATA - Method, apparatus, and program means for shuffling data. The method of one embodiment comprises receiving a first operand having a set of L data elements and a second operand having a set of L control elements. For each control element, data from a first operand data element designated by the individual control element is shuffled to an associated resultant data element position if its flush to zero field is not set and a zero is placed into the associated resultant data element position if its flush to zero field is not set. | 04-30-2015 |
20150134933 | ADAPTIVE PREFETCHING IN A DATA PROCESSING APPARATUS - A data processing apparatus and method of data processing are disclosed. An instruction execution unit executes a sequence of program instructions, wherein execution of at least some of the program instructions initiates memory access requests to retrieve data values from a memory. A prefetch unit prefetches data values from the memory for storage in a cache unit before they are requested by the instruction execution unit. The prefetch unit is configured to perform a miss response comprising increasing a number of the future data values which it prefetches, when a memory access request specifies a pending data value which is already subject to prefetching but is not yet stored in the cache unit. The prefetch unit is also configured, in response to an inhibition condition being met, to temporarily inhibit the miss response for an inhibition period. | 05-14-2015 |
20150143084 | HAND HELD DEVICE TO PERFORM A BIT RANGE ISOLATION INSTRUCTION - Receiving an instruction indicating a source operand and a destination operand. Storing a result in the destination operand in response to the instruction. The result operand may have: (1) first range of bits having a first end explicitly specified by the instruction in which each bit is identical in value to a bit of the source operand in a corresponding position; and (2) second range of bits that all have a same value regardless of values of bits of the source operand in corresponding positions. Execution of instruction may complete without moving the first range of the result relative to the bits of identical value in the corresponding positions of the source operand, regardless of the location of the first range of bits in the result. Execution units to execute such instructions, computer systems having processors to execute such instructions, and machine-readable medium storing such an instruction are also disclosed. | 05-21-2015 |
20150301830 | PROCESSOR WITH VARIABLE PRE-FETCH THRESHOLD - A method and apparatus for controlling pre-fetching in a processor. A processor includes an execution pipeline and an instruction pre-fetch unit. The execution pipeline is configured to execute instructions. The instruction pre-fetch unit is coupled to the execution pipeline. The instruction pre-fetch unit includes instruction storage to store pre-fetched instructions, and pre-fetch control logic. The pre-fetch control logic is configured to fetch instructions from memory and store the fetched instructions in the instruction storage. The pre-fetch control logic is also configured to provide instructions stored in the instruction storage to the execution pipeline for execution. The pre-fetch control logic is further configured set a maximum number of instruction words to be pre-fetched for execution subsequent to execution of an instruction currently being executed in the instruction pipeline. The maximum number is based on a value contained in a pre-fetch threshold field of an instruction executed in the execution pipeline. | 10-22-2015 |
20150301832 | DYNAMICALLY ENABLED BRANCH PREDICTION - Embodiments for a processor that selectively enables and disables branch prediction are disclosed. The processor may include counters to track a number of fetched instructions, a number of branches, and a number of mispredicted branches. A misprediction threshold may be calculated dependent upon the tracked number of branches and a predefined misprediction ratio. Branch prediction may then be disabled when the number of mispredictions exceed the determined threshold value and dependent upon the branch rate. | 10-22-2015 |
20150347145 | ABSOLUTE ADDRESS BRANCHING IN A FIXED-WIDTH REDUCED INSTRUCTION SET COMPUTING ARCHITECTURE - Embodiments relate to a system for absolute address branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A branch target address value is acquired from the instruction stream. The branch target address value represents a target address of the branch instruction. The branch target address value is formatted as an absolute address and sized as a multiple of the fixed instruction width. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter. | 12-03-2015 |
20150347146 | RELATIVE OFFSET BRANCHING IN A FIXED-WIDTH REDUCED INSTRUCTION SET COMPUTING ARCHITECTURE - Embodiments relate to a system for relative offset branching in a reduced instruction set computing (RISC) architecture. One aspect is a system that includes memory and a processing circuit communicatively coupled to the memory. The system is configured to perform a method that includes fetching a branch instruction from an instruction stream having a fixed instruction width. A relative offset value is acquired from the instruction stream. The relative offset value is formatted as an offset relative to a program counter value and sized as a multiple of the fixed instruction width. The relative offset value is added with the program counter value to form a branch target address value. The branch target address value is loaded into a program counter based on the branch instruction. Execution of the instruction stream is redirected to a next instruction based on the branch target address value in the program counter. | 12-03-2015 |
20160019065 | PREFETCHING INSTRUCTIONS IN A DATA PROCESSING APPARATUS - A data processing apparatus has prefetch circuitry for prefetching cache lines of instructions into an instruction cache. A prefetch lookup table is provided for storing prefetch entries, with each entry corresponding to a region of a memory address space and identifying at least one block of one or more cache lines within the corresponding region from which processing circuitry accessed an instruction on a previous occasion. When the processing circuitry executes an instruction from a new region, the prefetch circuitry looks up the table, and if it stores a prefetch entry for the new region, then the at least one block identified by the corresponding entry is prefetched into the cache. | 01-21-2016 |
20160110194 | DYNAMICALLY UPDATING HARDWARE PREFETCH TRAIT TO EXCLUSIVE OR SHARED AT PROGRAM DETECTION - A processor includes a processing core that detects a predetermined program is running on the processor and looks up a prefetch trait associated with the predetermined program running on the processor, wherein the prefetch trait is either exclusive or shared. The processor also includes a hardware data prefetcher that performs hardware prefetches for the predetermined program using the prefetch trait. Alternatively, the processing core loads each of one or more range registers of the processor with a respective address range in response to detecting that the predetermined program is running on the processor. Each of the one or more address ranges has an associated prefetch trait, wherein the prefetch trait is either exclusive or shared. The hardware data prefetcher performs hardware prefetches for the predetermined program using the prefetch traits associated with the address ranges loaded into the range registers. | 04-21-2016 |
20160147538 | PROCESSOR WITH MULTIPLE EXECUTION PIPELINES - An apparatus and method system and method for increasing performance in a processor or other instruction execution device while minimizing energy consumption. A processor includes a first execution pipeline and a second execution pipeline. The first execution pipeline includes a first decode unit and a first execution control unit coupled to the first decode unit. The first execution control unit is configured to control execution of all instructions executable by the processor. The second execution pipeline includes a second decode unit, and a second execution control unit coupled to the second decode unit. The second execution control unit is configured to control execution of a subset of the instructions executable via the first execution control unit. | 05-26-2016 |
20160179533 | SCALABLE EVENT HANDLING IN MULTI-THREADED PROCESSOR CORES | 06-23-2016 |
20160179544 | INSTRUCTION AND LOGIC FOR SUPPRESSION OF HARDWARE PREFETCHERS | 06-23-2016 |
20160188337 | HARDWARE APPARATUSES AND METHODS TO PREFETCH A MULTIDIMENSIONAL BLOCK OF ELEMENTS FROM A MULTIMENSIONAL ARRAY - Methods and apparatuses relating to a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache. In one embodiment, a hardware processor includes a decoder to decode a prefetch instruction to prefetch a multidimensional block of elements from a multidimensional array into a cache, wherein at least one operand of the prefetch instruction is to indicate a system memory address of an element of the multidimensional block of elements, a stride of the multidimensional block of elements, and boundaries of the multidimensional block of elements, and an execution unit to execute the prefetch instruction to generate system memory addresses of the other elements of the multidimensional block of elements, and load the multidimensional block of elements into the cache from the system memory addresses. | 06-30-2016 |
20180024836 | DETERMINING THE EFFECTIVENESS OF PREFETCH INSTRUCTIONS | 01-25-2018 |