Patent application number | Description | Published |
20080256336 | MICROPROCESSOR WITH PRIVATE MICROCODE RAM - A microprocessor includes a private RAM (PRAM), for use by microcode, which is non-user-accessible and within its own distinct address space from the system memory address space. The PRAM is denser and slower than user-accessible registers of the microprocessor macroarchitecture, thereby enabling it to provide significantly more storage for microcode. The microinstruction set includes a microinstruction for loading data from the PRAM into the user-accessible registers, and a microinstruction for storing data from user-accessible registers to the PRAM. The microcode may also use the two microinstructions to load/store between the PRAM and non-user-accessible registers of the microarchitecture. Examples of PRAM uses include: computational temporary storage area; storage of x86 VMX VMCS in response to VMREAD and VMWRITE macroinstructions; instantiation of non-user-accessible storage, such as the x86 SMBASE register; and instantiation of x86 MSRs that tolerate the additional access latency of the PRAM, such as the IA32_SYSENTER_CS MSR. | 10-16-2008 |
20090204800 | MICROPROCESSOR WITH MICROARCHITECTURE FOR EFFICIENTLY EXECUTING READ/MODIFY/WRITE MEMORY OPERAND INSTRUCTIONS - The microprocessor includes an instruction translator that translates a macroinstruction of a macroinstruction set in its macroarchitecture into exactly three microinstructions to perform a read/modify/write operation on a memory operand. The first microinstruction instructs the microprocessor to load the memory operand into the microprocessor from a memory location and to calculate a destination address of the memory location. The second microinstruction instructs the microprocessor to perform an arithmetic or logical operation on the loaded memory operand to generate a result. The third microinstruction instructs the microprocessor to write the result to the memory location whose destination address is calculated by the first microinstruction. A first execution unit receives the first microinstruction and responsively loads the memory operand into the microprocessor from the memory location, and a second distinct execution unit also receives the first microinstruction and responsively calculates the destination address of the memory location. | 08-13-2009 |
20100011188 | MICROPROCESSOR THAT PERFORMS SPECULATIVE TABLEWALKS - A microprocessor performs a speculative page tablewalk. The microprocessor includes a tablewalk engine that determines whether at least one of a predetermined set of conditions exists with respect to characteristics of the page of memory whose physical address specified by a memory access instruction is missing in the TLB, performs operations of the tablewalk in an out-of-order manner with respect to the execution of unretired program instructions older than the memory access instruction while none of the predetermined set of conditions exists, and waits to perform the operations of the tablewalk until the microprocessor has retired all program instructions older than the memory access instruction when at least one of the predetermined set of conditions exists. The predetermined set of conditions may include the tablewalk needing to load information from a strongly-ordered page, update page mapping information, or access a global page. | 01-14-2010 |
20100011198 | MICROPROCESSOR WITH MULTIPLE OPERATING MODES DYNAMICALLY CONFIGURABLE BY A DEVICE DRIVER BASED ON CURRENTLY RUNNING APPLICATIONS - A computing system includes a microprocessor that receives values for configuring operating modes thereof. A device driver monitors which software applications currently running on the microprocessor are in a predetermined list and responsively dynamically writes the values to the microprocessor to configure its operating modes. Examples of the operating modes the device driver may configure relate to the following: data prefetching; branch prediction; instruction cache eviction; instruction execution suspension; sizes of cache memories, reorder buffer, store/load/fill queues; hashing algorithms related to data forwarding and branch target address cache indexing; number of instruction translation, formatting, and issuing per clock cycle; load delay mechanism; speculative page tablewalks; instruction merging; out-of-order execution extent; caching of non-temporal hinted data; and serial or parallel access of an L2 cache and processor bus in response to an instruction cache miss. | 01-14-2010 |
20100049952 | MICROPROCESSOR THAT PERFORMS STORE FORWARDING BASED ON COMPARISON OF HASHED ADDRESS BITS - An apparatus for decreasing the likelihood of incorrectly forwarding store data includes a hash generator, which hashes J address bits to K hashed bits. The J address bits are a memory address specified by a load/store instruction, where K is an integer greater than zero and J is an integer greater than K. The apparatus also includes a comparator, which outputs a first value if L address bits specified by the load instruction match L address bits specified by the store instruction and K hashed bits of the load instruction match corresponding K hashed bits of the store instruction, and otherwise outputs a second value, where L is greater than zero. The apparatus also includes forwarding logic, which forwards data from the store instruction to the load instruction if the comparator outputs the first value and forgoes forwarding the data when the comparator outputs the second value. | 02-25-2010 |
20100064107 | MICROPROCESSOR CACHE LINE EVICT ARRAY - An apparatus for ensuring data coherency within a cache memory hierarchy of a microprocessor during an eviction of a cache line from a lower-level memory to a higher-level memory in the hierarchy includes an eviction engine and an array of storage elements. The eviction engine is configured to move the cache line from the lower-level memory to the higher-level memory. The array of storage elements is coupled to the eviction engine. Each storage element is configured to store an indication for a corresponding cache line stored in the lower-level memory. The indication indicates whether or not the eviction engine is currently moving the cache line from the lower-level memory to the higher-level memory. | 03-11-2010 |
20100070741 | MICROPROCESSOR WITH FUSED STORE ADDRESS/STORE DATA MICROINSTRUCTION - A microprocessor includes an instruction translator that translates a store macroinstruction into exactly one fused store microinstruction. The store macroinstruction in the microprocessor's macroarchitecture macroinstruction set instructs the microprocessor to store data from a general purpose register of the microprocessor to a memory location. The fused store microinstruction is an instruction in the microprocessor's microarchitecture microinstruction set. A reorder buffer (ROB) receives the fused store microinstruction from the instruction translator into exactly one of its plurality of entries. An instruction dispatcher dispatches for execution a store address microinstruction and a store data microinstruction to different respective execution units of the microprocessor in response to receiving the fused store microinstruction. Neither the store address microinstruction nor the store data microinstruction occupies any of the ROB entries. The ROB retires the fused store microinstruction after being notified that both the store address microinstruction and the store data microinstruction have been executed. | 03-18-2010 |
20100205406 | OUT-OF-ORDER EXECUTION MICROPROCESSOR THAT SPECULATIVELY EXECUTES DEPENDENT MEMORY ACCESS INSTRUCTIONS BY PREDICTING NO VALUE CHANGE BY OLDER INSTRUCTIONS THAT LOAD A SEGMENT REGISTER - An out-of-order execution microprocessor executes an architectural segment register-loading instruction that instructs the microprocessor to load a new value into an architectural segment register of the microprocessor. A comparator compares the new value specified by the architectural segment register-loading instruction with the current contents of the architectural segment register. Whenever the comparator indicates the new value does not equal the current contents, a control unit causes all instructions in the microprocessor that used the current architectural segment register contents as a source operand, and that are newer in program order than the architectural segment register-loading instruction, to be re-executed using the new value. An instruction scheduler retrieves the current contents and issues for execution instructions that use the retrieved current contents, even though the instructions are newer in program order than the register-loading instruction and the register-loading instruction has not yet written the new value to the architectural segment register. | 08-12-2010 |
20100250859 | PREFETCHING OF NEXT PHYSICALLY SEQUENTIAL CACHE LINE AFTER CACHE LINE THAT INCLUDES LOADED PAGE TABLE ENTRY - A microprocessor includes a cache memory, a load unit, and a prefetch unit, coupled to the load unit. The load unit is configured to receive a load request that includes an indicator that the load request is loading a page table entry. The prefetch unit is configured to receive from the load unit a physical address of a first cache line that includes the page table entry specified by the load request. The prefetch unit is further configured to responsively generate a request to prefetch into the cache memory a second cache line. The second cache line is the next physically sequential cache line to the first cache line. In an alternate embodiment, the second cache line is the previous physically sequential cache line to the first cache line rather than the next physically sequential cache line to the first cache line. | 09-30-2010 |
20100299484 | LOW POWER HIGH SPEED LOAD-STORE COLLISION DETECTOR - An apparatus detects a load-store collision within a microprocessor between a load operation and an older store operation each of which accesses data in the same cache line. Load and store byte masks specify which bytes contain the data specified by the load and store operations within the word of the cache line in which the load data and the store data begin, respectively. Load and store word masks specify which words contain the data specified by the load and store operations within the cache line, respectively. Combinatorial logic uses the load and store byte masks to detect the load-store collision if the data specified by the load and store operations begin in the same cache line word, and uses the load and store word masks to detect the load-store collision if the data specified by the load and store operations do not begin in the same cache line word. | 11-25-2010 |
20100306475 | DATA CACHE WITH MODIFIED BIT ARRAY - A cache memory system includes a first array of storage elements each configured to store a cache line, a second array of storage elements corresponding to the first array of storage elements each configured to store a first partial status of the cache line in the corresponding storage element of the first array, and a third array of storage elements corresponding to the first array of storage elements each configured to store a second partial status of the cache line in the corresponding storage element of the first array. The second partial status indicates whether or not the cache line has been modified. When the cache memory system modifies the cache line within a storage element of the first array, it writes only the second partial status in the corresponding storage element of the third array to indicate that the cache line has been modified but refrains from writing the first partial status in the corresponding storage element of the second array. The cache memory system reads both the first partial status and the second partial status to determine the full status. | 12-02-2010 |
20100306478 | DATA CACHE WITH MODIFIED BIT ARRAY - A microprocessor includes first and second functional units and a data cache having a data array having a write port, a modified bit array having a read port and a write port, and a tag array having a read port, each array having the corresponding predetermined organization. The first functional unit writes data to a cache line of the data array. The first functional unit sets a modified bit in the modified bit array to indicate that the corresponding cache line in the data array has been modified. The second functional unit reads the modified bit from the modified bit array to determine whether or not the cache line has been modified. The second functional unit reads a partial status of the corresponding cache line from the tag array. The partial status does not indicate whether the cache line has been modified. The tag array does not include a port by which the first functional unit may update the partial status of the corresponding cache line. | 12-02-2010 |
20100306503 | GUARANTEED PREFETCH INSTRUCTION - A microprocessor includes a cache memory, an instruction set having first and second prefetch instructions each configured to instruct the microprocessor to prefetch a cache line of data from a system memory into the cache memory, and a memory subsystem configured to execute the first and second prefetch instructions. For the first prefetch instruction the memory subsystem is configured to forego prefetching the cache line of data from the system memory into the cache memory in response to a predetermined set of conditions. For the second prefetch instruction the memory subsystem is configured to complete prefetching the cache line of data from the system memory into the cache memory in response to the predetermined set of conditions. | 12-02-2010 |
20100306506 | MICROPROCESSOR WITH SELECTIVE OUT-OF-ORDER BRANCH EXECUTION - A pipelined out-of-order execution in-order retire microprocessor includes a branch predictor that predicts a target address of a branch instruction, a fetch unit that fetches instructions at the predicted target address, and an execution unit that: resolves a target address of the branch instruction and detects that the predicted and resolved target addresses are different; determines whether there is an unretired instruction that must be corrected and that is older in program order than the branch instruction, in response to detecting that the predicted and resolved target addresses are different; executes the branch instruction by flushing instructions fetched at the predicted target address and causing the fetch unit to fetch from the resolved target address, if there is not an unretired instruction that must be corrected and that is older in program order than the branch instruction; and otherwise refrains from executing the branch instruction. | 12-02-2010 |
20100306507 | OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing the likelihood of having to replay a load instruction due to a store collision. The microprocessor includes a queue of entries, each entry configured to hold information that identifies sources of a store instruction used to compute its store address and to hold a dependency that identifies an instruction upon which the store instruction depends for its data. A register alias table (RAT), coupled to the queue of entries, is configured to encounter instructions in program order and to generate dependencies used to determine when the instructions may execute out of program order. In response to encountering a load instruction the RAT determines whether sources of the load instruction used to compute its load address match the sources of the store instruction in an entry of the queue, and if so, causes the load instruction to share the dependency of the matching store instruction. | 12-02-2010 |
20100306508 | OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing load instruction replay likelihood due to store collisions. A register alias table (RAT) is coupled to first and second queues of entries and generates dependencies used to determine when instructions may execute out of order. The RAT allocates an entry of the first queue and populates the allocated entry with an instruction pointer of a load instruction, when it determines that the load instruction must be replayed. The RAT allocates an entry of the second queue when it encounters a store instruction and populates the allocated entry with a dependency that identifies an instruction upon which the store instruction depends for its data. The RAT causes a subsequent instance of the load instruction to share the dependency when it encounters the subsequent instance of the load instruction and determines that its instruction pointer matches the instruction pointer of an entry of the first queue. | 12-02-2010 |
20100306509 | OUT-OF-ORDER EXECUTION MICROPROCESSOR WITH REDUCED STORE COLLISION LOAD REPLAY REDUCTION - An out-of-order execution microprocessor for reducing the likelihood of having to replay a load instruction due to a store collision. The microprocessor includes a queue of entries, each entry configured to hold an instruction pointer of a load instruction and to hold information useable to identify a store instruction that caused the load instruction to be replayed on a first instance of the load instruction. A register alias table (RAT) encounters instructions in program order and generates dependencies used to determine when the instructions may execute out of program order. The RAT encounters the load instruction on a second instance, determines that the load instruction second instance instruction pointer matches the instruction pointer of an entry of the queue, and causes the load instruction on the second instance to have a dependency on the store instruction identified by the information in the matching entry. | 12-02-2010 |
20110004644 | DYNAMIC FLOATING POINT REGISTER PRECISION CONTROL - Apparatus and methods are provided to perform floating point operations that are adaptive to the precision formats of input operands. The apparatus includes adaptive conversion logic and a tagged register file. The adaptive conversion logic receives the input operands, where each of the input operands is of a corresponding precision. The adaptive conversion logic also records the corresponding precision for use in subsequent floating point operations. The tagged register file is coupled to the adaptive conversion logic. The tagged register file stores each of the input operands, stores the corresponding precision, and associates the corresponding precision with each of the input operands. The subsequent floating point operations are performed at a precision level according to the corresponding precision. | 01-06-2011 |
20110010501 | EFFICIENT DATA PREFETCHING IN THE PRESENCE OF LOAD HITS - A BIU prioritizes L1 requests above L2 requests. The L2 generates a first request to the BIU and detects the generation of a snoop request and L1 request to the same cache line. The L2 determines whether a bus transaction to fulfill the first request may be retried and, if so, generates a miss, and otherwise generates a hit. Alternatively, the L2 detects the L1 generated a request to the L2 for the same line and responsively requests the BIU to refrain from performing a transaction on the bus to fulfill the first request if the BIU has not yet been granted the bus. Alternatively, a prefetch cache and the L2 allow the same line to be simultaneously present. If an L1 request hits in both the L2 and in the prefetch cache, the prefetch cache invalidates its copy of the line and the L2 provides the line to the L1. | 01-13-2011 |
20110010506 | DATA PREFETCHER WITH MULTI-LEVEL TABLE FOR PREDICTING STRIDE PATTERNS - A data prefetcher includes a table of entries to maintain a history of load operations. Each entry stores a tag and a corresponding next stride. The tag comprises a concatenation of first and second strides. The next stride comprises the first stride. The first stride comprises a first cache line address subtracted from a second cache line address. The second stride comprises the second cache line address subtracted from a third cache line address. The first, second and third cache line addresses each comprise a memory address of a cache line implicated by respective first, second and third temporally preceding load operations. Control logic calculates a current stride by subtracting a previous cache line address from a new load cache line address, looks up in the table a concatenation of a previous stride and the current stride, and prefetches a cache line using the hitting table entry next stride. | 01-13-2011 |
20110035551 | MICROPROCESSOR WITH REPEAT PREFETCH INDIRECT INSTRUCTION - A microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table. | 02-10-2011 |
20110035569 | MICROPROCESSOR WITH ALU INTEGRATED INTO LOAD UNIT - A superscalar pipelined microprocessor includes a register set defined by its instruction set architecture, a cache memory, execution units, and a load unit, coupled to the cache memory and distinct from the other execution units. The load unit comprises an ALU. The load unit receives an instruction that specifies a memory address of a source operand, an operation to be performed on the source operand to generate a result, and a destination register of the register set to which the result is to be stored. The load unit reads the source operand from the cache memory. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The load unit outputs the result for subsequent retirement to the destination register. | 02-10-2011 |
20110035570 | MICROPROCESSOR WITH ALU INTEGRATED INTO STORE UNIT - A superscalar pipelined microprocessor includes a register set defined by an instruction set architecture of the microprocessor, execution units, and a store unit, coupled to a cache memory and distinct from the other execution units of the microprocessor. The store unit comprises an ALU. The store unit receives an instruction that specifies a source register of the register set and an operation to be performed on a source operand to generate a result. The store unit reads the source operand from the source register. The ALU performs the operation on the source operand to generate the result, rather than forwarding the source operand to any of the other execution units of the microprocessor to perform the operation on the source operand to generate the result. The store unit operatively writes the result to the cache memory. | 02-10-2011 |
20110040955 | STORE-TO-LOAD FORWARDING BASED ON LOAD/STORE ADDRESS COMPUTATION SOURCE INFORMATION COMPARISONS - A microprocessor includes a queue comprising a plurality of entries each configured to hold store information for a store instruction. The store information specifies sources of operands used to calculate a store address. The store instruction specifies store data to be stored to a memory location identified by the store address. The microprocessor also includes control logic, coupled to the queue, configured to encounter a load instruction. The load instruction includes load information that specifies sources of operands used to calculate a load address. The control logic detects that the load information matches the store information held in a valid one of the plurality of queue entries and responsively predicts that the microprocessor should forward to the load instruction the store data specified by the store instruction whose store information matches the load information. | 02-17-2011 |
20110055485 | EFFICIENT PSEUDO-LRU FOR COLLIDING ACCESSES - An apparatus for allocating entries in a set associative cache memory includes an array that provides a first pseudo-least-recently-used (PLRU) vector in response to a first allocation request from a first functional unit. The first PLRU vector specifies a first entry from a set of the cache memory specified by the first allocation request. The first vector is a tree of bits comprising a plurality of levels. Toggling logic receives the first vector and toggles predetermined bits thereof to generate a second PLRU vector in response to a second allocation request from a second functional unit generated concurrently with the first allocation request and specifying the same set of the cache memory specified by the first allocation request. The second vector specifies a second entry different from the first entry from the same set. The predetermined bits comprise bits of a predetermined one of the levels of the tree. | 03-03-2011 |
20110055530 | FAST REP STOS USING GRABLINE OPERATIONS - A microprocessor includes a cache memory and a grabline instruction. The grabline instruction specifies a memory address that implicates a cache line of the memory. The grabline instruction instructs the microprocessor to initiate a zero-beat read-invalidate transaction on the bus to obtain ownership of the cache line. The microprocessor foregoes initiating the transaction on the bus when executing the grabline instruction if the microprocessor determines that a store to the cache line would cause an exception. | 03-03-2011 |
20110060943 | APPARATUS AND METHOD FOR DETECTION AND CORRECTION OF DENORMAL SPECULATIVE FLOATING POINT OPERAND - A microprocessor includes a plurality of execution units configured to receive instructions and operands thereof and to execute the instructions. An instruction scheduler issues the instructions to the execution units and selects sources of the instruction operands. At least one of the execution units detects one of the operands of one of the instructions is a denormal operand, generates an indication that the instruction needs to be replayed in response to detecting the denormal operand, and provides the denormal operand to the instruction scheduler in response to detecting the denormal operand, rather than normalizing the denormal operand. The instruction scheduler normalizes the denormal operand, in response to the indication, and causes the normalized operand, rather than the denormal operand, to be provided to the execution unit when the instruction is replayed. | 03-10-2011 |
20110113196 | AVOIDING MEMORY ACCESS LATENCY BY RETURNING HIT-MODIFIED WHEN HOLDING NON-MODIFIED DATA - A microprocessor is configured to communicate with other agents on a system bus and includes a cache memory and a bus interface unit coupled to the cache memory and to the system bus. The bus interface unit receives from another agent coupled to the system bus a transaction to read data from a memory address, determines whether the cache memory is holding the data at the memory address in an exclusive state (or a shared state in certain configurations), and asserts a hit-modified signal on the system bus and provides the data on the system bus to the other agent when the cache memory is holding the data at the memory address in an exclusive state. Thus, the delay of an access to the system memory by the other agent is avoided. | 05-12-2011 |
20110185155 | MICROPROCESSOR THAT PERFORMS FAST REPEAT STRING LOADS - A microprocessor invokes microcode in response to encountering a repeat load string instruction. The microcode includes a series of guaranteed prefetch (GPREFETCH) instructions to fetch into a cache memory of the microprocessor a series of cache lines implicated by a string of data bytes specified by the instruction. A memory subsystem of the microprocessor guarantees within architectural limits that the cache line specified by each GPREFETCH instruction will be fetched into the cache. The memory subsystem completes each GPREFETCH instruction once it determines that no conditions exist that would prevent fetching the cache line specified by the GPREFETCH instruction and once it allocates a fill queue buffer to receive the cache line. A retire unit frees a reorder buffer entry allocated to each GPREFETCH instruction in response to completion of the GPREFETCH instruction regardless of whether the cache line specified by the GPREFETCH instruction has been fetched into the cache. | 07-28-2011 |
20110185160 | MULTI-CORE PROCESSOR WITH EXTERNAL INSTRUCTION EXECUTION RATE HEARTBEAT - A method for debugging a multi-core microprocessor includes causing the microprocessor to perform an actual execution of instructions and obtaining from the microprocessor heartbeat information that specifies an actual execution sequence of the instructions by the plurality of cores relative to one another, commanding a corresponding plurality of instances of a software functional model of the cores to execute the instructions according to the actual execution sequence specified by the heartbeat information to generate simulated results of the execution of the instructions, and comparing the simulated results with actual results of the execution of the instructions to determine whether they match. Each core outputs an instruction execution indicator indicating the number of instructions executed by the core each core clock. A heartbeat generator generates a heartbeat indicator for each core on an external bus that indicates the number of instructions executed by each core during each external bus clock cycle. | 07-28-2011 |
20110238920 | BOUNDING BOX PREFETCHER WITH REDUCED WARM-UP PENALTY ON MEMORY BLOCK CROSSINGS - A microprocessor includes a cache memory and a data prefetcher. The data prefetcher detects a pattern of memory accesses within a first memory block and prefetches into the cache memory cache lines from the first memory block based on the pattern. The data prefetcher also observes a new memory access request to a second memory block. The data prefetcher also determines that the first memory block is virtually adjacent to the second memory block and that the pattern, when continued from the first memory block to the second memory block, predicts an access to a cache line implicated by the new request within the second memory block. The data prefetcher also responsively prefetches into the cache memory cache lines from the second memory block based on the pattern. | 09-29-2011 |
20110238922 | BOUNDING BOX PREFETCHER - A data prefetcher in a microprocessor having a cache memory receives memory accesses each to an address within a memory block. The access addresses are non-monotonically increasing or decreasing as a function of time. As the accesses are received, the prefetcher maintains a largest address and a smallest address of the accesses and counts of changes to the largest and smallest addresses and maintains a history of recently accessed cache lines implicated by the access addresses within the memory block. The prefetcher also determines a predominant access direction based on the counts and determines a predominant access pattern based on the history. The prefetcher also prefetches into the cache memory, in the predominant access direction according to the predominant access pattern, cache lines of the memory block which the history indicates have not been recently accessed. | 09-29-2011 |
20110238923 | COMBINED L2 CACHE AND L1D CACHE PREFETCHER - A microprocessor includes a first-level cache memory, a second-level cache memory, and a data prefetcher that detects a predominant direction and pattern of recent memory accesses presented to the second-level cache memory and prefetches cache lines into the second-level cache memory based on the predominant direction and pattern. The data prefetcher also receives from the first-level cache memory an address of a memory access received by the first-level cache memory, wherein the address implicates a cache line. The data prefetcher also determines one or more cache lines indicated by the pattern beyond the implicated cache line in the predominant direction. The data prefetcher also causes the one or more cache lines to be prefetched into the first-level cache memory. | 09-29-2011 |
20110264860 | MULTI-MODAL DATA PREFETCHER - A microprocessor includes first and second cache memories occupying distinct hierarchy levels, the second backing the first. A prefetcher monitors load operations and maintains a recent history of the load operations from a cache line and determines whether the recent history indicates a clear direction. The prefetcher prefetches one or more cache lines into the first cache memory when the recent history indicates a clear direction and otherwise prefetches the one or more cache lines into the second cache memory. The prefetcher also determines whether the recent history indicates the load operations are large and, other things being equal, prefetches a greater number of cache lines when large than small. The prefetcher also determines whether the recent history indicates the load operations are received on consecutive clock cycles and, other things being equal, prefetches a greater number of cache lines when on consecutive clock cycles than not. | 10-27-2011 |
20120198176 | PREFETCHING OF NEXT PHYSICALLY SEQUENTIAL CACHE LINE AFTER CACHE LINE THAT INCLUDES LOADED PAGE TABLE ENTRY - A microprocessor includes a translation lookaside buffer, a request to load a page table entry into the microprocessor generated in response to a miss of a virtual address in the translation lookaside buffer, and a prefetch unit. The prefetch unit receives a physical address of a first cache line that includes the requested page table entry and responsively generates a request to prefetch into the microprocessor a second cache line that is the next physically sequential cache line to the first cache line. | 08-02-2012 |
20120260042 | LOAD MULTIPLE AND STORE MULTIPLE INSTRUCTIONS IN A MICROPROCESSOR THAT EMULATES BANKED REGISTERS - A microprocessor supports an instruction set architecture that specifies: processor modes, architectural registers associated with each mode, and a load multiple instruction that instructs the microprocessor to load data from memory into specified ones of the registers. Direct storage holds data associated with a first portion of the registers and is coupled to an execution unit to provide the data thereto. Indirect storage holds data associated with a second portion of the registers and cannot directly provide the data to the execution unit. Which architectural registers are in the first and second portions varies dynamically based upon the current processor mode. If a specified register is currently in the first portion, the microprocessor loads data from memory into the direct storage, whereas if in the second portion, the microprocessor loads data from memory into the direct storage and then stores the data from the direct storage to the indirect storage. | 10-11-2012 |
20120260064 | HETEROGENEOUS ISA MICROPROCESSOR WITH SHARED HARDWARE ISA REGISTERS - A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program and a plurality of hardware registers. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, the plurality of hardware registers store x86 ISA architectural state; when the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, the plurality of hardware registers store ARM ISA architectural state. | 10-11-2012 |
20120260065 | MULTI-CORE MICROPROCESSOR THAT PERFORMS X86 ISA AND ARM ISA MACHINE LANGUAGE PROGRAM INSTRUCTIONS BY HARDWARE TRANSLATION INTO MICROINSTRUCTIONS EXECUTED BY COMMON EXECUTION PIPELINE - A microprocessor includes a plurality of processing cores each including a hardware instruction translator that translates instructions of x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs into microinstructions defined by a microinstruction set of the microprocessor. The microinstructions are encoded in a distinct manner from the manner in which the instructions of the x86 and ARM instruction sets are defined. Each core includes an execution pipeline that executes the microinstructions to generate results defined by the x86 ISA and ARM ISA instructions. Each core uses an associated indicator to determine whether it will boot as an x86 ISA core or an ARM ISA core when reset. The indicators are configurable to indicate that at least one of the cores will boot as an x86 ISA core and at least one other of the cores will boot as an ARM ISA core. | 10-11-2012 |
20120260066 | HETEROGENEOUS ISA MICROPROCESSOR THAT PRESERVES NON-ISA-SPECIFIC CONFIGURATION STATE WHEN RESET TO DIFFERENT ISA - A microprocessor capable of operating as both an x86 ISA and an ARM ISA microprocessor includes first, second, and third storage that stores x86 ISA-specific, ARM ISA-specific, and non-ISA-specific state, respectively. When reset, the microprocessor initializes the first storage to default values specified by the x86 ISA, initializes the second storage to default values specified by the ARM ISA, initializes the third storage to predetermined values, and begins fetching instructions of a first ISA. The first ISA is the x86 ISA or the ARM ISA and a second ISA is the other ISA. The microprocessor updates the third storage in response to the first ISA instructions. In response to a subsequent one of the first ISA instructions that instructs the microprocessor to reset to the second ISA, the microprocessor refrains from modifying the non-ISA-specific state stored in the third storage and begins fetching instructions of the second ISA. | 10-11-2012 |
20120260067 | MICROPROCESSOR THAT PERFORMS X86 ISA AND ARM ISA MACHINE LANGUAGE PROGRAM INSTRUCTIONS BY HARDWARE TRANSLATION INTO MICROINSTRUCTIONS EXECUTED BY COMMON EXECUTION PIPELINE - A microprocessor includes a hardware instruction translator that translates x86 ISA and ARM ISA machine language program instructions into microinstructions, which are encoded in a distinct manner from the x86 and ARM instructions. An execution pipeline executes the microinstructions to generate x86/ARM-defined results. The microinstructions are distinct from the results generated by the execution of the microinstructions by the execution pipeline. The translator directly provides the microinstructions to the execution pipeline for execution. Each time the microprocessor performs one of the x86 ISA and ARM ISA instructions, the translator translates it into the microinstructions. An indicator indicates either x86 or ARM as a boot ISA. After reset, the microprocessor initializes its architectural state, fetches its first instructions from a reset address, and translates them all as defined by the boot ISA. An instruction cache caches the x86 and ARM instructions and provides them to the translator. | 10-11-2012 |
20120260068 | APPARATUS AND METHOD FOR HANDLING OF MODIFIED IMMEDIATE CONSTANT DURING INSTRUCTION TRANSLATION - An ISA-defined instruction includes an immediate field having first and second portions specifying first and second values, and instructs the microprocessor to perform an operation using a constant value as one of its source operands. The constant value is the first value rotated/shifted by a number of bits based on the second value. An instruction translator translates the instruction into one or more microinstructions. An execution pipeline executes the microinstructions generated by the instruction translator. The instruction translator, rather than the execution pipeline, generates the constant value for the execution pipeline as a source operand of at least one of the microinstructions for execution by the execution pipeline. Alternatively, if the immediate field value is not within a predetermined subset of values known by the instruction translator, the instruction translator generates, rather than the constant, a second microinstruction for execution by the execution pipeline to generate the constant. | 10-11-2012 |
20120260071 | CONDITIONAL ALU INSTRUCTION CONDITION SATISFACTION PROPAGATION BETWEEN MICROINSTRUCTIONS IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - An architectural instruction instructs a microprocessor to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the architectural instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result, determines whether the architectural condition flags satisfy the condition, and updates a non-architectural indicator to indicate whether the architectural condition flags satisfy the condition. To execute the second microinstruction, if the non-architectural indicator updated by the first microinstruction indicates the architectural condition flags satisfy the condition, it updates the destination register with the result; otherwise, it updates the destination register with the current value of the destination register. | 10-11-2012 |
20120260073 | EMULATION OF EXECUTION MODE BANKED REGISTERS - A microprocessor includes processor modes comprising a user mode and a plurality of exception modes. An execution unit performs arithmetic operations on operands specified by program instructions. A first set of storage elements holds a first subset of the operands and provides them to the execution unit coupled thereto. A second set of storage elements associated with each of the modes hold a second subset of the operands and are incapable of directly providing the second operand subset to the execution unit. To enter a new mode from a current mode, logic saves the first operand subset held in the first set of storage elements to the second set of storage elements associated with the current mode and restores to the first set of storage elements the second operand subset held in the second set of storage elements associated with the new mode. | 10-11-2012 |
20120260074 | EFFICIENT CONDITIONAL ALU INSTRUCTION IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - A microprocessor performs an architectural instruction that instructs it to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if its architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result. To execute the second microinstruction, it writes the destination register with the result generated by the first microinstruction if the architectural condition flags satisfy the condition, and writes the destination register with the current value of the destination register if the architectural condition flags do not satisfy the condition. | 10-11-2012 |
20120260075 | CONDITIONAL ALU INSTRUCTION PRE-SHIFT-GENERATED CARRY FLAG PROPAGATION BETWEEN MICROINSTRUCTIONS IN READ-PORT LIMITED REGISTER FILE MICROPROCESSOR - A microprocessor includes a hardware instruction translator that translates an architectural instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the shift operation on the first source operand to generate the first result and a carry flag value and updates a non-architectural carry flag with the generated carry flag value. To execute the second microinstruction, it performs the second operation on the first result and the second operand to generate the second result and new condition flag values based on the second result. If the architectural condition flags satisfy the condition, it updates the architectural carry flag with the non-architectural carry flag value and updates at least one of the other architectural condition flags with the corresponding generated new condition flag values; otherwise, it updates the architectural condition flags with the current value of the architectural condition flags. | 10-11-2012 |
20120272003 | EFFICIENT DATA PREFETCHING IN THE PRESENCE OF LOAD HITS - A microprocessor configured to access an external memory includes a first-level cache, a second-level cache, and a bus interface unit (BIU) configured to interface the first-level and second-level caches to a bus used to access the external memory. The BIU is configured to prioritize requests from the first-level cache above requests from the second-level cache. The second-level cache is configured to generate a first request to the BIU to fetch a cache line from the external memory. The second-level cache is also configured to detect that the first-level cache has subsequently generated a second request to the second-level cache for the same cache line. The second-level cache is also configured to request the BIU to refrain from performing a transaction on the bus to fulfill the first request if the BIU has not yet been granted ownership of the bus to fulfill the first request. | 10-25-2012 |
20120272004 | EFFICIENT DATA PREFETCHING IN THE PRESENCE OF LOAD HITS - A memory subsystem in a microprocessor includes a first-level cache, a second-level cache, and a prefetch cache configured to speculatively prefetch cache lines from a memory external to the microprocessor. The second-level cache and the prefetch cache are configured to allow the same cache line to be simultaneously present in both. If a request by the first-level cache for a cache line hits in both the second-level cache and in the prefetch cache, the prefetch cache invalidates its copy of the cache line and the second-level cache provides the cache line to the first-level cache. | 10-25-2012 |
20130067199 | CONTROL REGISTER MAPPING IN HETEROGENEOUS INSTRUCTION SET ARCHITECTURE PROCESSOR - A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs. The microprocessor includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program. The microprocessor also includes a plurality of model-specific registers (MSRs) that control aspects of the operation of the microprocessor. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, each of the plurality of MSRs is accessible via an x86 ISA RDMSR/WRMSR instruction that specifies an address of the MSR. When the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, each of the plurality of MSRs is accessible via an ARM ISA MRRC/MCRR instruction that specifies the address of the MSR. | 03-14-2013 |
20130067202 | CONDITIONAL NON-BRANCH INSTRUCTION PREDICTION - A microprocessor processes conditional non-branch instructions that specify a condition and instruct the microprocessor to perform an operation if the condition is satisfied and otherwise to not perform the operation. A predictor provides a prediction about a conditional non-branch instruction. An instruction translator translates the conditional non-branch instruction into a no-operation microinstruction when the prediction predicts the condition will not be satisfied, and into a set of one or more microinstructions to unconditionally perform the operation when the prediction predicts the condition will be satisfied. An execution pipeline executes the no-operation microinstruction or the set of microinstructions. The translator translates the instruction into a second set of one or more microinstructions to conditionally perform the operation when the predictor does not make a prediction. In the case of a misprediction, the translator re-translates the conditional non-branch instruction into the second set of microinstructions. | 03-14-2013 |
20130318530 | DEADLOCK/LIVELOCK RESOLUTION USING SERVICE PROCESSOR - A microprocessor includes a main processor and a service processor. The service processor is configured to detect and break a deadlock/livelock condition in the main processor. The service processor detects the deadlock/livelock condition by detecting that the main processor has not retired an instruction or completed a processor bus transaction for a predetermined number of clock cycles. In response to detecting the deadlock/livelock condition in the main processor, the service processor causes arbitration requests to a cache memory to be captured in a buffer, analyzes the captured requests to detect a pattern that may indicate a bug causing the condition, and performs actions associated with the pattern to break the deadlock/livelock. The actions include suppression of arbitration requests to the cache, suppression of comparisons of cache request addresses, and killing requests to access the cache. | 11-28-2013 |
20140013058 | PREFETCHING OF NEXT PHYSICALLY SEQUENTIAL CACHE LINE AFTER CACHE LINE THAT INCLUDES LOADED PAGE TABLE ENTRY - A microprocessor includes a translation lookaside buffer, a request to load a page table entry into the microprocessor generated in response to a miss of a virtual address in the translation lookaside buffer, and a prefetch unit. The prefetch unit receives a physical address of a first cache line that includes the requested page table entry and responsively generates a request to prefetch into the microprocessor a second cache line that is the next physically sequential cache line to the first cache line. | 01-09-2014 |
20140013089 | CONDITIONAL LOAD INSTRUCTIONS IN AN OUT-OF-ORDER EXECUTION MICROPROCESSOR - A microprocessor instruction translator translates a conditional load instruction into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives source operands from the source registers of a register file and responsively generates a first result using the source operands. To execute a second microinstruction, an execution unit receives a previous value of the destination register and the first result and responsively reads data from a memory location specified by the first result and provides a second result that is the data if a condition is satisfied and that is the previous destination register value if not. The previous value of the destination register comprises a result produced by execution of a microinstruction that is the most recent in-order previous writer of the destination register with respect to the second microinstruction. | 01-09-2014 |
20140122843 | CONDITIONAL STORE INSTRUCTIONS IN AN OUT-OF-ORDER EXECUTION MICROPROCESSOR - An instruction translator translates a conditional store instruction (specifying data register, base register, and offset register of the register file) into at least two microinstructions. An out-of-order execution pipeline executes the microinstructions. To execute a first microinstruction, an execution unit receives a base value and an offset from the register file and generates a first result as a function of the base value and offset. The first result specifies the memory location address. To execute a second microinstruction, an execution unit receives the first result and writes the first result to an allocated entry in the store queue if the condition flags satisfy the condition (the store queue subsequently writes the data to the memory location specified by the address), and otherwise kills the allocated store queue entry so that the store queue does not write the data to the memory location specified by the address. | 05-01-2014 |
20140122847 | MICROPROCESSOR THAT TRANSLATES CONDITIONAL LOAD/STORE INSTRUCTIONS INTO VARIABLE NUMBER OF MICROINSTRUCTIONS - An instruction translator receives a conditional load/store instruction that specifies a condition, destination/data register, base register, offset source, and memory addressing mode. The instruction instructs the microprocessor to load data from a memory location into the destination register (conditional load) or store data to the memory location from the data register (conditional store) only if the condition flags satisfy the condition. The offset source specifies whether the offset is an immediate value or a value in an offset register. The addressing mode specifies whether the base register is updated when the condition flags satisfy the condition. The instruction translator translates the conditional load/store instruction into a number of microinstructions, which varies as a function of the offset source, addressing mode, and whether the conditional instruction is a conditional load or store instruction. An out-of-order execution pipeline executes the microinstructions to generate results specified by the instruction. | 05-01-2014 |
20140258641 | COMMUNICATING PREFETCHERS IN A MICROPROCESSOR - A microprocessor includes first and second hardware data prefetchers configured to prefetch data into the microprocessor according to first and second respective algorithms, which are different. The second prefetcher is configured to detect a memory access pattern within a memory region and responsively prefetch data from the memory region according to the second algorithm. The second prefetcher is further configured to provide to the first prefetcher a descriptor of the memory region. The first prefetcher is configured to stop prefetching data from the memory region in response to receiving the descriptor of the memory region from the second prefetcher. The second prefetcher also provides to the first prefetcher a communication to resume prefetching data from the memory region, such as when the second prefetcher subsequently detects that a predetermined number of memory accesses to the memory region are not in the memory access pattern. | 09-11-2014 |
20140289479 | BOUNDING BOX PREFETCHER - A data prefetcher in a microprocessor. The data prefetcher includes a plurality of period match counters associated with a corresponding plurality of different pattern periods. The data prefetcher also includes control logic that updates the plurality of period match counters in response to accesses to a memory block by the microprocessor, determines a clear pattern period based on the plurality of period match counters and prefetches into the microprocessor non-fetched cache lines within the memory block based on a pattern having the clear pattern period determined based on the plurality of period match counters. | 09-25-2014 |
20140297993 | UNCORE MICROCODE ROM - A microprocessor includes a plurality of processing cores, each comprising a corresponding memory physically located inside the core and readable by the core but not readable by the other cores (“core memory”). The microprocessor also includes a memory physically located outside all of the cores and readable by all of the cores (“uncore memory”). For each core, the uncore memory and corresponding core memory collectively provide M words of storage for microcode instructions fetchable by the core as follows: the uncore memory provides J of the M words of microcode instruction storage, and the corresponding core memory provides K of the M words of microcode instruction storage. J, K and M are counting numbers, and M=J+K. The memories are non-architecturally-visible and accessed using a fetch address provided by a non-architectural program counter, and the microcode instructions are non-architectural instructions that implement architectural instructions. | 10-02-2014 |
20140298060 | ASYMMETRIC MULTI-CORE PROCESSOR WITH NATIVE SWITCHING MECHANISM - A processor includes first and second processing cores configured to support first and second respective subsets of features of its instruction set architecture (ISA) feature set. The first subset is less than all the features of the ISA feature set. The first and second subsets are different but their union is all the features of the ISA feature set. The first core detects that a thread, while being executed by the first core rather than by the second core, attempted to employ a feature not in the first subset and, in response, indicates a switch from the first core to the second core to execute the thread. The unsupported feature may be an unsupported instruction or operating mode. A switch may also be made if the lower performance/power core is being over-utilized or the higher performance/power core is being under-utilized. | 10-02-2014 |
20140310479 | COMMUNICATING PREFETCHERS THAT THROTTLE ONE ANOTHER - A microprocessor includes a first hardware data prefetcher that prefetches data into the microprocessor according to a first algorithm. The microprocessor also includes a second hardware data prefetcher that prefetches data into the microprocessor according to a second algorithm, wherein the first and second algorithms are different. The second prefetcher detects that it is prefetching data into the microprocessor according to the second algorithm in excess of a first predetermined rate and, in response, sends a throttle indication to the first prefetcher. The first prefetcher prefetches data into the microprocessor according to the first algorithm at below a second predetermined rate in response to receiving the throttle indication from the second prefetcher. | 10-16-2014 |
20140365753 | SELECTIVE ACCUMULATION AND USE OF PREDICTING UNIT HISTORY - A microprocessor includes a predicting unit and a control unit. The control unit controls the predicting unit to accumulate a history of characteristics of executed instructions and make predictions related to subsequent instructions based on the history while the microprocessor is running a first thread. The control unit also detects a transition from running the first thread to running a second thread and controls the predicting unit to selectively suspend accumulating the history and making the predictions using the history while running the second thread. The predicting unit makes static predictions while running the second thread. The selectivity may be based on the privilege level, identity or length of the second thread, static prediction effectiveness during a previous execution instance of the thread, whether the transition was made due to a system call, and whether the second thread is an interrupt handler. | 12-11-2014 |
20150067301 | MICROPROCESSOR WITH BOOT INDICATOR THAT INDICATES A BOOT ISA OF THE MICROPROCESSOR AS EITHER THE X86 ISA OR THE ARM ISA - A microprocessor includes a plurality of registers that hold an architectural state of the microprocessor and an indicator that indicates a boot instruction set architecture (ISA) of the microprocessor as either the x86 ISA or the Advanced RISC Machines (ARM) ISA. The microprocessor also includes a hardware instruction translator that translates x86 ISA instructions and ARM ISA instructions into microinstructions. The hardware instruction translator translates, as instructions of the boot ISA, the initial ISA instructions that the microprocessor fetches from architectural memory space after receiving a reset signal. The microprocessor also includes an execution pipeline, coupled to the hardware instruction translator. The execution pipeline executes the microinstructions to generate results defined by the x86 ISA and ARM ISA instructions. In response to the reset signal, the microprocessor initializes its architectural state in the plurality of registers as defined by the boot ISA prior to fetching the initial ISA instructions. | 03-05-2015 |
20150067306 | INTER-CORE COMMUNICATION VIA UNCORE RAM - A microprocessor includes a plurality of processing cores and an uncore random access memory (RAM) readable and writable by each of the plurality of processing cores. Each core of the plurality of processing cores comprises microcode run by the core that implements architectural instructions of an instruction set architecture of the microprocessor. The microcode is configured to both read and write the uncore RAM to accomplish inter-core communication between the plurality of processing cores. | 03-05-2015 |
20150089204 | DYNAMICALLY RECONFIGURABLE MICROPROCESSOR - A microprocessor includes a plurality of dynamically reconfigurable functional units, a fingerprint, and a fingerprint unit. As the plurality of dynamically reconfigurable functional units execute instructions according to a first configuration setting, the fingerprint unit accumulates information about the instructions according to a mathematical operation to generate a result. The microprocessor also includes a reconfiguration unit that reconfigures the plurality of dynamically reconfigurable functional units to execute instructions according to a second configuration setting in response to an indication that the result matches the fingerprint. | 03-26-2015 |