Entries |
Document | Title | Date |
20080201528 | MEMORY ACCESS SYSTEMS FOR CONFIGURING WAYS AS CACHE OR DIRECTLY ADDRESSABLE MEMORY - A memory system is provided. A processor provides a data access address. A memory device includes a predetermined number of ways. The processor selectively configures a selected number less than or equal to the predetermined number of the ways as cache memory belonging to a cacheable region, and configures remaining ways as directly addressable memory belonging to a directly addressable region by memory configuration information. A memory controller determines whether the data access address corresponds to the cacheable region or the directly addressable region, selects only the way in the directly addressable region corresponding to the data access address when the data access address corresponds to the directly addressable region, and selects only the way(s) belonging to the cacheable region when the data access address corresponds to the cacheable region. A configuration controller monitors the status of the ways and adjusts the memory configuration information according to the status of the ways. | 08-21-2008 |
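The routing described in the abstract above can be sketched as a toy Python model. The base address, way geometry, and partitioning scheme below are illustrative assumptions, not taken from the patent:

```python
class WayConfigurableMemory:
    """Toy model: of `total_ways` ways, the first `cache_ways` act as
    cache; the rest are directly addressable memory mapped at an
    assumed base address (DIRECT_BASE and the address partitioning
    are illustrative, not from the patent)."""

    DIRECT_BASE = 0x1000

    def __init__(self, total_ways=4, way_size=256, cache_ways=2):
        assert cache_ways <= total_ways
        self.total_ways = total_ways
        self.way_size = way_size
        self.cache_ways = cache_ways

    def route(self, address):
        """Classify an access and select the way(s) to activate."""
        direct_size = (self.total_ways - self.cache_ways) * self.way_size
        if self.DIRECT_BASE <= address < self.DIRECT_BASE + direct_size:
            # Directly addressable region: exactly one way is selected.
            offset = address - self.DIRECT_BASE
            return ("direct", [self.cache_ways + offset // self.way_size])
        # Cacheable region: only the cache-configured ways participate.
        return ("cacheable", list(range(self.cache_ways)))

mem = WayConfigurableMemory()
```

An access to the scratchpad range activates a single way, while a cacheable access probes only the ways configured as cache, which is the power/selection benefit the abstract claims.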
20080201529 | CONTEXT SWITCH DATA PREFETCHING IN MULTITHREADED COMPUTER - An apparatus, program product and method initiate, in connection with a context switch operation, a prefetch of data likely to be used by a thread prior to resuming execution of that thread. As a result, once it is known that a context switch will be performed to a particular thread, data may be prefetched on behalf of that thread so that when execution of the thread is resumed, more of the working state for the thread is likely to be cached, or at least in the process of being retrieved into cache memory, thus reducing cache-related performance penalties associated with context switching. | 08-21-2008 |
20080209127 | System and method for efficient implementation of software-managed cache - A system and method for an efficient implementation of a software-managed cache is presented. When an application thread executes on a simple processor, the application thread uses a conditional data select instruction for eliminating a conditional branch instruction when accessing a software-managed cache. An application thread issues a conditional data select instruction (DMA transfer) after a cache directory lookup, wherein the size of the requested data is dependent upon the outcome of the cache directory lookup. When the cache directory lookup results in a cache hit, the application thread requests a transfer of zero bits of data, which results in a DMA controller (DMAC) performing a no-op instruction. When the cache directory lookup results in a cache miss, the application thread requests a data block transfer the size of a corresponding cache line. | 08-28-2008 |
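The branch-elimination trick in the abstract above (transfer size depends on the directory lookup, so a hit becomes a zero-byte no-op) can be sketched as follows; the names and line size are illustrative assumptions:

```python
LINE_SIZE = 128  # bytes per cache line (illustrative size)

def dma_request_size(directory, tag):
    """Branch-free selection of the DMA transfer size after a
    cache-directory lookup: a hit requests zero bytes (the DMAC
    treats that as a no-op), a miss requests a full cache line."""
    hit = tag in directory            # cache-directory lookup
    # Conditional data select instead of an if/else branch:
    return LINE_SIZE * (not hit)

directory = {0x10, 0x20}                         # tags currently cached
miss_size = dma_request_size(directory, 0x30)    # full line on a miss
hit_size = dma_request_size(directory, 0x10)     # no-op on a hit
```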
20080222360 | MULTI-PORT INTEGRATED CACHE - A multi-port instruction/data integrated cache which is provided between a parallel processor and a main memory and stores therein a part of instructions and data stored in the main memory has a plurality of banks, and a plurality of ports including an instruction port unit consisting of at least one instruction port used to access an instruction from the parallel processor and a data port unit consisting of at least one data port used to access data from the parallel processor. Further, a data width which can be specified to the bank from the instruction port is set larger than a data width which can be specified to the bank from the data port. | 09-11-2008 |
20080229021 | Systems and Methods of Revalidating Cached Objects in Parallel with Request for Object - The present solution provides a variety of techniques for accelerating and optimizing network traffic, such as HTTP based network traffic. The solution described herein provides techniques in the areas of proxy caching, protocol acceleration, domain name resolution acceleration as well as compression improvements. In some cases, the present solution provides various prefetching and/or prefreshening techniques to improve intermediary or proxy caching, such as HTTP proxy caching. In other cases, the present solution provides techniques for accelerating a protocol by improving the efficiency with which data is obtained from an originating server and served to clients. In still other cases, the present solution accelerates domain name resolution. As every HTTP access starts with a URL that includes a hostname that must be resolved via domain name resolution into an IP address, the present solution helps accelerate HTTP access. In some cases, the present solution improves compression techniques by prefetching non-cacheable and cacheable content to use for compressing network traffic, such as HTTP. The acceleration and optimization techniques described herein may be deployed on the client as a client agent or as part of a browser, as well as on any type and form of intermediary device, such as an appliance, proxying device or any type of interception caching and/or proxying device. | 09-18-2008 |
20080229022 | EFFICIENT SYSTEM BOOTSTRAP LOADING - An efficient system for bootstrap loading scans cache lines into a cache store queue during a scan phase, and then transmits the cache lines from the cache store queue to a cache memory array during a functional phase. Scan circuitry stores a given cache line in a set of latches associated with one of a plurality of cache entries in the cache store queue, and passes the cache line from the latch set to the associated cache entry. The cache lines may be scanned from test software that is external to the computer system. Read/claim dispatch logic dispatches store instructions for the cache entries to read/claim machines which write the cache lines to the cache memory array without obtaining write permission, after the read/claim machines evaluate a mode bit which indicates that cache entries in the cache store queue are scanned cache lines. In the illustrative embodiment the cache memory is an L2 cache. | 09-18-2008 |
20080250205 | STRUCTURE FOR SUPPORTING SIMULTANEOUS STORAGE OF TRACE AND STANDARD CACHE LINES - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for a single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines is provided. A mechanism is described for indexing into the cache, and selecting the desired line. Control is exercised over which lines are contained within the cache. Provision is made for selection between a trace line and a conventional line when both match during a tag compare step. | 10-09-2008 |
20080250206 | STRUCTURE FOR USING BRANCH PREDICTION HEURISTICS FOR DETERMINATION OF TRACE FORMATION READINESS - A design structure embodied in a machine readable storage medium for designing, manufacturing, and/or testing a design for a single unified level one instruction(s) cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instruction(s) consistent with conventional cache lines is provided. Formation of trace lines in the cache is delayed on initial operation of the system to assure quality of the trace lines stored. | 10-09-2008 |
20080250207 | DESIGN STRUCTURE FOR CACHE MAINTENANCE - A single unified level one instruction cache in which some lines may contain traces and other lines in the same congruence class may contain blocks of instructions consistent with conventional cache lines. Control is exercised over which lines are contained within the cache. This invention avoids inefficiencies in the cache by removing trace lines experiencing early exits from the cache, or trace lines that are short, by maintaining a few bits of information about the accuracy of the control flow in a trace cache line and using that information in addition to the LRU (Least Recently Used) bits that maintain the recency information of a cache line, in order to make a replacement decision. | 10-09-2008 |
20080256300 | System and method for dynamically reconfiguring a vertex cache - A system to process a plurality of vertices to model an object. An embodiment of the system includes a processor, a front end unit coupled to the processor, and cache configuration logic coupled to the front end unit and the processor. The processor is configured to process the plurality of vertices. The front end unit is configured to communicate vertex data to the processor. The cache configuration logic is configured to establish a cache line size of a vertex cache based on a vertex size of a drawing command. | 10-16-2008 |
20080256301 | PROTECTION OF THE EXECUTION OF A PROGRAM - A method for controlling the execution of at least one program in an electronic circuit and a processor for executing a program, in which at least one volatile memory area of the circuit is, prior to the execution of the program to be controlled, filled with first instructions resulting in exception processing; the program contains instructions for replacing all or part of the first instructions with second valid instructions; and the area is called for execution of all or part of the instructions that it contains at the end of the execution of the program instructions. | 10-16-2008 |
20080270702 | METHOD AND APPARATUS FOR AN EFFICIENT MULTI-PATH TRACE CACHE DESIGN - A novel trace cache design and organization to efficiently store and retrieve multi-path traces. A goal is to design a trace cache, which is capable of storing multi-path traces without significant duplication in the traces. Furthermore, the effective access latency of these traces is reduced. | 10-30-2008 |
20080276044 | PROCESSING APPARATUS - A processing apparatus which executes a program and performs processes of the program, includes an execution circuit including a plurality of central processing units, each having a respective cache memory, and each of the respective cache memories has an N-way set-associative structure with N-ways in which one line is made up of plural words. Each of the respective cache memories includes a data memory array which is simultaneously read-out in multiple-word-widths, and can be read-out using one of a type one read-out and a type two read-out. In the type one read-out, plural words in the same word positions within respective lines are simultaneously read-out from corresponding lines belonging to different ways, and in the type two read out, plural words making up one line of one way are simultaneously read-out. The cache memory has a first read-out mode and a second read-out mode. In the first read-out mode, a word belonging to a way which is hit by a memory access is selected from among the plural words read-out using the type-one read out, and the selected word is outputted, and in the second read-out mode, plural words are read-out from a way which is hit, using the type-two read out, and read-out plural words are outputted. | 11-06-2008 |
20080276045 | Apparatus and Method for Dynamic Cache Management - The apparatus of the present invention improves performance of computing systems by enabling a multi-core or multi-processor system to deterministically identify cache memory ( | 11-06-2008 |
20080282034 | Memory Subsystem having a Multipurpose Cache for a Stream Graphics Multiprocessor - A method and a computing system are provided. The computing system may include a system memory configured to store data in a first data format. The computing system may also include a computational core comprising a plurality of execution units (EU). The computational core may be configured to request data from the system memory and to process data in a second data format. Each of the plurality of EU may include an execution control and datapath and a specialized L1 cache pool. The computing system may include a multipurpose L2 cache in communication with the each of the plurality of EU and the system memory. The multipurpose L2 cache may be configured to store data in the first data format and the second data format. The computing system may also include an orthogonal data converter in communication with at least one of the plurality of EU and the system memory. | 11-13-2008 |
20080301369 | PROCESSING OF SELF-MODIFYING CODE IN MULTI-ADDRESS-SPACE AND MULTI-PROCESSOR SYSTEMS - A method and system of storing to an instruction stream with a multiprocessor or multiple-address-space system is disclosed. A central processing unit may cache instructions in a cache from a page of primary code stored in a memory storage unit. The central processing unit may execute cached instructions from the cache until a serialization operation is executed. The central processing unit may check in a message queue for a notification message indicating potential storing to the page. If the notification message is present in the message queue, cached instructions from the page are invalidated. | 12-04-2008 |
20090013131 | LOW POWER SEMI-TRACE INSTRUCTION CACHE - A semi-trace cache combines elements and features of an instruction cache and a trace cache. An ICache portion of the semi-trace cache is filled with instructions fetched from the next level of the memory hierarchy while a TCache portion is filled with traces gleaned either from the actual stream of retired instructions or predicted before execution. | 01-08-2009 |
20090013132 | Cache memory - A cache memory comprises a first set of storage locations for holding syllables and addressable by a first group of addresses; a second set of storage locations for holding syllables and addressable by a second group of addresses; addressing circuitry operable to provide in each addressing cycle a pair of addresses comprising one from the first group and one from the second group, thereby accessing a plurality of syllables from each set of storage locations; and selection circuitry operable to select from said plurality of syllables to output to a processor lane based on whether a required syllable is addressable by an address in the first or second group. | 01-08-2009 |
20090019225 | INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING SYSTEM - With respect to memory access instructions contained in an internal representation program, an information processing apparatus generates a load cache instruction, a cache hit judgment instruction, and a cache miss instruction that is executed in correspondence with a result of a judgment process performed according to the cache hit judgment instruction. In a case where the internal representation program contains a plurality of memory access instruction having a possibility of using mutually the same cache line in a cache memory when mutually different cache lines in a main memory are accessed, the information processing apparatus generates a combine instruction instructing that judgment results of the judgment processes that are performed according to the cache hit judgment instruction should be combined into one judgment result. The information processing apparatus outputs an output program that contains these instructions that have been generated. | 01-15-2009 |
20090037659 | Cache coloring method and apparatus based on function strength information - A method of performing cache coloring includes the steps of generating a dynamic function flow representing a temporal sequence in which a plurality of functions are called at a time of executing a program comprised of the plurality of functions by executing the program by a computer, generating function strength information in response to the dynamic function flow by use of the computer, the function strength information including information about runtime relationships between any given one of the plurality of functions and all the other ones of the plurality of functions in terms of a way the plurality of functions are called and further including information about degree of certainty of cache miss occurrence, and allocating the plurality of functions to memory space by use of the computer in response to the function strength information such as to reduce instruction cache conflict. | 02-05-2009 |
20090055592 | DIGITAL SIGNAL PROCESSOR CONTROL ARCHITECTURE - A system includes a control store memory populated with data path instructions indexable by control store addresses and jump addresses. The system further includes a control state machine to provide at least one control store address and at least one jump address to the control store memory, wherein the control store memory is configured to identify one or more data path instructions for both the control store address and the jump address. | 02-26-2009 |
20090063773 | Technique to enable store forwarding during long latency instruction execution - A technique to allow independent loads to be satisfied during high-latency instruction processing. Embodiments of the invention relate to a technique in which a storage structure is used to hold store operations in program order while independent load instructions are satisfied during a time in which a high-latency instruction is being processed. After the high-latency instruction is processed, the store operations can be restored in program order without searching the storage structure. | 03-05-2009 |
20090070531 | System and Method of Using An N-Way Cache - A system and method of using an n-way cache are disclosed. In an embodiment, a method includes determining a first way of a first instruction stored in a cache and storing the first way in a list of ways. The method also includes determining a second way of a second instruction stored in the cache and storing the second way in the list of ways. In an embodiment, the first way may be used to access a first cache line containing the first instruction and the second way may be used to access a second cache line containing the second instruction. | 03-12-2009 |
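The "list of ways" idea in the abstract above (remember which way held each instruction so later fetches can access a single way instead of probing all n) can be sketched as a toy model; class and field names are illustrative assumptions:

```python
class NWayICache:
    """Toy n-way instruction cache that records, per cache line,
    which way the line was filled into (the 'list of ways'), so a
    subsequent fetch can enable only that way."""

    def __init__(self, ways=4):
        self.ways = [dict() for _ in range(ways)]  # way -> {line_addr: data}
        self.way_list = {}                         # line_addr -> way index

    def fill(self, way, line_addr, data):
        self.ways[way][line_addr] = data
        self.way_list[line_addr] = way             # record the way used

    def fetch(self, line_addr):
        way = self.way_list.get(line_addr)
        if way is not None:
            # Single-way access: no tag compare across all n ways.
            return self.ways[way][line_addr]
        return None                                # miss

c = NWayICache()
c.fill(1, 0x100, "add r1, r2")
c.fill(3, 0x140, "bne loop")
```

Enabling only the recorded way on a repeat fetch is what saves the energy of driving all n ways, which is the usual motivation for way prediction of this kind.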
20090070532 | System and Method for Efficiently Testing Cache Congruence Classes During Processor Design Verification and Validation - A system and method for using a single test case to test each sector within multiple congruence classes is presented. A test case generator builds a test case for accessing each sector within a congruence class. Since a congruence class spans multiple congruence pages, the test case generator builds the test case over multiple congruence pages in order for the test case to test the entire congruence class. During design verification and validation, a test case executor modifies a congruence class identifier (e.g., patches a base register), which forces the test case to test a specific congruence class. By incrementing the congruence class identifier after each execution of the test case, the test case executor is able to test each congruence class in the cache using a single test case. | 03-12-2009 |
20090089507 | OVERLAY INSTRUCTION ACCESSING UNIT AND OVERLAY INSTRUCTION ACCESSING METHOD - The present invention provides an overlay instruction accessing unit and method, and a method and apparatus for compressing and storing a program. The overlay instruction accessing unit is used to execute a program stored in a memory in the form of a plurality of compressed program segments, and comprises: a buffer; a processing unit for issuing an instruction reading request, reading an instruction from the buffer, and executing the instruction; and a decompressing unit for reading a requested compressed instruction segment from the memory in response to the instruction reading request of the processing unit, decompressing the compressed instruction segment, and storing the decompressed instruction segment in the buffer, wherein while the processing unit is executing the instruction segment, the decompressing unit reads, according to a storage address of a compressed program segment to be invoked in a header corresponding to the instruction segment, a corresponding compressed instruction segment from the memory, decompresses the compressed instruction segment, and stores the decompressed instruction segment in the buffer for later use by the processing unit. | 04-02-2009 |
20090100228 | METHODS AND SYSTEMS FOR IMPLEMENTING A CACHE MODEL IN A PREFETCHING SYSTEM - The present invention relates to systems and methods of enhancing prefetch operations. The method includes fetching an object from a page on a web server. The method further includes storing, at a proxy server, caching instructions for the fetched object. The proxy server is connected with the client and the object is cached at the client. Furthermore, the method includes identifying a prefetchable reference to the fetched object in a subsequent web page and using the caching instructions stored on the proxy server to determine if a fresh copy of the object will be requested by the client. Further, the method includes, based on the determination that the object will be requested, sending a prefetch request for the object using an If-Modified-Since directive, and transmitting a response to the If-Modified-Since directive prefetch request to a proxy client. | 04-16-2009 |
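The freshness decision in the abstract above (use the caching instructions stored at the proxy to decide whether the client would re-request, and if so prefetch with an If-Modified-Since conditional) can be sketched as follows. The `caching_info` field names are assumptions for illustration:

```python
def prefetch_request(caching_info, now):
    """Decide how to prefetch an object referenced by a newly served
    page, using header data the proxy stored when the object was
    first fetched (field names are illustrative assumptions)."""
    fresh_until = caching_info["fetched_at"] + caching_info["max_age"]
    if now < fresh_until:
        return None   # client copy still fresh: it will not re-request
    # The client would revalidate, so prefetch conditionally: if the
    # object is unchanged, the origin can answer with a cheap 304.
    return {"If-Modified-Since": caching_info["last_modified"]}

info = {"fetched_at": 1000.0, "max_age": 60.0,
        "last_modified": "Tue, 15 Nov 1994 12:45:26 GMT"}
```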
20090119456 | PROCESSOR AND MEMORY CONTROL METHOD - A processor and a memory management method are provided. The processor includes a processor core, a cache which transceives data to/from the processor core via a single port, and stores the data accessed by the processor core, and a Scratch Pad Memory (SPM) which transceives the data to/from the processor core via at least one of a plurality of multi ports. | 05-07-2009 |
20090119457 | MULTITHREADED CLUSTERED MICROARCHITECTURE WITH DYNAMIC BACK-END ASSIGNMENT - A multithreaded clustered microarchitecture with dynamic back-end assignment is presented. A processing system may include a plurality of instruction caches and front-end units each to process an individual thread from a corresponding one of the instruction caches, a plurality of back-end units, and an interconnect network to couple the front-end and back-end units. A method may include measuring a performance metric of a back-end unit, comparing the measurement to a first value, and reassigning, or not, the back-end unit according to the comparison. Computer systems according to embodiments of the invention may include: a random access memory; a system bus; and a processor having a plurality of instruction caches, a plurality of front-end units each to process an individual thread from a corresponding one of the instruction caches; a plurality of back-end units; and an interconnect network coupled to the plurality of front-end units and the plurality of back-end units. | 05-07-2009 |
20090132766 | SYSTEMS AND METHODS FOR LOOKAHEAD INSTRUCTION FETCHING FOR PROCESSORS - Systems and methods may be provided for lookahead instruction fetching for processors. The systems and methods may include an L1 instruction cache, where the L1 instruction cache may include a plurality of lines of data, where each line of data may include one or more instructions. The systems and methods may also include a tagless hit instruction cache, where the tagless hit instruction cache may store a subset of the lines of data in the L1 instruction cache, where instructions in the lines of data stored in the tagless hit instruction cache may be stored with metadata indicative of whether a next instruction is guaranteed to reside in the tagless hit instruction cache, where an instruction fetcher may be arranged to have direct access to the L1 instruction cache and the tagless hit instruction cache, and where the tagless hit instruction cache may be arranged to have direct access to the L1 instruction cache. | 05-21-2009 |
20090144502 | Meta-Architecture Defined Programmable Instruction Fetch Functions Supporting Assembled Variable Length Instruction Processors - In an implementation, a processing system includes an instruction fetch (IF) memory storing IF instructions; an arithmetic/logic (AL) instruction memory (IMemory) storing AL instructions; and a programmable instruction fetch mechanism to generate IMemory instruction addresses, from IF instructions fetched from the IF memory, to select AL instructions to be fetched from the IMemory for execution, wherein at least one IF instruction includes a loop count field indicating a number of iterations of a loop to be performed, a loop start address of the loop, and a loop end address of the loop. | 06-04-2009 |
20090157967 | Pre-Fetch Data and Pre-Fetch Data Relative - A prefetch data machine instruction having an M field performs a function on a cache line of data specifying an address of an operand. The operation comprises either prefetching a cache line of data from memory to a cache or reducing the access ownership of store and fetch or fetch only of the cache line in the cache or a combination thereof. The address of the operand is either based on a register value or the program counter value pointing to the prefetch data machine instruction. | 06-18-2009 |
20090164729 | SYNC-ID FOR MULTIPLE CONCURRENT SYNC DEPENDENCIES IN AN OUT-OF-ORDER STORE QUEUE - A method, system and process for retiring data entries held within a store queue (STQ). The STQ of a processor cache is modified to receive and process multiple synchronized groups (sync-groups). Sync groups comprise thread of execution synchronized (thread-sync) entries, all thread of execution synchronized (all-thread-sync) entries, and regular store entries (non-thread-sync and non-all-thread-sync). The task of storing data entries, from the STQ out to memory or an input/output device, is modified to increase the effectiveness of the cache. Sync-groups are created for each thread and tracked within the STQ via a synchronized identification (SID). An entry is eligible for retirement when the entry is within a currently retiring sync-group as identified by the SID. | 06-25-2009 |
20090172284 | METHOD AND APPARATUS FOR MONITOR AND MWAIT IN A DISTRIBUTED CACHE ARCHITECTURE - A method and apparatus for monitor and mwait in a distributed cache architecture is disclosed. One embodiment includes an execution thread sending a MONITOR request for an address to a portion of a distributed cache that stores the data corresponding to that address. At the distributed cache portion the MONITOR request and an associated speculative state is recorded locally for the execution thread. The execution thread then issues an MWAIT instruction for the address. At the distributed cache portion the MWAIT and an associated wait-to-trigger state are recorded for the execution thread. When a write request matching the address is received at the distributed cache portion, a monitor-wake event is then sent to the execution thread and the associated monitor state at the distributed cache portion for that execution thread can be reset to idle. | 07-02-2009 |
20090172285 | TRACKING TEMPORAL USE ASSOCIATED WITH CACHE EVICTIONS - A method and apparatus for tracking temporal use associated with cache evictions to reduce allocations in a victim cache is disclosed. Access data for a number of sets of instructions in an instruction cache is tracked at least until the data for one or more of the sets reach a predetermined threshold condition. Determinations whether to allocate entry storage in the victim cache may be made responsive in part to the access data for sets reaching the predetermined threshold condition. A micro-operation can be inserted into the execution pipeline in part to synchronize the access data for all the sets. Upon retirement of the micro-operation from the execution pipeline, access data for the sets can be synchronized and/or any previously allocated entry storage in the victim cache can be invalidated. | 07-02-2009 |
20090177842 | DATA PROCESSING SYSTEM AND METHOD FOR PREFETCHING DATA AND/OR INSTRUCTIONS - A data processing system for processing at least one application is provided. The data processing system comprises a processor ( | 07-09-2009 |
20090187712 | Operation Frame Filtering, Building, And Execution - The present subject matter relates to operation frame filtering, building, and execution. Some embodiments include identifying a frame signature, counting a number of execution occurrences of the frame signature, and building a frame of operations to execute instead of operations identified by the frame signature. | 07-23-2009 |
20090198898 | PARALLEL DATA PROCESSING APPARATUS - A controller for controlling a data processor having a plurality of processor arrays, each of which includes a plurality of processing elements, comprises a retrieval unit operable to retrieve a plurality of incoming instructions streams in parallel with one another, and a distribution unit operable to supply such incoming instruction streams to respective ones of the said plurality of processor arrays. | 08-06-2009 |
20090204764 | Cache Pooling for Computing Systems - In a computing system a method and apparatus for cache pooling is introduced. Threads are assigned priorities based on the criticality of their tasks. The most critical threads are assigned to main memory locations such that they are subject to limited or no cache contention. Less critical threads are assigned to main memory locations such that their cache contention with critical threads is minimized or eliminated. Thus, overall system performance is improved, as critical threads execute in a substantially predictable manner. | 08-13-2009 |
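The placement idea in the abstract above (give critical threads main-memory locations whose cache sets other threads do not use) resembles page/set coloring. A toy Python sketch, with an assumed direct-mapped geometry and a simplistic search that real systems would do via page coloring:

```python
LINE_SIZE = 64
NUM_SETS = 16   # illustrative direct-mapped cache geometry

def cache_set(addr):
    """Set index an address maps to (simple modulo indexing)."""
    return (addr // LINE_SIZE) % NUM_SETS

def place(buffer_len, allowed_sets, start=0):
    """Pick a base address whose lines all fall in `allowed_sets`, so
    a critical thread's working set cannot contend with threads that
    were assigned disjoint sets. Purely illustrative."""
    addr = start
    while True:
        lines = range(addr, addr + buffer_len, LINE_SIZE)
        if all(cache_set(a) in allowed_sets for a in lines):
            return addr
        addr += LINE_SIZE

# Critical thread gets sets {0, 1}; a less critical one gets {8, 9}.
critical_base = place(2 * LINE_SIZE, allowed_sets={0, 1})
other_base = place(2 * LINE_SIZE, allowed_sets={8, 9})
```

Because the two threads' buffers map to disjoint cache sets, neither can evict the other's lines, making the critical thread's cache behavior substantially predictable, as the abstract claims.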
20090210627 | METHOD AND SYSTEM FOR HANDLING CACHE COHERENCY FOR SELF-MODIFYING CODE - A method for handling cache coherency includes allocating a tag when a cache line is not exclusive in a data cache for a store operation, and sending the tag and an exclusive fetch for the line to coherency logic. An invalidation request is sent within a minimum amount of time to an I-cache, preferably only if it has fetched to the line and has not been invalidated since, which request includes an address to be invalidated, the tag, and an indicator specifying the line is for a PSC operation. The method further includes comparing the request address against stored addresses of prefetched instructions, and in response to a match, sending a match indicator and the tag to an LSU, within a maximum amount of time. The match indicator is timed, relative to exclusive data return, such that the LSU can discard prefetched instructions following execution of the store operation that stores to a line subject to an exclusive data return, and for which the match is indicated. | 08-20-2009 |
20090216952 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR MANAGING CACHE MEMORY - A method for managing cache memory including receiving an instruction fetch for an instruction stream in a cache memory, wherein the instruction fetch includes an instruction fetch reference tag for the instruction stream and the instruction stream is at least partially included within a cache line, comparing the instruction fetch reference tag to a previous instruction fetch reference tag, maintaining a cache replacement status of the cache line if the instruction fetch reference tag is the same as the previous instruction fetch reference tag, and upgrading the cache replacement status of the cache line if the instruction fetch reference tag is different from the previous instruction fetch reference tag, whereby the cache replacement status of the cache line is upgraded if the instruction stream is independently fetched more than once. A corresponding system and computer program product. | 08-27-2009 |
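The policy in the abstract above (upgrade a line's replacement status only when the fetch reference tag differs from the previous one, i.e. the stream was fetched independently) can be sketched per cache line; the class shape is an illustrative assumption:

```python
class LineState:
    """Toy model of one cache line's replacement status under the
    abstract's policy: a fetch under the same instruction-fetch
    reference tag maintains the status; a fetch under a different
    tag (an independent fetch) upgrades it."""

    def __init__(self):
        self.last_tag = None
        self.status = 0   # higher = more protected from replacement

    def fetch(self, ref_tag):
        if ref_tag == self.last_tag:
            return self.status          # same stream: maintain status
        self.last_tag = ref_tag
        self.status += 1                # independent fetch: upgrade
        return self.status

line = LineState()
```

Repeated fetches within one sequential stream thus leave the status alone, while a second independent fetch of the same line marks it as worth keeping.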
20090222624 | Method and Apparatus For Economical Cache Population - A technique for efficiently populating a cache in a data processing system with resources is disclosed. In particular, a node in accordance with the illustrative embodiment of the present invention defers populating its cache with a resource until at least two requests for the resource have been received. This is advantageous because it prevents the cache from being populated with infrequently requested resources. Furthermore, the illustrative embodiment of the present invention populates a cache with a resource only when: at least i requests for the resource have been received at a given node within an elapsed time interval, Δt, wherein i is an integer greater than one; and at least one request for the resource has been received from at least n of the m filial nodes of the given node within an elapsed time interval, Δt, wherein m is an integer greater than one, n is an integer greater than one, and m≧n. Embodiments of the present invention are particularly advantageous in computer networks that comprise a logical hierarchical topology, but are useful in any computer network, and in individual data processing systems and routers that comprise a cache memory. | 09-03-2009 |
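The two-threshold population rule in the abstract above (at least i requests within Δt, from at least n distinct filial nodes) can be sketched directly. The parameters i, n, Δt come from the abstract; everything else is illustrative:

```python
class EconomicalCache:
    """Toy model of the population policy: populate only when at
    least i requests arrive within the interval dt AND they come
    from at least n distinct filial nodes (i, n, dt are the
    abstract's parameters; the rest is illustrative)."""

    def __init__(self, i=2, n=2, dt=10.0):
        self.i, self.n, self.dt = i, n, dt
        self.log = {}      # resource -> [(timestamp, node), ...]
        self.cache = {}

    def request(self, resource, node, now, fetch):
        if resource in self.cache:
            return self.cache[resource]          # served from cache
        entries = self.log.setdefault(resource, [])
        entries.append((now, node))
        # Discard requests older than the elapsed interval dt.
        entries[:] = [(t, f) for (t, f) in entries if now - t <= self.dt]
        nodes = {f for _, f in entries}
        if len(entries) >= self.i and len(nodes) >= self.n:
            self.cache[resource] = fetch(resource)   # populate now
            return self.cache[resource]
        return fetch(resource)    # serve without populating

fetch_log = []
def origin_fetch(resource):
    """Stand-in for fetching the resource from its origin."""
    fetch_log.append(resource)
    return resource.upper()

cache = EconomicalCache(i=2, n=2, dt=10.0)
```

A single request is served but not cached; once a second request arrives from a different child within Δt, the resource is populated and later requests stop reaching the origin.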
20090240890 | PROGRAM THREAD SYNCHRONIZATION - A barrier for synchronizing program threads for a plurality of processors includes a filter configured to be coupled to a plurality of processors executing a plurality of threads to be synchronized. The filter is configured to monitor and selectively block fill requests for instruction cache lines. A method for synchronizing program threads for a plurality of processors includes configuring a filter to monitor and selectively block fill requests for instruction cache lines for a plurality of processors executing a plurality of threads to be synchronized. | 09-24-2009 |
20090248985 | Data Transfer Optimized Software Cache for Regular Memory References - Mechanisms are provided for optimizing regular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which regular memory references access a storage of a software cache of a data processing system through a high locality cache mechanism of the software cache. | 10-01-2009 |
20090276576 | Methods and Apparatus storing expanded width instructions in a VLIW memory for deferred execution - Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units. | 11-05-2009 |
20100023696 | Methods and System for Resolving Simultaneous Predicted Branch Instructions - A method of resolving simultaneous branch predictions prior to validation of the predicted branch instruction is disclosed. The method includes processing two or more predicted branch instructions, with each predicted branch instruction having a predicted state and a corrected state. The method further includes selecting one of the corrected states. Should one of the predicted branch instructions be mispredicted, the selected corrected state is used to direct future instruction fetches. | 01-28-2010 |
20100030967 | METHOD AND SYSTEM FOR SECURING INSTRUCTION CACHES USING SUBSTANTIALLY RANDOM INSTRUCTION MAPPING SCHEME - A method and system is provided for securing micro-architectural instruction caches (I-caches). Securing an I-cache involves maintaining a different substantially random instruction mapping policy into an I-cache for each of multiple processes, and for each process, performing a substantially random mapping scheme for mapping a process instruction into the I-cache based on the substantially random instruction mapping policy for said process. Securing the I-cache may further involve dynamically partitioning the I-cache into multiple logical partitions, and sharing access to the I-cache by an I-cache mapping policy that provides access to each I-cache partition by only one logical processor. | 02-04-2010 |
20100030968 | Methods of Cache Bounded Reference Counting - A computer implemented method of cache bounded reference counting for computer languages having automated memory management in which, for example, a reference to an object “Z” initially stored in an object “O” is fetched and the cache hardware is queried whether the reference to the object “Z” is a valid reference, is in the cache, and has a continuity flag set to “on”. If so, the object “O” is locked for an update, a reference counter is decremented for the object “Z” if the object “Z” resides in the cache, and a return code is set to zero to indicate that the object “Z” is de-referenced and that its storage memory can be released and re-used if the reference counter for the object “Z” reaches zero. Thereafter, the cache hardware is similarly queried regarding an object “N” that will become a new reference of object “O”. | 02-04-2010 |
20100064106 | DATA PROCESSOR AND DATA PROCESSING SYSTEM - The present invention provides a data processor capable of automatically discriminating a loop program and performing a reduction in power by size-variable lock control on an instruction buffer. The instruction buffer of the data processor includes a buffer controller for controlling a memory unit that stores each fetched instruction therein. When the execution history of a fetched conditional branch instruction suggests that its condition will be established, the branch direction of the instruction is opposite to the order of instruction execution, and the difference of instruction addresses from the branch source to the branch target falls within the storage capacity of the instruction buffer, the buffer controller retains the instruction sequence from the branch source to the branch target in the instruction buffer. While the instruction execution of the instruction sequence retained therein is repeated, the buffer controller supplies the corresponding instruction of the instruction sequence from the instruction buffer to the instruction decoder and releases retention of the instruction sequence when the instruction execution exits the instruction sequence. | 03-11-2010 |
20100070713 | DEVICE AND METHOD FOR FETCHING INSTRUCTIONS - A device and a method for fetching instructions. The device includes a processor adapted to execute instructions; a high level memory unit adapted to store instructions; a direct memory access (DMA) controller that is controlled by the processor; an instruction cache that includes a first input port and a second input port; wherein the instruction cache is adapted to provide instructions to the processor in response to read requests that are generated by the processor and received via the first input port; wherein the instruction cache is further adapted to fetch instructions from a high level memory unit in response to read requests, generated by the DMA controller and received via the second input port. | 03-18-2010 |
20100077145 | METHOD AND SYSTEM FOR PARALLEL EXECUTION OF MEMORY INSTRUCTIONS IN AN IN-ORDER PROCESSOR - A method of parallel execution of a first and a second instruction in an in-order processor. Embodiments of the invention enable parallel execution of memory instructions that are stalled by cache memory misses. The in-order processor processes cache memory misses of instructions in parallel by overlapping the first cache memory miss with cache memory misses that occur after the first cache memory miss. Memory-level parallelism in the in-order processor can be increased when more parallel and outstanding cache memory misses are generated. | 03-25-2010 |
20100077146 | INSTRUCTION CACHE SYSTEM, INSTRUCTION-CACHE-SYSTEM CONTROL METHOD, AND INFORMATION PROCESSING APPARATUS - An instruction cache system includes an instruction-cache data storage unit that stores cache data per index, and an instruction cache controller that compresses and writes the cache data in the instruction-cache data storage unit, and controls a compression ratio of the written cache data. The instruction cache controller calculates a memory capacity of a redundant area generated due to compression in a memory area belonging to an index, in which n pieces of cache data are written based on the controlled compression ratio, to compress and write new cache data in the redundant area based on the calculated memory capacity. | 03-25-2010 |
20100095066 | METHOD FOR CACHING RESOURCE REPRESENTATIONS IN A CONTEXTUAL ADDRESS SPACE - A method to generate and save a resource representation recited by a request encoded in a computer algorithm, wherein the method receives from a requesting algorithm an unresolved resource request. The method resolves the resource request to an endpoint and evaluates the resolved resource request by the endpoint to generate a resource representation. The method further generates and saves in a cache at least one unresolved request scope key, a resolved request scope key, and a cache entry comprising the resource representation. The method associates the cache entry with the resolved request scope key and with the at least one unresolved request scope key using a mapping function encoded in the cache. | 04-15-2010 |
20100122034 | STORAGE ARRAY TILE SUPPORTING SYSTOLIC MOVEMENT OPERATIONS - A tile for use in a tiled storage array provides re-organization of values within the tile array without requiring sophisticated global control. The tiles operate to move a requested value to a front-most storage element of the tile array according to a global systolic clock. The previous occupant of the front-most location is moved or swapped backward according to the systolic clock, and the new occupant is moved forward according to the systolic clock, according to the operation of the tiles, while providing for multiple in-flight access requests within the tile array. The placement heuristic that moves the values is determined according to the position of the tiles within the array and the behavior of the tiles. The movement of the values can be performed via only next-neighbor connections of adjacent tiles within the tile array. | 05-13-2010 |
20100138609 | Technique for controlling computing resources - A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance. | 06-03-2010 |
20100138610 | METHOD AND APPARATUS FOR INCLUSION OF TLB ENTRIES IN A MICRO-OP CACHE OF A PROCESSOR - Methods and apparatus for inclusion of TLB (translation look-aside buffer) entries in processor micro-op caches are disclosed. Some embodiments for inclusion of TLB entries have micro-op cache inclusion fields, which are set responsive to accessing the TLB entry. Inclusion logic may then flush the micro-op cache or portions of the micro-op cache and clear corresponding inclusion fields responsive to a replacement or invalidation of a TLB entry whenever its associated inclusion field had been set. Front-end processor state may also be cleared and instructions refetched when replacement resulted from a TLB miss. | 06-03-2010 |
20100138611 | METHOD AND APPARATUS FOR PIPELINE INCLUSION AND INSTRUCTION RESTARTS IN A MICRO-OP CACHE OF A PROCESSOR - Methods and apparatus for instruction restarts and inclusion in processor micro-op caches are disclosed. Embodiments of micro-op caches have way storage fields to record the instruction-cache ways storing corresponding macroinstructions. Instruction-cache in-use indications associated with the instruction-cache lines storing the instructions are updated upon micro-op cache hits. In-use indications can be located using the recorded instruction-cache ways in micro-op cache lines. Victim-cache deallocation micro-ops are enqueued in a micro-op queue after micro-op cache miss synchronizations, responsive to evictions from the instruction-cache into a victim-cache. Inclusion logic also locates and evicts micro-op cache lines corresponding to the recorded instruction-cache ways, responsive to evictions from the instruction-cache. | 06-03-2010 |
20100153648 | Block Driven Computation Using A Caching Policy Specified In An Operand Data Structure - A processor has an associated memory hierarchy including a cache memory. The processor includes an instruction sequencing unit that fetches instructions for processing, an operand data structure including a plurality of entries corresponding to operands of operations to be performed by the processor, and a computation engine. A first entry among the plurality of entries in the operand data structure specifies a first caching policy for a first operand, and a second entry specifies a second caching policy for a second operand. The computation engine computes and stores operands in the memory hierarchy in accordance with the cache policies indicated within the operand data structure. | 06-17-2010 |
20100169577 | Cache control device and control method - In order to control an access request to the cache shared between a plurality of threads, a storage unit for storing a flag provided in association with each of the threads is included. If the threads enter the execution of an atomic instruction, a defined value is written to the flags stored in the storage unit. Furthermore, if the atomic instruction is completed, a defined value different from the above defined value is written, thereby displaying whether or not the threads are executing the atomic instruction. If an access request is issued from a certain thread, it is judged whether or not a thread different from the certain thread is executing the atomic instruction by referencing the flag values in the storage unit. If it is judged that another thread is executing the atomic instruction, the access request is kept on standby. This makes it possible to realize the exclusive control processing necessary for processing the atomic instruction with a simple configuration. | 07-01-2010 |
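The per-thread flag mechanism of 20100169577 can be modeled in a few lines. This is a behavioral sketch only; the class name, boolean flag encoding, and method names are assumptions standing in for the hardware flags and "defined values" of the abstract.

```python
# Sketch of the per-thread atomic-instruction flags of 20100169577:
# an access request from one thread is held on standby while any
# *other* thread's flag shows an atomic instruction in flight.

class AtomicFlagUnit:
    def __init__(self, num_threads):
        self.flags = [False] * num_threads   # one flag per thread

    def begin_atomic(self, tid):
        self.flags[tid] = True               # defined value: executing atomic

    def end_atomic(self, tid):
        self.flags[tid] = False              # different defined value: completed

    def may_access(self, tid):
        """Grant the access only if no other thread is mid-atomic."""
        return not any(f for t, f in enumerate(self.flags) if t != tid)

unit = AtomicFlagUnit(2)
unit.begin_atomic(0)
unit.may_access(1)    # held on standby: thread 0 is mid-atomic
unit.end_atomic(0)
unit.may_access(1)    # request may now proceed
```

The thread executing the atomic instruction still passes its own check, which is what lets its own cache accesses complete while everyone else waits.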
20100205376 | METHOD FOR THE IMPROVEMENT OF MICROPROCESSOR SECURITY - A method for the improvement of the security of microprocessors ( | 08-12-2010 |
20100228920 | PARALLEL PROCESSING PROCESSOR SYSTEM - A parallel processing processor system includes multiple processor elements, a main memory, and a shared memory, whose latency with the processors is less than the latency between the main memory and the processors. Each of the multiple processor elements has a DSP (Digital Signal Processor) and an instruction cache. Firmware executed by the DSPs is transferred from the main memory to the shared memory and is shared by the DSPs. Updating of the instruction caches in the case where a cache miss has occurred is performed by, for example, copying, into the instruction caches, the content of the shared memory corresponding to an address accessed by a DSP. | 09-09-2010 |
20100235579 | Cache Management Within A Data Processing Apparatus - A data processing apparatus, and method of managing at least one cache within such an apparatus, are provided. The data processing apparatus has at least one processing unit for executing a sequence of instructions, with each such processing unit having a cache associated therewith, each cache having a plurality of cache lines for storing data values for access by the associated processing unit when executing the sequence of instructions. Identification logic is provided which, for each cache, monitors data traffic within the data processing apparatus and based thereon generates a preferred for eviction identification identifying one or more of the data values as preferred for eviction. Cache maintenance logic is then arranged, for each cache, to implement a cache maintenance operation during which selection of one or more data values for eviction from that cache is performed having regard to any preferred for eviction identification generated by the identification logic for data values stored in that cache. It has been found that such an approach provides a very flexible technique for seeking to improve cache storage utilisation. | 09-16-2010 |
20100250854 | METHOD AND SYSTEM FOR DATA PREFETCHING FOR LOOPS BASED ON LINEAR INDUCTION EXPRESSIONS - An efficient and effective compiler data prefetching technique is disclosed in which memory accesses that may be prefetched are represented as linear induction expressions. Furthermore, indirect memory accesses indexed by other memory accesses of linear induction expressions in scalar loops may be prefetched. | 09-30-2010 |
20100268887 | INFORMATION HANDLING SYSTEM WITH IMMEDIATE SCHEDULING OF LOAD OPERATIONS IN A DUAL-BANK CACHE WITH DUAL DISPATCH INTO WRITE/READ DATA FLOW - An information handling system (IHS) includes a processor with a cache memory system. The processor includes a processor core with an L1 cache memory that couples to an L2 cache memory. The processor includes an arbitration mechanism that controls load and store requests to the L2 cache memory. The arbitration mechanism includes control logic that enables a load request to interrupt a store request that the L2 cache memory is currently servicing. The L2 cache memory includes dual data banks so that one bank may perform a load operation while the other bank performs a store operation. The cache system provides dual dispatch points into the data flow to the dual cache banks of the L2 cache memory. | 10-21-2010 |
20100268888 | PROCESSING A DATA STREAM BY ACCESSING ONE OR MORE HARDWARE REGISTERS - Disclosed are a method, a system, and a program product for processing a data stream by accessing one or more hardware registers of a processor. In one or more embodiments, a first program instruction or subroutine can associate a hardware register of the processor with a data stream. With this association, the hardware register can be used as a stream head which can be used by multiple program instructions to access the data stream. In one or more embodiments, data from the data stream can be fetched automatically as needed and with one or more patterns which may include one or more start positions, one or more lengths, one or more strides, etc. to allow the cache to be populated with sufficient amounts of data to reduce memory latency and/or external memory bandwidth when executing an application which accesses the data stream through the one or more registers. | 10-21-2010 |
20100274972 | SYSTEMS, METHODS, AND APPARATUSES FOR PARALLEL COMPUTING - Systems, methods, and apparatuses for parallel computing are described. In some embodiments, a processor is described that includes a front end and back end. The front end includes an instruction cache to store instructions of a strand. The back end includes a scheduler, register file, and execution resources to execute the strand's instructions. | 10-28-2010 |
20100281218 | INTELLIGENT CACHE REPLACEMENT MECHANISM WITH VARYING AND ADAPTIVE TEMPORAL RESIDENCY REQUIREMENTS - A method for replacing cached data is disclosed. The method in one aspect associates an importance value to each block of data in the cache. When a new entry needs to be stored in the cache, a cache block for replacing is selected based on the importance values associated with cache blocks. In another aspect, the importance values are set according to the hardware and/or software's knowledge of the memory access patterns. The method in one aspect may also include varying the importance value over time over different processing requirements. | 11-04-2010 |
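The importance-value replacement policy of 20100281218 reduces, at its core, to evicting the least-important block on a miss. The sketch below is an illustrative minimum; the dictionary representation and function name are assumptions, and the importance values would in practice be set by hardware or software knowledge of access patterns and adjusted over time.

```python
# Sketch of importance-based replacement from 20100281218: each cached
# block carries an importance value, and the victim on a miss is the
# block with the lowest importance.

def pick_victim(blocks):
    """blocks: dict mapping block address -> importance value."""
    return min(blocks, key=blocks.get)

blocks = {"0x100": 5, "0x140": 1, "0x180": 3}
pick_victim(blocks)   # the least-important block is selected for replacement
```

Varying the importance values over time is what distinguishes this from a static priority scheme: a block that mattered during one phase of a program can be demoted when processing requirements change.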
20100299483 | EARLY RELEASE OF CACHE DATA WITH START/END MARKS WHEN INSTRUCTIONS ARE ONLY PARTIALLY PRESENT - An apparatus extracts instructions from a stream of undifferentiated instruction bytes in a microprocessor having an instruction set architecture in which the instructions are variable length. Decoders generate an associated start/end mark for each instruction byte of a line from a first queue of entries each storing a line of instruction bytes. A second queue has entries each storing a line received from the first queue along with the associated start/end marks. Control logic detects a condition in which the length of an instruction whose initial portion resides within a first line in the first queue is yet undeterminable because the instruction's remainder resides in a second line yet to be loaded into the first queue from the instruction cache; loads the first line and corresponding start/end marks into the second queue and refrains from shifting the first line out of the first queue, in response to detecting the condition; and extracts instructions from the first line in the second queue based on the corresponding start/end marks. The extracted instructions exclude the instruction of yet undeterminable length. | 11-25-2010 |
20100299484 | LOW POWER HIGH SPEED LOAD-STORE COLLISION DETECTOR - An apparatus detects a load-store collision within a microprocessor between a load operation and an older store operation each of which accesses data in the same cache line. Load and store byte masks specify which bytes contain the data specified by the load and store operations within the word of the cache line in which the load and store data begin, respectively. Load and store word masks specify which words contain the data specified by the load and store operations within the cache line, respectively. Combinatorial logic uses the load and store byte masks to detect the load-store collision if the data specified by the load and store operations begin in the same cache line word, and uses the load and store word masks to detect the load-store collision if the data specified by the load and store operations do not begin in the same cache line word. | 11-25-2010 |
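The two-level mask scheme of 20100299484 can be sketched directly. The word size, line size, and function names below are illustrative assumptions (4-byte words in a 4-word line), not the patent's actual widths; note that the word-mask path is deliberately coarse, reporting a collision whenever the accesses touch a common word.

```python
# Sketch of the byte-mask / word-mask collision test of 20100299484:
# byte masks decide the case where both accesses begin in the same
# word; otherwise the coarser word masks decide.

def masks(offset, size, word_bytes=4, line_words=4):
    """Byte mask within the first word touched, plus a per-word mask."""
    byte_mask = (((1 << size) - 1) << (offset % word_bytes)) & ((1 << word_bytes) - 1)
    first_w = offset // word_bytes
    last_w = (offset + size - 1) // word_bytes
    word_mask = sum(1 << w for w in range(first_w, min(last_w, line_words - 1) + 1))
    return byte_mask, word_mask, first_w

def collides(load_off, load_sz, store_off, store_sz):
    lb, lw, lfw = masks(load_off, load_sz)
    sb, sw, sfw = masks(store_off, store_sz)
    if lfw == sfw:              # both begin in the same word: byte precision
        return (lb & sb) != 0
    return (lw & sw) != 0       # different start words: word precision

collides(0, 2, 1, 1)   # True: byte overlap within word 0
collides(0, 2, 4, 4)   # False: disjoint words
```

Falling back to word granularity trades a few false-positive replays for much narrower comparators, which is where the low-power, high-speed claim comes from.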
20100306476 | CONTROLLING SIMULATION OF A MICROPROCESSOR INSTRUCTION FETCH UNIT THROUGH MANIPULATION OF INSTRUCTION ADDRESSES - Instruction fetch unit (IFU) verification is improved by dynamically monitoring the current state of the IFU model and detecting any predetermined states of interest. The instruction address sequence is automatically modified to force a selected address to be fetched next by the IFU model. The instruction address sequence may be modified by inserting one or more new instruction addresses, or by jumping to a non-sequential address in the instruction address sequence. In exemplary implementations, the selected address is a corresponding address for an existing instruction already loaded in the IFU cache, or differs only in a specific field from such an address. The instruction address control is preferably accomplished without violating any rules of the processor architecture by sending a flush signal to the IFU model and overwriting an address register corresponding to a next address to be fetched. | 12-02-2010 |
20100325359 | TRACING OF DATA FLOW - Embodiments for tracing dataflow for a computer program are described. The computer program includes machine instructions that are executable on a microprocessor. A decoding module can be configured to decode machine instructions obtained from a computer memory. In addition, a dataflow primitive engine can receive a decoded machine instruction from the decoding module and generate at least one dataflow primitive for the decoded machine instruction based on a dataflow primitive classification into which the decoded machine instruction is categorized by the dataflow primitive engine. A dataflow state table can be configured to track addressed data locations that are affected by dataflow. The dataflow primitives can be applied to the dataflow state table to update a dataflow status for the addressed data locations affected by the decoded machine instruction. | 12-23-2010 |
20100325360 | MULTI-CORE PROCESSOR AND MULTI-CORE PROCESSOR SYSTEM - In one embodiment, a multi-core processor includes a plurality of processor cores that each includes a cache and that uses a management target area allocated as a main memory in a memory area. The multi-core processor includes a state managing unit and a management-target-area increasing and decreasing unit. The state managing unit manages a first state in which the small area is not allocated to the processor core and a second state in which the small area is allocated to the processor core for each small area included in the management target area. The management-target-area increasing and decreasing unit increases and decreases the management target area by increasing and decreasing the small area in the first state in the management target area. | 12-23-2010 |
20100332757 | Integrated Pipeline Write Hazard Handling Using Memory Attributes - A system and method for write hazard handling are described, including a method comprising pre-computing a memory management unit policy for a write request using an address that is at least one clock cycle before data; registering the pre-computed memory management unit policy; using the pre-computed memory management unit policy to control a pipeline stall to ensure that non-bufferable writes are pipeline-protected, ensuring that no non-bufferable locations are bypassed from within the pipeline and all subsequent non-bufferable reads will get data from a final destination; bypassing a read request only after a corresponding write request is updated in a write pending buffer; decoding the write request with the write request aligned to data; registering the write request in the write pending buffer; allowing arbitration logic to force the pipeline stall for a region that will have a write conflict; and stalling read requests to protect against write hazards. | 12-30-2010 |
20100332758 | Cache memory device, processor, and processing method - A cache memory device includes: a data memory storing data written by an arithmetic processing unit; a connecting unit connecting an input path from the arithmetic processing unit to the data memory and an output path from the data memory to a main storage unit; a selecting unit provided on the output path to select data from the data memory or data from the arithmetic processing unit via the connecting unit, and to transfer the selected data to the output path; and a control unit controlling the selecting unit such that the data from the data memory is transferred to the output path when the data is written from the data memory to the main storage unit, and such that the data is transferred to the output path via the connecting unit when the data is written from the arithmetic processing unit to the main storage unit. | 12-30-2010 |
20100332759 | METHOD OF PROGRAM OBFUSCATION AND PROCESSING DEVICE FOR EXECUTING OBFUSCATED PROGRAMS - A program is obfuscated by reordering its instructions. Original instruction addresses are mapped to target addresses. A cache efficient obfuscated program is realized by restricting target addresses of a sequence of instructions to a limited set of the disjoint ranges ( | 12-30-2010 |
20100332760 | MECHANISM TO HANDLE EVENTS IN A MACHINE WITH ISOLATED EXECUTION - A platform and method for secure handling of events in an isolated environment. A processor executing in isolated execution “IsoX” mode may leak data when an event occurs as a result of the event being handled in a traditional manner based on the exception vector. By defining a class of events to be handled in IsoX mode, and switching between a normal memory map and an IsoX memory map dynamically in response to receipt of an event of the class, data security may be maintained in the face of such events. | 12-30-2010 |
20110029735 | METHOD FOR MANAGING AN EMBEDDED SYSTEM TO ENHANCE PERFORMANCE THEREOF, AND ASSOCIATED EMBEDDED SYSTEM - A method for managing an embedded system is provided. The method includes selecting one of a first memory and a second memory according to at least one criterion, where the selected memory is a source from which the embedded system reads commands of a program, and an access speed of the first memory is different from that of the second memory; and controlling the embedded system to execute the program by utilizing the selected memory as the source. | 02-03-2011 |
20110035551 | MICROPROCESSOR WITH REPEAT PREFETCH INDIRECT INSTRUCTION - A microprocessor includes an instruction decoder for decoding a repeat prefetch indirect instruction that includes address operands used to calculate an address of a first entry in a prefetch table having a plurality of entries, each including a prefetch address. The repeat prefetch indirect instruction also includes a count specifying a number of cache lines to be prefetched. The memory address of each of the cache lines is specified by the prefetch address in one of the entries in the prefetch table. A count register, initially loaded with the count specified in the prefetch instruction, stores a remaining count of the cache lines to be prefetched. Control logic fetches the prefetch addresses of the cache lines from the table into the microprocessor and prefetches the cache lines from the system memory into a cache memory of the microprocessor using the count register and the prefetch addresses fetched from the table. | 02-10-2011 |
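The repeat prefetch indirect instruction of 20110035551 is easy to model at a behavioral level: walk a table of prefetch addresses, issue one cache-line prefetch per entry, and decrement a count register. The function signature and the callable standing in for the prefetch hardware are assumptions for illustration.

```python
# Behavioral sketch of the repeat-prefetch-indirect instruction of
# 20110035551: the count register is loaded from the instruction's
# count operand, and one cache line is prefetched per table entry
# until the remaining count reaches zero.

def repeat_prefetch_indirect(table, count, prefetch):
    """table: list of prefetch addresses; prefetch: issues one line fetch."""
    remaining = count              # models the count register
    for addr in table[:count]:
        prefetch(addr)             # fetch address from table, prefetch the line
        remaining -= 1
    return remaining

issued = []
repeat_prefetch_indirect([0x1000, 0x2000, 0x3000], 2, issued.append)
issued   # only the first two table entries were prefetched
```

Because the addresses come from a memory-resident table rather than a stride, the instruction can cover pointer-chasing access patterns that ordinary strided prefetchers miss.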
20110040939 | MICROPROCESSOR WITH INTEGRATED HIGH SPEED MEMORY - The present invention relates to the field of (micro)computer design and architecture, and in particular to microarchitecture associated with moving data values between a (micro)processor and memory components. Particularly, the present invention relates to a computer system with a processor architecture in which register addresses are generated with more than one execution channel controlled by one central processing unit with at least one load/store unit for loading and storing data objects, and at least one cache memory associated to the processor holding data objects accessed by the processor, wherein said processor's load/store unit contains a high speed memory directly interfacing said load/store unit to the cache. The present invention improves upon architectures with dual-ported microprocessor implementations comprising two execution pipelines capable of two load/store data transactions per cycle. By including a cache memory inside the load/store unit, the processor is directly interfaced from its load/store units to the caches. Thus, the present invention accelerates data accesses and transactions from and to the load/store units of the processor and the data cache memory. | 02-17-2011 |
20110047333 | ALLOCATING PROCESSOR CORES WITH CACHE MEMORY ASSOCIATIVITY - Techniques are generally described related to a multi-core processor with a plurality of processor cores and a cache memory shared by at least some of the processor cores. The multi-core processor can be configured for separately allocating a respective level of cache memory associativity to each of the processing cores. | 02-24-2011 |
20110055483 | TRANSACTIONAL MEMORY SYSTEM WITH EFFICIENT CACHE SUPPORT - A computer implemented method for use by a transaction program for managing memory access to a shared memory location for transaction data of a first thread, the shared memory location being accessible by the first thread and a second thread. A string of instructions to complete a transaction of the first thread are executed, beginning with one instruction of the string of instructions. It is determined whether the one instruction is part of an active atomic instruction group (AIG) of instructions associated with the transaction of the first thread. A cache structure and a transaction table which together provide for entries in an active mode for the AIG are located if the one instruction is part of an active AIG. The next instruction is executed under a normal execution mode in response to determining that the one instruction is not part of an active AIG. | 03-03-2011 |
20110055484 | Detecting Task Complete Dependencies Using Underlying Speculative Multi-Threading Hardware - Mechanisms are provided for tracking dependencies of threads in a multi-threaded computer program execution. The mechanisms detect a dependency of a first thread's execution on results of a second thread's execution in an execution flow of the multi-threaded computer program. The mechanisms further store, in a hardware thread dependency vector storage associated with the first thread's execution, an identifier of the dependency by setting at least one bit in the hardware thread dependency vector storage corresponding to the second thread. Moreover, the mechanisms schedule tasks performed by the multi-threaded computer program based on the hardware thread dependency vector storage to minimize squashing of threads. | 03-03-2011 |
20110072215 | Cache system and control method of way prediction for cache memory - A cache device according to an exemplary aspect of the present invention includes a way information buffer that stores way information, which is the result of selecting a way for an instruction that accesses a cache memory; and a control unit that controls a storage process and a read process while a series of instruction groups is repeatedly executed, the storage process storing the way information for the instruction groups in the way information buffer, and the read process reading the way information from the way information buffer. | 03-24-2011 |
20110093658 | CLASSIFYING AND SEGREGATING BRANCH TARGETS - A system and method for branch prediction in a microprocessor. A branch prediction unit stores an indication of the location of a branch target instruction relative to its corresponding branch instruction. For example, a target instruction may be located within the same first region of memory as the branch instruction. Alternatively, the target instruction may be located outside the first region, but within a larger second region. The prediction unit comprises a branch target array corresponding to each region. Each array stores a bit range of a branch target address, wherein the stored bit range is based upon the location of the target instruction relative to the branch instruction. The prediction unit constructs a predicted branch target address by concatenating the bits stored in the branch target arrays. | 04-21-2011 |
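The two-region target construction described in entry 20110093658 can be illustrated with a small sketch — a hypothetical Python model, not the patented design; the region sizes and function names are assumptions for illustration:

```python
# Hypothetical model: targets in the same 2^16-byte region as the branch
# store only their low 16 bits in a "near" array; farther targets store the
# low 32 bits in a "far" array. The predicted target is rebuilt by
# concatenating the branch PC's upper bits with the stored bit range.

REGION_BITS_NEAR = 16   # assumed region sizes, for illustration only
REGION_BITS_FAR = 32

def store_target(branch_pc, target):
    """Return (array_id, stored_bits), preferring the smaller near array."""
    if branch_pc >> REGION_BITS_NEAR == target >> REGION_BITS_NEAR:
        return ("near", target & ((1 << REGION_BITS_NEAR) - 1))
    return ("far", target & ((1 << REGION_BITS_FAR) - 1))

def predict_target(branch_pc, array_id, stored_bits):
    """Concatenate the branch PC's upper bits with the stored low bits."""
    bits = REGION_BITS_NEAR if array_id == "near" else REGION_BITS_FAR
    return (branch_pc >> bits << bits) | stored_bits
```

Storing fewer bits for same-region targets is what makes the per-region arrays cheaper than one full-width target array.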
20110138126 | Atomic Commit Predicated on Consistency of Watches - Mechanisms for performing predicated atomic commits based on consistency of watches are provided. These mechanisms include executing, by a thread executing on a processor of the data processing system, an atomic release instruction. A determination is made as to whether a speculative store has been lost, due to an eviction of a memory block to which the speculative store is performed, since a previous atomic release instruction was processed. In response to the speculative store having been lost, the processor invalidates speculative stores that have been performed since the previous atomic release instruction was processed. In response to the speculative store not having been lost, the processor commits speculative stores that have been performed since the previous atomic release instruction was processed. | 06-09-2011 |
20110145502 | META-DATA BASED DATA PREFETCHING - A technique for prefetching data into a cache memory system includes prefetching data based on meta information indicative of data access patterns. A method includes tagging data of a program with meta information indicative of data access patterns. The method includes prefetching the data from main memory at least partially based on the meta information, by a processor executing the program. In at least one embodiment, the method includes generating an executable at least partially based on the meta information. The executable includes at least one instruction to prefetch the data. In at least one embodiment, the method includes inserting one or more instructions for prefetching the data into an intermediate form of program code while translating program source code into the intermediate form of program code. | 06-16-2011 |
20110145503 | ON-LINE OPTIMIZATION OF SOFTWARE INSTRUCTION CACHE - A method for computing includes executing a program, including multiple cacheable lines of executable code, on a processor having a software-managed cache. A run-time cache management routine running on the processor is used to assemble a profile of inter-line jumps occurring in the software-managed cache while executing the program. Based on the profile, an optimized layout of the lines in the code is computed, and the lines of the program are re-ordered in accordance with the optimized layout while continuing to execute the program. | 06-16-2011 |
20110153947 | INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD - An information processing device for processing VLIW instructions includes memory banks that store an instruction word group constituting a very long instruction. A program counter outputs an instruction address indicating the head memory bank containing the head part of the very long instruction of the next cycle. A memory bank control device uses information regarding the instruction address of the very long instruction and the number of memory banks associated with the very long instruction to specify the memory banks to be used in the next cycle and the memory banks not to be used in the next cycle. The memory bank control device controls the operation of the unused memory banks. An instruction decoder decodes the very long instruction fetched from the memory banks in use. An arithmetic device executes the decoded very long instruction. | 06-23-2011 |
20110161592 | Dynamic system reconfiguration - In some embodiments system reconfiguration code and data to be used to perform a dynamic hardware reconfiguration of a system including a plurality of processor cores is cached and any direct or indirect memory accesses during the dynamic hardware reconfiguration are prevented. One of the processor cores executes the cached system reconfiguration code and data in order to dynamically reconfigure the hardware. Other embodiments are described and claimed. | 06-30-2011 |
20110161593 | Cache unit, arithmetic processing unit, and information processing unit - A cache unit comprising a register file that selects an entry indicated by a cache index of n bits (n is a natural number) that is used to search for an instruction cache tag, using multiplexer groups having n stages respectively corresponding to the n bits of the cache index. Among the multiplexer groups having n stages, a multiplexer group in an m | 06-30-2011 |
20110161594 | INFORMATION PROCESSING DEVICE AND CACHE MEMORY CONTROL DEVICE - An information processor includes processing units, each of which processes out-of-order memory accesses and includes a cache memory, an instruction port that holds instructions for accessing data in the cache memory, a first determinator that validates a first flag when a request for invalidating cache data is received after the target data of a load instruction has been transferred from the cache memory and a load instruction having a cache index identical to that of the target address of the received invalidating request exists, a second determinator that validates a second flag when the target data of the load instruction in the instruction port is transferred after a cache miss of the target data occurred, and a re-execution determinator that instructs re-execution of the instructions that follow the load instruction if the first and second flags are valid when a load instruction in the instruction port has completed. | 06-30-2011 |
20110173394 | ORDERING OF GUARDED AND UNGUARDED STORES FOR NO-SYNC I/O - A parallel computing system processes at least one store instruction. A first processor core issues a store instruction. A first queue, associated with the first processor core, stores the store instruction. A second queue, associated with a first local cache memory device of the first processor core, stores the store instruction. The first processor core updates first data in the first local cache memory device according to the store instruction. A third queue, associated with at least one shared cache memory device, stores the store instruction. The first processor core invalidates second data, associated with the store instruction, in the at least one shared cache memory. The first processor core invalidates third data, associated with the store instruction, in the local cache memory devices of other processor cores. The first processor core flushes only the first queue. | 07-14-2011 |
20110179226 | DATA PROCESSOR - The present invention provides a data processor capable of reducing power consumption at the time of execution of a spin wait loop for a spinlock. A CPU executes a weighted load instruction at the time of performing a spinlock process and outputs a spin wait request to a corresponding cache memory. When the spin wait request is received from the CPU, the cache memory temporarily stops outputting an acknowledge response to a read request from the CPU until a predetermined condition (snoop write hit, interrupt request, or lapse of predetermined time) is satisfied. Therefore, pipeline execution of the CPU is stalled and the operation of the CPU and the cache memory can be temporarily stopped, and power consumption at the time of executing a spin wait loop can be reduced. | 07-21-2011 |
20110213932 | DECODING APPARATUS AND DECODING METHOD - A decoding apparatus is used which includes: a CABAC processing unit ( | 09-01-2011 |
20110238917 | HIGH-PERFORMANCE CACHE SYSTEM AND METHOD - A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions. Further, the cache control unit is also configured to examine instructions being filled from the first memory to the second memory to extract instruction information containing at least branch information, to create a plurality of tracks based on the extracted instruction information, and to fill the at least one or more instructions based on one or more tracks from the plurality of tracks. | 09-29-2011 |
20110264862 | REDUCING PIPELINE RESTART PENALTY - Techniques are disclosed relating to reducing the latency of restarting a pipeline in a processor that implements scouting. In one embodiment, the processor may reduce pipeline restart latency using two instruction fetch units that are configured to fetch and re-fetch instructions in parallel with one another. In some embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to determining that a commit operation is to be attempted with respect to one or more deferred instructions. In other embodiments, the processor may reduce pipeline restart latency by initiating re-fetching instructions in response to receiving an indication that a request for a set of data has been received by a cache, where the indication is sent by the cache before determining whether the data is present in the cache or not. | 10-27-2011 |
20110264863 | DEVICE, SYSTEM, AND METHOD FOR USING A MASK REGISTER TO TRACK PROGRESS OF GATHERING ELEMENTS FROM MEMORY - A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed. | 10-27-2011 |
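The mask-register progress tracking of entry 20110264863 can be simulated in a few lines — a hypothetical Python sketch, not the patented device; here a set bit means the element is still pending (the abstract's "first value") and bits are cleared as elements are written (the "second value"), so an interrupted gather can be re-issued and resume where it left off:

```python
# Hypothetical gather simulation with a progress mask. A re-issued gather
# skips elements whose mask bit is already clear, so completed elements are
# not re-read after a fault.

def gather_step(memory, indices, dest, mask, fault_at=None):
    """Gather elements whose mask bit is 1; clear bits as they complete.
    Returns the updated mask (0 means the gather has finished)."""
    for i in range(len(indices)):
        if not (mask >> i) & 1:
            continue                      # already gathered on an earlier pass
        if fault_at is not None and indices[i] == fault_at:
            return mask                   # fault: stop, mask records progress
        dest[i] = memory[indices[i]]
        mask &= ~(1 << i)                 # mark element i as written
    return mask
```

Re-issuing the same call with the returned mask completes only the remaining elements.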
20110276764 | CRACKING DESTRUCTIVELY OVERLAPPING OPERANDS IN VARIABLE LENGTH INSTRUCTIONS - A method, information processing system, and computer program product manage computer executable instructions. At least one machine instruction is received for execution. The machine instruction is analyzed and identified as a predefined instruction for storing a variable length first operand in a memory location. Responsive to this identification and based on fields of the machine instruction, the location of a variable length second operand of the instruction relative to the location of the first operand is determined. Responsive to the relative location having a predefined relationship, a first cracking operation is performed, which cracks the instruction into a first set of micro-ops (Uops) to be executed in parallel. The first set of Uops stores a plurality of first blocks in the first operand, each of the first blocks to be stored being identical. The first set of Uops is then executed. | 11-10-2011 |
20110307662 | MANAGING CACHE COHERENCY FOR SELF-MODIFYING CODE IN AN OUT-OF-ORDER EXECUTION SYSTEM - A method, system, and computer program product for managing cache coherency for self-modifying code in an out-of-order execution system are disclosed. A program-store-compare (PSC) tracking manager identifies a set of addresses of pending instructions in an address table that match an address requested to be invalidated by a cache invalidation request. The PSC tracking manager receives a fetch address register identifier associated with a fetch address register for the cache invalidation request. The fetch address register is associated with the set of addresses and is a PSC tracking resource reserved by a load store unit (LSU) to monitor an exclusive fetch for a cache line in a high level cache. The PSC tracking manager determines that the set of entries in an instruction line address table associated with the set of addresses is invalid and instructs the LSU to free the fetch address register. | 12-15-2011 |
20110307663 | Storage Unsharing - A method is described to partition the memory of application-specific hardware compiled from a software program. Applying the invention generates multiple small memories that need not be kept coherent and are defined over a specific region of the program. The invention creates application specific hardware which preserves the memory image and addressing model of the original software program. The memories are dynamically initialized and flushed at the entries and exits of the program region they are defined in. | 12-15-2011 |
20110320722 | MANAGEMENT OF MULTIPURPOSE COMMAND QUEUES IN A MULTILEVEL CACHE HIERARCHY - An apparatus for controlling access to a pipeline includes a plurality of command queues, with a first subset of the command queues assigned to process commands of a first command type, a second subset assigned to process commands of a second command type, and a third subset not assigned to either the first subset or the second subset. The apparatus also includes an input controller configured to receive requests having the first command type and the second command type, assign requests having the first command type to command queues in the first subset until all command queues in the first subset are filled, and then assign requests having the first command type to command queues in the third subset. | 12-29-2011 |
20110320723 | METHOD AND SYSTEM TO REDUCE THE POWER CONSUMPTION OF A MEMORY DEVICE - A method and system to reduce the power consumption of a memory device. In one embodiment of the invention, the memory device is a N-way set-associative level one (L1) cache memory and there is logic coupled with the data cache memory to facilitate access to only part of the N ways of the N-way set-associative L1 cache memory in response to a load instruction or a store instruction. By reducing the number of ways accessed in the N-way set-associative L1 cache memory for each load or store request, the power requirements of the N-way set-associative L1 cache memory are reduced in one embodiment of the invention. In one embodiment of the invention, when a prediction is made that accesses to the cache memory require only the data arrays of the N-way set-associative L1 cache memory, access to the fill buffers is deactivated or disabled. | 12-29-2011 |
20110320724 | DMA-BASED ACCELERATION OF COMMAND PUSH BUFFER BETWEEN HOST AND TARGET DEVICES - Direct Memory Access (DMA) is used in connection with passing commands between a host device and a target device coupled via a push buffer. Commands passed to a push buffer by a host device may be accumulated by the host device prior to forwarding the commands to the push buffer, such that DMA may be used to collectively pass a block of commands to the push buffer. In addition, a host device may utilize DMA to pass command parameters for commands to a command buffer that is accessible by the target device but is separate from the push buffer, with the commands that are passed to the push buffer including pointers to the associated command parameters in the command buffer. | 12-29-2011 |
20120042128 | CACHE UNIT AND PROCESSING SYSTEM - According to one embodiment, a cache unit transfers data from a memory, connected to the cache unit via a bus incompatible with critical word first (CWF), to an L1-cache having a first line size and connected to the cache unit via a bus compatible with the CWF. The unit includes cache and un-cache controllers. The cache controller includes an L2-cache and a request converter. The L2-cache has a second line size greater than or equal to the first line size. The request converter converts a first refill request into a second refill request when the head address of the burst transfer of the first refill request is in the L2-cache. The un-cache controller transfers the second refill request to the memory, receives the data to be processed corresponding to the second refill request from the memory, and transfers the received data to the L1-cache. | 02-16-2012 |
20120042129 | ARRANGEMENT METHOD OF PROGRAMS TO MEMORY SPACE, APPARATUS, AND RECORDING MEDIUM - For a program that is made up of functions in units, each function is divided into instruction code blocks having a size CS where CS is the instruction cache line size of a target processor and an instruction code block that is X | 02-16-2012 |
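The block-splitting step in entry 20120042129 — dividing each function into blocks of the target's instruction cache line size CS — can be sketched as follows; a minimal sketch assuming byte-addressed sizes, with an invented function name:

```python
# Hypothetical sketch: split a function of function_size bytes into
# cache-line-sized blocks so a placement tool can arrange them in memory.
# The final block may be shorter than CS.

def split_into_blocks(function_size, cs):
    """Return (offset, length) blocks covering the function."""
    blocks = []
    for off in range(0, function_size, cs):
        blocks.append((off, min(cs, function_size - off)))
    return blocks
```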
20120047329 | Reducing Cache Power Consumption For Sequential Accesses - In some embodiments, a cache may include a tag array and a data array, as well as circuitry that detects whether accesses to the cache are sequential (e.g., occupying the same cache line). For example, a cache may include a tag array and a data array that stores data, such as multiple bundles of instructions per cache line. During operation, it may be determined that successive cache requests are sequential and do not cross a cache line boundary. Responsively, various cache operations may be inhibited to conserve power. For example, access to the tag array and/or data array, or portions thereof, may be inhibited. | 02-23-2012 |
20120089783 | OPCODE LENGTH CACHING - A computer system caches variable-length instructions in a data structure. The computer system locates a first copy of an instruction in the cached data structure using a current value of the instruction pointer as a key. The computer system determines a predictive length of the instruction, and reads a portion of the instruction from an instruction memory as a second copy. The second copy has the predictive length. Based on the comparison of the first copy with the second copy, the computer system determines whether or not to read the rest of the instruction from the instruction memory, and then interprets the instruction for use by the computer system. | 04-12-2012 |
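A hedged sketch of the predictive-length check in entry 20120089783 (all names assumed): the cache maps an instruction pointer to a previously decoded copy, a short read of the predicted length is compared against that copy, and only on a mismatch is the instruction fully re-read and re-cached. The fixed 4-byte fallback decode is a stand-in for a real variable-length decoder:

```python
# Hypothetical opcode-length cache. A confirmed match of the short read
# means the rest of the instruction need not be fetched again.

def fetch_instruction(ip, length_cache, instr_mem):
    """Return (instruction_bytes, full_read) for the instruction at ip."""
    cached = length_cache.get(ip)
    if cached is not None:
        predicted = instr_mem[ip:ip + len(cached)]
        if predicted == cached:
            return cached, False          # cache hit confirmed by short read
    # Miss or mismatch: fall back to a full decode from instruction memory
    # (stub: fixed 4-byte instructions here, purely for illustration).
    insn = instr_mem[ip:ip + 4]
    length_cache[ip] = insn
    return insn, True
```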
20120117327 | Bimodal Branch Predictor Encoded in a Branch Instruction - Each branch instruction having branch prediction support has branch prediction bits in architecture specified bit positions in the branch instruction. An instruction cache supports modifying the branch instructions with updated branch prediction bits that are dynamically determined when the branch instruction executes. | 05-10-2012 |
20120124292 | Computer System Having Cache Subsystem Performing Demote Requests - Computer system having cache subsystem wherein demote requests are performed by the cache subsystem. Software indicates to hardware of a processing system that its storage modification to a particular cache line is done, and will not be doing any modification for the time being. With this indication, the processor actively releases its exclusive ownership by updating its line ownership from exclusive to read-only (or shared) in its own cache directory and in the storage controller (SC). By actively giving up the exclusive rights, another processor can immediately be given exclusive ownership to that said cache line without waiting on any processor's explicit cross invalidate acknowledgement. This invention also describes the hardware design needed to provide this support. | 05-17-2012 |
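The demote flow of entry 20120124292 can be modeled in a few lines — an illustrative directory simulation, not the patented hardware; the state letters follow the usual MESI convention (E exclusive, S shared, I invalid) and the function names are invented:

```python
# Hypothetical model: a core that has finished modifying a line demotes its
# ownership from exclusive to shared, so a later exclusive request from
# another core needs no cross-invalidate acknowledgement wait.

def demote(directory, line, core):
    """Downgrade core's ownership of line from exclusive to shared."""
    if directory.get((line, core)) == "E":
        directory[(line, core)] = "S"

def request_exclusive(directory, line, core, all_cores):
    """Grant exclusivity if possible; return cores still holding the line
    exclusively (those would require a cross-invalidate round trip)."""
    need_xi = [c for c in all_cores
               if c != core and directory.get((line, c)) == "E"]
    if not need_xi:
        for c in all_cores:
            if c != core and (line, c) in directory:
                directory[(line, c)] = "I"
        directory[(line, core)] = "E"
    return need_xi
```

In this model, demoting first makes the subsequent exclusive grant immediate.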
20120137076 | Control of entry of program instructions to a fetch stage within a processing pipeline - A processing pipeline | 05-31-2012 |
20120137077 | MISS BUFFER FOR A MULTI-THREADED PROCESSOR - A multi-threaded processor configured to allocate entries in a buffer for instruction cache misses is disclosed. Entries in the buffer may store thread state information for a corresponding instruction cache miss for one of a plurality of threads executable by the processor. The buffer may include dedicated entries and dynamically allocable entries, where the dedicated entries are reserved for a subset of the plurality of threads and the dynamically allocable entries are allocable to a group of two or more of the plurality of threads. In one embodiment, the dedicated entries are dedicated for use by a single thread and the dynamically allocable entries are allocable to any of the plurality of threads. The buffer may store two or more entries for a given thread at a given time. In some embodiments, the buffer may help ensure none of the plurality of threads experiences starvation with respect to instruction fetches. | 05-31-2012 |
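The dedicated-plus-dynamic allocation policy of entry 20120137077 can be sketched as a tiny Python model (class and method names assumed); reserving one dedicated entry per thread is what prevents any thread from being starved of an in-flight instruction-cache miss:

```python
# Hypothetical miss buffer: one dedicated entry per thread, plus a pool of
# dynamically allocable entries shared by all threads.

class MissBuffer:
    def __init__(self, num_threads, num_dynamic):
        self.dedicated = {t: None for t in range(num_threads)}
        self.dynamic = [None] * num_dynamic

    def allocate(self, thread, miss):
        """Try the thread's dedicated entry first, then the shared pool."""
        if self.dedicated[thread] is None:
            self.dedicated[thread] = miss
            return True
        for i, slot in enumerate(self.dynamic):
            if slot is None:
                self.dynamic[i] = miss
                return True
        return False                      # buffer full: caller retries later
```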
20120144119 | PROGRAMMABLE ATOMIC MEMORY USING STORED ATOMIC PROCEDURES - A processing core in a multi-processing core system is configured to execute a sequence of instructions as a single atomic memory transaction. The processing core validates that the sequence meets a set of one or more atomicity criteria, including that no instruction in the sequence instructs the processing core to access shared memory. After validating the sequence, the processing core executes the sequence as a single atomic memory transaction, such as by locking a source cache line that stores shared memory data, executing the validated sequence of instructions, storing a result of the sequence into the source cache line, and unlocking the source cache line. | 06-07-2012 |
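The validation step of entry 20120144119 — accepting an instruction sequence for atomic execution only if no instruction accesses shared memory — might look like the following sketch; the opcode names are invented for illustration:

```python
# Hypothetical atomicity check: reject any sequence containing an opcode
# that touches shared memory.

SHARED_ACCESS_OPS = {"load_shared", "store_shared"}

def validate_atomic(sequence):
    """Return True if the sequence meets the no-shared-memory criterion."""
    return all(op not in SHARED_ACCESS_OPS for op, *_ in sequence)
```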
20120144120 | PROGRAMMABLE ATOMIC MEMORY USING HARDWARE VALIDATION AGENT - A processing core in a multi-processing core system is configured to execute a sequence of instructions as an atomic memory transaction. Executing each instruction in the sequence comprises validating that the instruction meets a set of one or more atomicity criteria, including that executing the instruction does not require accessing shared memory. Executing the atomic memory transaction may comprise storing memory data from a source cache line into a target register, reading or modifying the memory data stored in the target register as part of executing the sequence, and storing a value from the target register to the source cache line. | 06-07-2012 |
20120151145 | Data Driven Micro-Scheduling of the Individual Processing Elements of a Wide Vector SIMD Processing Unit - A method for optimizing processing in a SIMD core. The method comprises processing units of data within a working domain, wherein the processing includes one or more working items executing in parallel within a persistent thread. The method further comprises retrieving a unit of data from within a working domain, processing the unit of data, retrieving other units of data when processing of the unit of data has finished, processing the other units of data, and terminating the execution of the working items when processing of the working domain has finished. | 06-14-2012 |
20120159076 | LATE-SELECT, ADDRESS-DEPENDENT SENSE AMPLIFIER - Sense amplifiers in a memory may be activated and deactivated. In one embodiment, a processor may include a memory. The memory may include a number of sense amplifiers. Based on a late arriving address bit of an address used to access data from the memory, a sense amplifier may be activated while another sense amplifier may be deactivated. | 06-21-2012 |
20120173821 | Predicting the Instruction Cache Way for High Power Applications - A mechanism for accessing a cache memory is provided. With the mechanism of the illustrative embodiments, a processor of the data processing system performs a first execution of a portion of code. During the first execution of the portion of code, information identifying which cache lines in the cache memory are accessed during the execution of the portion of code is stored in a storage device of the data processing system. Subsequently, during a second execution of the portion of code, power to the cache memory is controlled such that only the cache lines that were accessed during the first execution of the portion of code are powered up. | 07-05-2012 |
20120179873 | Performance of Emerging Applications in a Virtualized Environment Using Transient Instruction Streams - A method, system and computer-usable medium are disclosed for managing transient instruction streams. Transient flags are defined in Branch-and-Link (BRL) instructions that are known to be infrequently executed. A bit is likewise set in a Special Purpose Register (SPR) of the hardware (e.g., a core) that is executing an instruction request thread. Subsequent fetches or prefetches in the request thread are treated as transient and are not written to lower-level caches. If an instruction is non-transient, and if a lower-level cache is inclusive of the L1 instruction cache, a fetch or prefetch miss that is obtained from memory may be written in both the L1 and the lower-level cache. If it is not inclusive, a cast-out from the L1 instruction cache may be written in the lower-level cache. | 07-12-2012 |
20120198168 | VARIABLE CACHING STRUCTURE FOR MANAGING PHYSICAL STORAGE - A method for managing a variable caching structure for managing storage for a processor. The method includes using a multi-way tag array to store a plurality of pointers for a corresponding plurality of different size groups of physical storage of a storage stack, wherein the pointers indicate guest addresses that have corresponding converted native addresses stored within the storage stack, and allocating a group of storage blocks of the storage stack, wherein the size of the allocation is in accordance with a corresponding size of one of the plurality of different size groups. Upon a hit on the tag, a corresponding entry is accessed to retrieve a pointer that indicates where in the storage stack a corresponding group of storage blocks of converted native instructions reside. The converted native instructions are then fetched from the storage stack for execution. | 08-02-2012 |
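A simplified model of the variable-size allocation in entry 20120198168 (the size classes, names, and flat byte-array stack are all assumptions): the storage stack is carved into groups of a few fixed sizes, and a tag entry maps a guest address to the group holding its converted native code:

```python
# Hypothetical variable caching structure: pick the smallest size class that
# fits the converted code, append it to the storage stack, and record a
# (pointer, size) entry in the tag keyed by guest address.

GROUP_SIZES = [64, 128, 256]              # assumed size classes, in bytes

def allocate(tag, stack, guest_addr, native_bytes):
    """Store converted native code and tag it; return its stack pointer."""
    size = next(s for s in GROUP_SIZES if s >= len(native_bytes))
    ptr = len(stack)
    stack.extend(native_bytes + b"\x00" * (size - len(native_bytes)))
    tag[guest_addr] = (ptr, size)
    return ptr

def fetch(tag, stack, guest_addr):
    """On a tag hit, fetch the converted native code group for execution."""
    ptr, size = tag[guest_addr]
    return bytes(stack[ptr:ptr + size])
```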
20120198169 | Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache. | 08-02-2012 |
20120198170 | Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs. | 08-02-2012 |
20120226868 | SYSTEMS AND METHODS FOR PROVIDING DETERMINISTIC EXECUTION - Devices and methods for providing deterministic execution of multithreaded applications are provided. In some embodiments, each thread is provided access to an isolated memory region, such as a private cache. In some embodiments, multiple private caches are synchronized via a modified MOESI coherence protocol. The modified coherence protocol may be configured to refrain from synchronizing the isolated memory regions until the end of an execution quantum. The execution quantum may end when all threads experience a quantum end event such as reaching a threshold instruction count, overflowing the isolated memory region, and/or attempting to access a lock released by a different thread in the same quantum. | 09-06-2012 |
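The quantum-end test implied by entry 20120226868 reduces to a three-way disjunction; a minimal sketch, with an assumed instruction-count threshold and invented parameter names:

```python
# Hypothetical quantum-end check: a thread's quantum ends on reaching an
# instruction-count threshold, overflowing its isolated memory region, or
# touching a lock released by another thread in the same quantum. Caches
# are synchronized only when all threads have ended their quantum.

QUANTUM_INSNS = 1000                      # assumed threshold, for illustration

def quantum_ended(insn_count, region_bytes_used, region_capacity,
                  touched_foreign_lock):
    return (insn_count >= QUANTUM_INSNS
            or region_bytes_used > region_capacity
            or touched_foreign_lock)
```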
20120246408 | ARITHMETIC PROCESSING DEVICE AND CONTROLLING METHOD THEREOF - A physical process ID (PPID) is stored for each cache block of each set, and a MAX WAY number for each PPID value is stored for each of index values # | 09-27-2012 |
20120246409 | ARITHMETIC PROCESSING UNIT AND ARITHMETIC PROCESSING METHOD - An arithmetic processing unit includes a cache memory, a register configured to hold data used for arithmetic processing, a correcting controller configured to detect an error in data retrieved from the register, a cache controller configured to access a cache area of a memory space via the cache memory or a noncache area of the memory space without using the cache memory in response to an instruction executing request for executing a requested instruction, and notify a report indicating that the requested instruction is a memory access instruction for accessing the noncache area, and an instruction executing controller configured to delay execution of other instructions subjected to error detection by the correcting controller while the cache controller executes the memory access instruction for accessing the noncache area when the instruction executing controller receives the notified report. | 09-27-2012 |
20120254542 | GATHER CACHE ARCHITECTURE - Apparatuses and methods to perform gather instructions are presented. In one embodiment, an apparatus comprises a gather logic module which includes a gather logic unit to identify locality of data elements in response to a gather instruction. The apparatus includes memory comprising a plurality of memory rows including a memory row associated with the gather instruction. The apparatus further includes memory structure to store data element addresses accessed in response to the gather instruction. | 10-04-2012 |
20120260042 | LOAD MULTIPLE AND STORE MULTIPLE INSTRUCTIONS IN A MICROPROCESSOR THAT EMULATES BANKED REGISTERS - A microprocessor supports an instruction set architecture that specifies: processor modes, architectural registers associated with each mode, and a load multiple instruction that instructs the microprocessor to load data from memory into specified ones of the registers. Direct storage holds data associated with a first portion of the registers and is coupled to an execution unit to provide the data thereto. Indirect storage holds data associated with a second portion of the registers and cannot directly provide the data to the execution unit. Which architectural registers are in the first and second portions varies dynamically based upon the current processor mode. If a specified register is currently in the first portion, the microprocessor loads data from memory into the direct storage, whereas if in the second portion, the microprocessor loads data from memory into the direct storage and then stores the data from the direct storage to the indirect storage. | 10-11-2012 |
20120272006 | DYNAMIC LOCKSTEP CACHE MEMORY REPLACEMENT LOGIC - To facilitate dynamic lockstep support, replacement states and/or logic used to select particular cache lines for replacement with new allocations in accord with replacement algorithms or strategies may be enhanced to provide generally independent replacement contexts for use in respective lockstep and performance modes. In some cases, replacement logic that may be otherwise conventional in its selection of cache lines for new allocations in accord with a first-in, first-out (FIFO), round-robin, random, least recently used (LRU), pseudo LRU, or other replacement algorithm/strategy is at least partially replicated to provide lockstep and performance instances that respectively cover lockstep and performance partitions of a cache. In some cases, a unified instance of replacement logic may be reinitialized with appropriate states at (or coincident with) transitions between performance and lockstep modes of operation. | 10-25-2012 |
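The entry above describes keeping independent replacement contexts for lockstep and performance partitions of a cache. A minimal sketch of that idea, assuming a simple round-robin policy and invented partition sizes (not the patent's actual logic):

```python
# Illustrative sketch: round-robin replacement state kept independently
# for "lockstep" and "performance" partitions of a set of cache ways,
# so allocations in one mode never disturb the other mode's context.

class DualContextReplacer:
    def __init__(self, lockstep_ways, performance_ways):
        self.partitions = {
            "lockstep": list(lockstep_ways),
            "performance": list(performance_ways),
        }
        # One round-robin pointer per mode: the independent contexts.
        self.pointer = {"lockstep": 0, "performance": 0}

    def victim(self, mode):
        ways = self.partitions[mode]
        way = ways[self.pointer[mode] % len(ways)]
        self.pointer[mode] += 1
        return way

r = DualContextReplacer(lockstep_ways=[0, 1], performance_ways=[2, 3, 4, 5])
assert [r.victim("performance") for _ in range(3)] == [2, 3, 4]
# The lockstep context is untouched by the performance-mode allocations:
assert r.victim("lockstep") == 0
```

A unified hardware instance, as the abstract notes, could instead reinitialize a single pointer at mode transitions; the dual-pointer form simply makes the "independent contexts" explicit.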
20120284461 | Methods and Apparatus for Storage and Translation of Entropy Encoded Software Embedded within a Memory Hierarchy - A system for translating compressed instructions to instructions in an executable format is described. A translation unit is configured to decompress compressed instructions into a native instruction format using X and Y indices accessed from a memory, a translation memory, and a program specified mix mask. A level 1 cache is configured to store the native instruction format for each compressed instruction. The memory may be configured as a paged instruction cache to store pages of compressed instructions intermixed with pages of uncompressed instructions. Methods of determining a mix mask for efficiently translating compressed instructions are also described. A genetic method uses pairs of mix masks as genes from a seed population of mix masks that are bred and may be mutated to produce pairs of offspring mix masks to update the seed population. A mix mask for efficiently translating compressed instructions is determined from the updated seed population. | 11-08-2012 |
20120303900 | PROCESSING PIPELINE CONTROL - A graphics processing unit | 11-29-2012 |
20130019063 | STORAGE CONTROLLER CACHE PAGE MANAGEMENT (Inventors: Tara Astigarraga, Tucson, AZ, US; Michael E. Browne, Staatsburg, NY, US; Joseph Demczar, Salt Point, NY, US; Eric C. Wieder, New Paltz, NY, US) - A cache page management method can include paging out a memory page to an input/output controller, paging the memory page from the input/output controller into a real memory, modifying the memory page in the real memory to an updated memory page and purging the memory page paged to the input/output controller. | 01-17-2013 |
20130054898 | SYSTEM AND METHOD FOR LOCKING DATA IN A CACHE MEMORY - A system and method for locking data in a cache memory. A first processing thread may be operated to run a program requesting data, where at least some of the requested data is loaded from a source memory into a non-empty cache. A second processing thread may be operated independently of the first processing thread to determine whether or not to lock the requested data in the cache. If the requested data is determined to be locked, the requested data may be locked in the cache at the same time as the data is loaded into the cache. | 02-28-2013 |
20130054899 | A 2-D GATHER INSTRUCTION AND A 2-D CACHE - A processor may support a two-dimensional (2-D) gather instruction and a 2-D cache. The processor may perform the 2-D gather instruction to access one or more sub-blocks of data from a two-dimensional (2-D) image stored in a memory coupled to the processor. The two-dimensional (2-D) cache may store the sub-blocks of data in multiple cache lines. Further, the 2-D cache may support access to more than one cache line while preserving a two-dimensional structure of the 2-D image. | 02-28-2013 |
20130080706 | PREVENTION OF CLASSLOADER MEMORY LEAKS IN MULTITIER ENTERPRISE APPLICATIONS - A classloader cache class definition is obtained by a processor. The classloader cache class definition includes code that creates a classloader object cache that is referenced by a strong internal reference by a classloader object in response to instantiation of the classloader cache class definition. A classloader object cache is instantiated using the obtained classloader cache class definition. The strong internal reference is created at instantiation of the classloader object cache. A public interface to the classloader object cache is provided. The public interface to the classloader object cache operates as a weak reference to the classloader object cache and provides external access to the classloader object cache. | 03-28-2013 |
20130080707 | PREVENTION OF CLASSLOADER MEMORY LEAKS IN MULTITIER ENTERPRISE APPLICATIONS - A classloader cache class definition is obtained by a processor. The classloader cache class definition includes code that creates a classloader object cache that is referenced by a strong internal reference by a classloader object in response to instantiation of the classloader cache class definition. A classloader object cache is instantiated using the obtained classloader cache class definition. The strong internal reference is created at instantiation of the classloader object cache. A public interface to the classloader object cache is provided. The public interface to the classloader object cache operates as a weak reference to the classloader object cache and provides external access to the classloader object cache. | 03-28-2013 |
20130086327 | AUTOMATIC CACHING OF PARTIAL RESULTS WHILE EDITING SOFTWARE - An automatic caching system is described herein that automatically determines user-relevant points at which to incrementally cache expensive to obtain data, resulting in faster computation of dependent results. The system can intelligently choose between caching data locally and pushing computation to a remote location collocated with the data, resulting in faster computation of results. The automatic caching system uses stable keys to uniquely refer to programmatic identifiers. The system annotates programs before execution with additional code that utilizes the keys to associate and cache intermediate programmatic results. The system can maintain the cache in a separate process or even on a separate machine to allow cached results to outlive program execution and allow subsequent execution to utilize previously computed results. Cost estimations are performed in order to choose whether utilizing cached values or remote execution would result in a faster computation of a result. | 04-04-2013 |
20130086328 | General Purpose Digital Data Processor, Systems and Methods - The invention provides improved data processing apparatus, systems and methods that include one or more nodes, e.g., processor modules or otherwise, that include or are otherwise coupled to cache, physical, or other memory (e.g., attached flash drives or other mounted storage devices), collectively “system memory.” At least one of the nodes includes a cache memory system that stores data (and/or instructions) recently accessed (and/or expected to be accessed) by the respective node, along with tags specifying addresses and statuses (e.g., modified, reference count, etc.) for the respective data (and/or instructions). The tags facilitate translating system addresses to physical addresses, e.g., for purposes of moving data (and/or instructions) between system memory (and, specifically, for example, physical memory—such as attached drives or other mounted storage) and the cache memory system. | 04-04-2013 |
20130111140 | CACHE MEMORY APPARATUS, CACHE CONTROL METHOD, AND MICROPROCESSOR SYSTEM | 05-02-2013 |
20130117510 | SYSTEM AND METHOD FOR MANAGING AN OBJECT CACHE - In order to optimize efficiency of deserialization, a serialization cache is maintained at an object server. The serialization cache is maintained in conjunction with an object cache and stores serialized forms of objects cached within the object cache. When an inbound request is received, a serialized object received in the request is compared to the serialization cache. If the serialized byte stream is present in the serialization cache, then the equivalent object is retrieved from the object cache, thereby avoiding deserialization of the received serialized object. If the serialized byte stream is not present in the serialization cache, then the serialized byte stream is deserialized, the deserialized object is cached in the object cache, and the serialized object is cached in the serialization cache. | 05-09-2013 |
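The serialization-cache scheme above can be sketched in a few lines: key an object cache by the serialized byte stream, so a repeated inbound request skips deserialization entirely. This is an illustrative model, with `pickle` standing in for whatever wire format the object server actually uses:

```python
# Hypothetical sketch of a serialization cache: serialized bytes map
# directly to the already-deserialized object, so a hit avoids the
# deserialization cost entirely.

import pickle

class ObjectServer:
    def __init__(self):
        self.serialization_cache = {}   # serialized bytes -> object
        self.deserializations = 0       # instrumentation only

    def handle(self, payload: bytes):
        obj = self.serialization_cache.get(payload)
        if obj is None:                 # miss: pay the deserialize cost once
            obj = pickle.loads(payload)
            self.deserializations += 1
            self.serialization_cache[payload] = obj
        return obj

server = ObjectServer()
wire = pickle.dumps({"user": 42})
assert server.handle(wire) == {"user": 42}
assert server.handle(wire) == {"user": 42}   # served from the cache
assert server.deserializations == 1          # second request skipped pickle.loads
```

In the patented design the serialization cache sits alongside a separate object cache; collapsing them into one mapping here keeps the sketch minimal while preserving the key point that equality is checked on the byte stream, not the object.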
20130138888 | STORING A TARGET ADDRESS OF A CONTROL TRANSFER INSTRUCTION IN AN INSTRUCTION FIELD - A control transfer instruction (CTI), such as a branch, jump, etc., may have an offset value for a control transfer that is to be performed. The offset value may be usable to compute a target address for the CTI (e.g., the address of a next instruction to be executed for a thread or instruction stream). The offset may be specified relative to a program counter. In response to detecting a specified offset value, the CTI may be modified to include at least a portion of a computed target address. Information indicating this modification has been performed may be stored, for example, in a pre-decode bit. In some cases, CTI modification may be performed only when a target address is a “near” target, rather than a “far” target. Modifying CTIs as described herein may eliminate redundant address calculations and produce a savings of power and/or time in some embodiments. | 05-30-2013 |
20130159628 | METHODS AND APPARATUS FOR SOURCE OPERAND COLLECTOR CACHING - Methods and apparatus for source operand collector caching. In one embodiment, a processor includes a register file that may be coupled to storage elements (i.e., an operand collector) that provide inputs to the datapath of the processor core for executing an instruction. In order to reduce bandwidth between the register file and the operand collector, operands may be cached and reused in subsequent instructions. A scheduling unit maintains a cache table for monitoring which register values are currently stored in the operand collector. The scheduling unit may also configure the operand collector to select the particular storage elements that are coupled to the inputs to the datapath for a given instruction. | 06-20-2013 |
20130238858 | DATA PROCESSING APPARATUS AND METHOD FOR PROVIDING TARGET ADDRESS INFORMATION FOR BRANCH INSTRUCTIONS - A data processing apparatus and method have a processor for executing instructions, and a prefetch unit for prefetching instructions from memory prior to sending those instructions to the processor for execution. A branch target cache structure has a plurality of entries, where the cache structure comprises an initial branch target cache having a first number of entries and a promoted entry branch target cache having a second number of entries. During lookup operation, both the initial entry branch target cache and the promoted entry branch target cache are accessed in parallel. For a branch instruction executed by the processor that does not currently have a corresponding entry in the branch target cache structure, allocation circuitry performs an initial allocation operation to allocate one of the entries in the initial entry branch target cache for storing the branch instruction information for that branch instruction. | 09-12-2013 |
20130238859 | CACHE WITH SCRATCH PAD MEMORY STRUCTURE AND PROCESSOR INCLUDING THE CACHE - Disclosed are a cache with a scratch pad memory (SPM) structure and a processor including the same. The cache with a scratch pad memory structure includes: a block memory configured to include at least one block area in which instruction codes read from an external memory are stored; a tag memory configured to store an external memory address corresponding to indexes of the instruction codes stored in the block memory; and a tag controller configured to process a request from a fetch unit for the instruction codes, wherein a part of the block areas is set as a SPM area according to cache setting input from a cache setting unit. According to the present invention, it is possible to reduce the time to read instruction codes from the external memory and realize power saving by operating the cache as the scratch pad memory. | 09-12-2013 |
20130246709 | TRANSLATION ADDRESS CACHE FOR A MICROPROCESSOR - Embodiments related to fetching instructions and alternate versions achieving the same functionality as the instructions from an instruction cache included in a microprocessor are provided. In one example, a method is provided, comprising, at an example microprocessor, fetching an instruction from an instruction cache. The example method also includes hashing an address for the instruction to determine whether an alternate version of the instruction which achieves the same functionality as the instruction exists. The example method further includes, if hashing results in a determination that such an alternate version exists, aborting fetching of the instruction and retrieving and executing the alternate version. | 09-19-2013 |
20130262771 | INDICATING A LENGTH OF AN INSTRUCTION OF A VARIABLE LENGTH INSTRUCTION SET - Some implementations disclosed herein provide techniques and arrangements for indicating a length of an instruction from an instruction set that has variable length instructions. A plurality of bytes that include an instruction may be read from an instruction cache based on a logical instruction pointer. A determination is made whether a first byte of the plurality of bytes identifies a length of the instruction. In response to detecting that the first byte of the plurality of bytes identifies the length of the instruction, the instruction is read from the plurality of bytes based on the length of the instruction. | 10-03-2013 |
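The decode flow above — check whether the first byte identifies the instruction's length, then read that many bytes — can be sketched concretely. The encoding here (high bit set means the low nibble is the total length, otherwise a fixed 2-byte form) is invented purely for illustration:

```python
# Hedged sketch of decoding a variable-length instruction whose first
# byte may carry its own length. The encoding is an assumption, not
# the one in the application.

def read_instruction(buf: bytes, ip: int):
    first = buf[ip]
    if first & 0x80:                 # first byte identifies the length
        length = first & 0x0F
    else:                            # fall back to a fixed 2-byte form
        length = 2
    return buf[ip:ip + length], ip + length

stream = bytes([0x83, 0xAA, 0xBB,    # 3-byte instruction, self-describing
                0x10, 0x20])         # 2-byte fixed-form instruction
insn, ip = read_instruction(stream, 0)
assert insn == bytes([0x83, 0xAA, 0xBB])
insn, ip = read_instruction(stream, ip)
assert insn == bytes([0x10, 0x20]) and ip == len(stream)
```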
20130290639 | APPARATUS AND METHOD FOR MEMORY COPY AT A PROCESSOR - A processor uses a dedicated buffer to reduce the amount of time needed to execute memory copy operations. For each load instruction associated with the memory copy operation, the processor copies the load data from memory to the dedicated buffer. For each store operation associated with the memory copy operation, the processor retrieves the store data from the dedicated buffer and transfers it to memory. The dedicated buffer is separate from a register file and caches of the processor, so that each load operation associated with a memory copy operation does not have to wait for data to be loaded from memory to the register file. Similarly, each store operation associated with a memory copy operation does not have to wait for data to be transferred from the register file to memory. | 10-31-2013 |
20130290640 | BRANCH PREDICTION POWER REDUCTION - In one embodiment, a microprocessor is provided. The microprocessor includes instruction memory and a branch prediction unit. The branch prediction unit is configured to use information from the instruction memory to selectively power up the branch prediction unit from a powered-down state when fetched instruction data includes a branch instruction and maintain the branch prediction unit in the powered-down state when the fetched instruction data does not include a branch instruction in order to reduce power consumption of the microprocessor during instruction fetch operations. | 10-31-2013 |
20130304993 | Method and Apparatus for Tracking Extra Data Permissions in an Instruction Cache - Systems and methods are disclosed for maintaining an instruction cache including extended cache lines and page attributes for main cache line portions of the extended cache lines and, at least for one or more predefined potential page-crossing instruction locations, additional page attributes for extra data portions of the corresponding extended cache lines. In addition, systems and methods are disclosed for processing page-crossing instructions fetched from an instruction cache having extended cache lines. | 11-14-2013 |
20130311723 | CACHE MEMORY APPARATUS - A cache memory apparatus includes an L1 cache memory, an L2 cache memory coupled to the L1 cache memory, an arithmetic logic unit (ALU) within the L2 cache memory, the combined ALU and L2 cache memory being configured to perform therewithin at least one of an arithmetic operation and a logical bit mask operation; the cache memory apparatus being further configured to interact with at least one processor such that atomic memory operations bypass the L1 cache memory and go directly to the L2 cache memory. | 11-21-2013 |
20130339611 | HIGH-PERFORMANCE CACHE SYSTEM AND METHOD - A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory with a faster speed than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions. Further, the cache control unit is also configured to examine instructions being filled from the first memory to the second memory to extract instruction information containing at least branch information, to create a plurality of tracks based on the extracted instruction information; and to fill the at least one or more instructions based on one or more tracks from the plurality of tracks. | 12-19-2013 |
20130346698 | DATA PROCESSING APPARATUS AND METHOD FOR REDUCING STORAGE REQUIREMENTS FOR TEMPORARY STORAGE OF DATA - A data processing apparatus and method are provided, the apparatus including processing circuitry for executing a sequence of instructions, each instruction having an associated memory address and the sequence of instructions including cacheable instructions whose associated memory addresses are within a cacheable memory region. Instruction cache control circuitry is arranged to store within a selected cache line of a data storage the instruction data for a plurality of cacheable instructions as retrieved from memory, to store within the tag entry associated with that selected cache line the address identifier for that stored instruction data, and to identify that selected cache line as valid within the valid flag storage. Control state circuitry maintains a record of the chosen cache line in which said data of a predetermined data type has been written, so that upon receipt of a request for that data it can then be provided from the instruction cache. | 12-26-2013 |
20140025893 | CONTROL FLOW MANAGEMENT FOR EXECUTION OF DYNAMICALLY TRANSLATED NON-NATIVE CODE IN A VIRTUAL HOSTING ENVIRONMENT - Execution of non-native operating system images within a virtualized computer system is improved by providing a mechanism for retrieving translated code physical addresses corresponding to un-translated code branch target addresses using a host code map. Hardware acceleration mechanisms, such as content-accessible look-up tables, directory hardware, or processor instructions that operate on tables in memory can be provided to accelerate the performance of the translation mechanism. The virtual address of the branch instruction target is used as a key to look up a corresponding record that contains a physical address of the translated code page containing the translated branch instruction target, and execution is directed to the physical address obtained from the record, once the physical page containing the translated code corresponding to the target address is loaded in memory. | 01-23-2014 |
20140025894 | PROCESSOR USING BRANCH INSTRUCTION EXECUTION CACHE AND METHOD OF OPERATING THE SAME - A processor using a branch instruction execution cache and a method of operating the same are disclosed. The processor according to an example embodiment of the present invention includes a fetch unit, a branch prediction unit, an instruction queue, a decoding unit and an execution unit operating in a pipeline manner, and includes a branch instruction execution cache that stores address and decode information of a transferred instruction output from the decoding unit, and provides the stored address and at least some pieces of the decode information to the execution unit in order to recover from branch misprediction when the execution unit detects the branch misprediction. Therefore, with the processor according to an example embodiment of the present invention, overhead of pipeline initialization can be minimized to prevent performance degradation of the processor and reduce power consumption of the processor. | 01-23-2014 |
20140025895 | MULTIPLE-CACHE PARALLEL REDUCTION AND APPLICATIONS - In accordance with an aspect of the present invention, a method and system for parallel computing is provided, that reduces the time necessary for the execution of program functions. In order to reduce the time needed to execute aspects of a program, multiple program threads are executed simultaneously, while thread 0 of the program is also executed. These threads are executed simultaneously with the aid of at least one cache of the computing device on which the program is being run. Such a framework reduces wasted computing power and the time necessary to execute aspects of a program. | 01-23-2014 |
20140032847 | SEMICONDUCTOR DEVICE - A semiconductor device which can hold an instruction configuring a loop in a cache memory is provided. A main memory stores an instruction. A cache memory stores the instruction temporarily. A CPU reads an instruction from the main memory or the cache memory and executes the instruction. The cache control unit stops updating an instruction stored in the cache memory, when the same instruction as the instruction read in the past is again read within a prescribed number of reads. | 01-30-2014 |
20140047186 | TRANSACTIONAL MEMORY SYSTEM WITH EFFICIENT CACHE SUPPORT - Embodiments related to a transaction program. An aspect includes, based on determining that one instruction is part of an active atomic instruction group (AIG), determining whether a private-to-transaction (PTRAN) bit associated with an address of the one instruction in a main memory is set, the PTRAN bit being located in a main memory comprising a plurality of memory increments each having a respective directly addressable PTRAN bit in the main memory. Another aspect includes, based on determining that the PTRAN bit is not set: setting the PTRAN bit; adding a new entry to a cache structure and a transaction table including an old data state of the address of the one instruction stored in the cache structure and control information stored in the transaction table; and completing the one instruction as part of the active AIG. | 02-13-2014 |
20140052921 | STORE-EXCLUSIVE INSTRUCTION CONFLICT RESOLUTION - A data processing system includes a plurality of transaction masters ( | 02-20-2014 |
20140089588 | Method and system of storing and retrieving data - Method and system of storing data by a software application. Each read query of a data storage system by a software application is first solely issued to a plurality of cache nodes, which return the queried data if available. If not available, the software application receives a miss that triggers a fetch of the queried data from one or more database systems on a first dedicated interface. Upon having retrieved the queried data, the software application adds the queried data to at least one cache node on a second dedicated interface. Each write to the one or more database systems by the software application is also concurrently performed in the at least one cache node. Hence, population of the at least one cache node is quickly done at each missed read query of the at least one cache node and at each write query of the data storage system. | 03-27-2014 |
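The read/write flow described in the entry above — cache first, database on a miss with a cache fill, and writes applied to both stores — is the classic cache-aside pattern, and can be sketched briefly. The dict-backed stores are stand-ins for real cache nodes and database systems:

```python
# Minimal sketch of the described flow: reads probe the cache nodes
# first; a miss fetches from the database and populates the cache;
# writes go to both the database and the cache.

class CacheAsideStore:
    def __init__(self):
        self.cache = {}      # plurality of cache nodes, collapsed to one dict
        self.database = {}   # the backing database system
        self.db_reads = 0

    def read(self, key):
        if key in self.cache:            # hit: cache answers alone
            return self.cache[key]
        self.db_reads += 1               # miss: fetch on the "first interface"
        value = self.database[key]
        self.cache[key] = value          # populate on the "second interface"
        return value

    def write(self, key, value):         # write both stores concurrently
        self.database[key] = value
        self.cache[key] = value

store = CacheAsideStore()
store.database["a"] = 1          # pre-existing row, not yet cached
assert store.read("a") == 1 and store.db_reads == 1
assert store.read("a") == 1 and store.db_reads == 1   # second read is a cache hit
store.write("b", 2)
assert store.read("b") == 2 and store.db_reads == 1   # write populated the cache
```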
20140089589 | BARRIER COLORS - Methods and processors for enforcing an order of memory access requests in the presence of barriers in an out-of-order processor pipeline. A speculative color is assigned to instruction operations in the front-end of the processor pipeline, while the instruction operations are still in order. The instruction operations are placed in any of multiple reservation stations and then issued out-of-order from the reservation stations. When a barrier is encountered in the front-end, the speculative color is changed, and instruction operations are assigned the new speculative color. A core interface unit maintains an architectural color, and the architectural color is changed when a barrier retires. The core interface unit stalls instruction operations with a speculative color that does not match the architectural color. | 03-27-2014 |
20140115257 | PREFETCHING USING BRANCH INFORMATION FROM AN INSTRUCTION CACHE - A processor stores branch information at a “sparse” cache and a “dense” cache. The sparse cache stores the target addresses for up to a specified number of branch instructions in a given cache entry associated with a cache line address, while branch information for additional branch instructions at the cache entry is stored at the dense cache. Branch information at the dense cache persists after eviction of the corresponding cache line until it is replaced by branch information for a different cache entry. Accordingly, in response to the instructions for a given cache line address being requested for retrieval from memory, a prefetcher determines whether the dense cache stores branch information for the cache line address. If so, the prefetcher prefetches the instructions identified by the target addresses of the branch information in the dense cache concurrently with transferring the instructions associated with the cache line address. | 04-24-2014 |
20140136786 | ASYNCHRONOUS PERSISTENT STORES FOR TRANSACTIONS - A processor includes a processor core, a cache, and a tracker. The processor core is configured to execute persistent write instructions and receive notifications of completed persistent write instructions. The tracker is configured to track the completion state of a persistent write instruction. | 05-15-2014 |
20140173209 | Presenting Enclosure Cache As Local Cache In An Enclosure Attached Server - Presenting enclosure cache as local cache in an enclosure attached server, including: determining, by the enclosure, a cache hit rate for local server cache in each of a plurality of enclosure attached servers; determining, by the enclosure, an amount of available enclosure cache for use by one or more of the enclosure attached servers; and offering, by the enclosure, some portion of the available enclosure cache to an enclosure attached server in dependence upon the cache hit rate and the amount of available enclosure cache. | 06-19-2014 |
20140181405 | INSTRUCTION CACHE HAVING A MULTI-BIT WAY PREDICTION MASK - In a particular embodiment, an apparatus includes control logic configured to selectively set bits of a multi-bit way prediction mask based on a prediction mask value. The control logic is associated with an instruction cache including a data array. A subset of line drivers of the data array is enabled responsive to the multi-bit way prediction mask. The subset of line drivers includes multiple line drivers. | 06-26-2014 |
20140189242 | LOGGING IN SECURE ENCLAVES - Embodiments of an invention for logging in secure enclaves are disclosed. In one embodiment, a processor includes an instruction unit and an execution unit. The instruction unit is to receive an instruction having an associated enclave page cache address. The execution unit is to execute the instruction without causing a virtual machine exit, wherein execution of the instruction includes logging the instruction and the associated enclave page cache address. | 07-03-2014 |
20140201449 | DATA CACHE WAY PREDICTION - In a particular embodiment, a method includes identifying one or more way prediction characteristics of an instruction. The method also includes selectively reading, based on identification of the one or more way prediction characteristics, a table to identify an entry of the table associated with the instruction that identifies a way of a data cache. The method further includes making a prediction whether a next access of the data cache, based on the instruction, will access the way. | 07-17-2014 |
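The table lookup in the entry above can be modeled compactly: instruction addresses map to the cache way they last hit, so the next access can probe a single way instead of all of them. The table structure and policy here are assumptions for illustration, not the application's:

```python
# Illustrative model of a way-prediction table: a hit in the table
# predicts one way to probe; a miss means all ways must be probed.

N_WAYS = 4                       # assumed associativity

class WayPredictor:
    def __init__(self):
        self.table = {}          # instruction address -> predicted way

    def predict(self, insn_addr):
        # Unknown instructions get no prediction: probe all N_WAYS ways.
        return self.table.get(insn_addr)

    def update(self, insn_addr, actual_way):
        self.table[insn_addr] = actual_way

wp = WayPredictor()
assert wp.predict(0x1000) is None    # cold: no prediction available
wp.update(0x1000, 2)                 # access resolved to way 2
assert wp.predict(0x1000) == 2       # next access probes only way 2
```

A correct prediction lets the hardware enable only one way's line drivers, which is the power-saving motivation shared with the way-prediction-mask entry earlier in this listing.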
20140201450 | Optimized Matrix and Vector Operations In Instruction Limited Algorithms That Perform EOS Calculations - There is provided a system and method for optimizing matrix and vector calculations in instruction limited algorithms that perform EOS calculations. The method includes dividing each matrix associated with an EOS stability equation and an EOS phase split equation into a number of tiles, wherein the tile size is heterogeneous or homogeneous. Each vector associated with the EOS stability equation and the EOS phase split equation may be divided into a number of strips. The tiles and strips may be stored in main memory, cache, or registers, and the matrix and vector operations associated with successive substitutions and Newton iterations may be performed in parallel using the tiles and strips. | 07-17-2014 |
20140208033 | Split-Word Memory - Method, process, and apparatus to efficiently store, read, and/or process syllables of word data. A portion of a data word, which includes multiple syllables, may be read by a computer processor. The processor may read a first syllable of the data word from a first memory. The processor may read a second syllable of the data word from a second portion of memory. The second syllable may include bits which are less critical than the bits of the first syllable. The second memory may be distinct from the first memory based on one or more physical attributes. | 07-24-2014 |
20140223101 | REGISTER FILE HAVING A PLURALITY OF SUB-REGISTER FILES - Register files for use in an out-of-order processor that have been divided into a plurality of sub-register files. The register files also have a plurality of buffers which are each associated with one of the sub-register files. Each buffer receives and stores write operations destined for the associated sub-register file which can be later issued to the sub-register file. Specifically, each clock cycle it is determined whether there is at least one write operation in the buffer that has not been issued to the associated sub-register file. If there is at least one write operation in the buffer that has not been issued to the associated sub-register file, one of the non-issued write operations is issued to the associated sub-register file. Each sub-register file may also have an arbitration logic unit which resolves conflicts between read and write operations that want to access the associated sub-register file in the same cycle by prioritizing read operations unless a conflicting write instruction has reached commit time. | 08-07-2014 |
20140244933 | Way Lookahead - Methods and systems that identify and power up ways for future instructions are provided. A processor includes an n-way set associative cache and an instruction fetch unit. The n-way set associative cache is configured to store instructions. The instruction fetch unit is in communication with the n-way set associative cache and is configured to power up a first way based on a first indication that is associated with a current instruction and indicates the way in which a future instruction is located, the future instruction being two or more instructions ahead of the current instruction. | 08-28-2014 |
20140258624 | Apparatus and Method for Operating a Processor with an Operation Cache - A processor includes a computation engine to produce a computed value for a set of operands. A cache stores the set of operands and the computed value. The cache is configured to selectively identify a match and a miss for a new set of operands. In the event of a match the computed value is supplied by the cache and a computation engine operation is aborted. In the event of a miss a new computed value for the new set of operands is computed by the computation engine and is stored in the cache. | 09-11-2014 |
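The match/miss behavior of the operation cache above can be sketched as a simple memoization table in Python; the engine callable and the call counter are hypothetical stand-ins for the hardware computation engine, used only to make the abort-on-match behavior observable.

```python
# Sketch of an "operation cache": a table keyed by the operand set.  On a
# match the cached value is supplied and the engine operation is skipped
# (aborted); on a miss the engine computes and the result is stored.
class OperationCache:
    def __init__(self, engine):
        self.engine = engine           # stands in for the computation engine
        self.table = {}                # operand set -> computed value
        self.engine_calls = 0          # counts real engine operations

    def compute(self, *operands):
        if operands in self.table:     # match: abort the engine operation
            return self.table[operands]
        self.engine_calls += 1         # miss: run the computation engine
        value = self.engine(*operands)
        self.table[operands] = value
        return value
```

For example, asking twice for the same operand set runs the engine only once; the second request is served entirely from the cache.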
20140258625 | Data processing method and apparatus - Embodiments of the present invention provide a data processing method and apparatus. According to the embodiments of the present invention, when a data hash value in a currently received data stream is found to exceed a preset first threshold, a part or all of the data in the data stream is not deduplicated but is stored directly, which prevents the data in the data stream from being dispersed across a plurality of storage areas; instead, the part or all of the data is stored into a single storage area in a centralized manner, so that the overall deduplication rate is effectively improved, particularly in scenarios with a large amount of stored data. | 09-11-2014 |
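A rough Python sketch of the decision the abstract describes, under the assumption that the "data hash value" compared against the first threshold is the number of distinct chunk hashes seen in the stream; the chunk size, the threshold value, and all names are illustrative, not from the patent.

```python
# Hedged sketch: if a received stream's distinct-hash count exceeds a first
# threshold, the stream is stored directly into one area (no deduplication);
# otherwise chunks are deduplicated against a fingerprint store as usual.
import hashlib

def process_stream(stream, store, fingerprints, threshold=4, chunk=4):
    chunks = [stream[i:i + chunk] for i in range(0, len(stream), chunk)]
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    if len(set(hashes)) > threshold:
        store.append(stream)           # store whole stream, centralized
        return "direct"
    for h, c in zip(hashes, chunks):   # normal deduplication path
        if h not in fingerprints:
            fingerprints[h] = c
            store.append(c)
    return "dedup"
```

A highly repetitive stream takes the deduplication path, while a stream with many distinct chunks is stored contiguously, which is the centralized-storage behavior the abstract claims improves the overall deduplication rate.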
20140258626 | ELECTRONIC DEVICES HAVING SEMICONDUCTOR MEMORY UNIT - An electronic device includes: a variable resistance element having a first electrode, a variable resistance layer, and a second electrode which are sequentially stacked therein; a spacer formed on the sidewall of the variable resistance element; and a conductive line covering the variable resistance element including the spacer. | 09-11-2014 |
20140258627 | MIGRATION OF DATA TO REGISTER FILE CACHE - Methods and migration units for use in out-of-order processors for migrating data to register file caches associated with functional units of the processor to satisfy register read operations. The migration unit receives register read operations to be executed for a particular functional unit. The migration unit reviews entries in a register renaming table to determine if the particular functional unit has recently accessed the source register and thus is likely to comprise an entry for the source register in its register file cache. In particular, the register renaming table comprises entries for physical registers that indicate what functional units have accessed the physical register. If the particular functional unit has not accessed the particular physical register, the migration unit migrates data to the register file cache associated with the particular functional unit. | 09-11-2014 |
20140281246 | INSTRUCTION BOUNDARY PREDICTION FOR VARIABLE LENGTH INSTRUCTION SET - A system, processor, and method to predict with high accuracy and retain instruction boundaries for previously executed instructions in order to decode variable length instructions is disclosed. In at least one embodiment, a disclosed processor includes an instruction fetch unit, an instruction cache, a boundary byte predictor, and an instruction decoder. In some embodiments, the instruction fetch unit provides an instruction address and the instruction cache produces an instruction tag and instruction cache content corresponding to the instruction address. The instruction decoder, in some embodiments, includes boundary byte logic to determine an instruction boundary in the instruction cache content. | 09-18-2014 |
20140297958 | SYSTEM AND METHOD FOR UPDATING AN INSTRUCTION CACHE FOLLOWING A BRANCH INSTRUCTION IN A SEMICONDUCTOR DEVICE - A semiconductor device includes a memory for storing a plurality of instructions therein, an instruction queue which temporarily stores the instructions fetched from the memory therein, a central processing unit which executes the instruction supplied from the instruction queue, an instruction cache which stores therein the instructions executed in the past by the central processing unit, and a control circuit which controls fetching of each instruction. When the central processing unit executes a branch instruction, and an instruction of a branch destination is being in the instruction cache and an instruction following the instruction of the branch destination is stored in the instruction queue, the control circuit causes the instruction queue to fetch the instruction of the branch destination from the instruction cache and causes the instruction queue not to fetch the instruction following the instruction of the branch destination. | 10-02-2014 |
20140317352 | MEMORY OBJECT REFERENCE COUNT MANAGEMENT WITH IMPROVED SCALABILITY - Generally, this disclosure provides systems, devices, methods and computer readable media for memory object reference count management with improved scalability based on transactional reference count elision. The device may include a hardware transactional memory processor configured to maintain a read-set associated with a transaction and to abort the transaction in response to a modification of contents of the read-set by an entity external to the transaction; and a code module configured to: enter the transaction; locate the memory object; read the reference count associated with the memory object, such that the reference count is added to the read-set associated with the transaction; access the memory object; and commit the transaction. | 10-23-2014 |
20140337582 | HIGH-PERFORMANCE CACHE SYSTEM AND METHOD - A digital system is provided for high-performance cache systems. The digital system includes a processor core and a cache control unit. The processor core is capable of being coupled to a first memory containing executable instructions and a second memory that is faster than the first memory. Further, the processor core is configured to execute one or more instructions of the executable instructions from the second memory. The cache control unit is configured to be coupled to the first memory, the second memory, and the processor core to fill at least the one or more instructions from the first memory to the second memory before the processor core executes the one or more instructions. Further, the cache control unit is also configured to examine instructions being filled from the first memory to the second memory to extract instruction information containing at least branch information, to create a plurality of tracks based on the extracted instruction information, and to fill the at least one or more instructions based on one or more tracks from the plurality of tracks. | 11-13-2014 |
20140351521 | STORAGE SYSTEM AND METHOD FOR CONTROLLING STORAGE SYSTEM - A cache package (for example, a flash memory package built from flash memories) can execute a cache control process instead of a processor in a storage system when the storage system requests the cache control process. Consequently, the processing time on the storage system's processor can be reduced and throughput can be increased. For example, the present invention is particularly effective in real-time data processing in OLTP (OnLine Transaction Processing) (for example, database processes in finance, medical service, Internet service, and government and public service). In addition, under the recent concept of ERP (Enterprise Resource Planning), a flexible storage system that can respond to rapid fluctuations in data volume and access load can be established and leveraged by adding boards of cache packages as required. | 11-27-2014 |
20150012708 | Parallel, pipelined, integrated-circuit implementation of a computational engine - Embodiments of the present invention are directed to parallel, pipelined, integrated-circuit implementations of computational engines designed to solve complex computational problems. One embodiment of the present invention is a family of video encoders and decoders (“codecs”) that can be incorporated within cameras, cell phones, and other electronic devices for encoding raw video signals into compressed video signals for storage and transmission, and for decoding compressed video signals into raw video signals for output to display devices. A highly parallel, pipelined, special-purpose integrated-circuit implementation of a particular video codec provides, according to embodiments of the present invention, a cost-effective video-codec computational engine that provides an extremely large computational bandwidth with relatively low power consumption and low-latency for decompression and compression of compressed video signals and raw video signals, respectively. | 01-08-2015 |
20150019814 | Extract Target Cache Attribute Facility and Instruction Therefore - A facility and cache machine instruction of a computer architecture for specifying a target cache level and a target cache attribute of interest for obtaining a cache attribute of one or more target caches. The requested cache attribute of the target cache(s) is saved in a register. | 01-15-2015 |
20150046655 | DATA PROCESSING SYSTEMS - A data processing system includes one or more processors | 02-12-2015 |
20150052305 | ARITHMETIC PROCESSING DEVICE, ARITHMETIC PROCESSING METHOD AND ARITHMETIC PROCESSING SYSTEM - An arithmetic processing device includes: a cache memory configured to store data; and a circuitry configured to: execute access instructions including a first access instruction and a second access instruction; and request, in a case where a first access to the cache memory based on the first access instruction has been completed and the first access instruction is a serializing instruction, a re-execution of the second access instruction subsequent to the serializing instruction when a second access to the cache memory based on the second access instruction has been completed. | 02-19-2015 |
20150052306 | PROCESSOR AND CONTROL METHOD OF PROCESSOR - Lock information indicating that an address is locked, together with the lock address, is held for each thread. In a case where execution of a CAS instruction is requested, a primary cache controller, which receives requests from an instruction controlling unit requesting processing according to an instruction in each thread, executes the plurality of pieces of processing included in the CAS instruction when the access target address of the CAS instruction is different from the lock address of any thread whose lock information is held, and prohibits store processing to the cache memory by a thread whose lock information is not held when the lock information of any thread out of the plural threads is held. | 02-19-2015 |
20150052307 | PROCESSOR AND CONTROL METHOD OF PROCESSOR - A primary cache controller of a core unit arbitrates between a non-cache write request in thread 0 and a non-cache read request in thread 1 received from an instruction controller. When both non-cache requests under arbitration become issuable because a response has been obtained for a preceding thread-0 non-cache write request that was issued earlier than the thread-0 non-cache write request under arbitration, the non-cache read request in thread 1 is given priority to be issued, so that the low-priority non-cache read request is not kept waiting. | 02-19-2015 |
20150052308 | PRIORITIZED CONFLICT HANDLING IN A SYSTEM - A controller has a cache to store data associated with an address that is subject to conflict resolution. A conflict resolution queue stores information relating to plural transactions, and logic reprioritizes the plural transactions in the conflict resolution queue to change a priority of a first type of transaction with respect to a priority of second type of transaction. | 02-19-2015 |
20150058571 | HINT VALUES FOR USE WITH AN OPERAND CACHE - Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Hint values may be used in some embodiments to suggest that a particular operand should be stored in the operand cache (so that it is available for current or future use). In one embodiment, a hint value indicates that an operand should be cached whenever possible. Hint values may be determined by software, such as a compiler, in some embodiments. One or more criteria may be used to determine hint values, such as how soon in the future or how frequently an operand will be used again. | 02-26-2015 |
20150058572 | INTELLIGENT CACHING FOR AN OPERAND CACHE - Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may store a subset of operands, and may use less power and have quicker access times than the register file. In some embodiments, intelligent operand prefetching may speed execution by reducing memory bank conflicts (e.g., conflicts within a register file containing multiple memory banks). An unused operand slot for another instruction (e.g., an instruction that does not require a maximum number of source operands allowed by an instruction set architecture) may be used to prefetch an operand for another instruction in one embodiment. Prefetched operands may be stored in an operand cache, and prefetching may occur based on software-provided information. | 02-26-2015 |
20150058573 | OPERAND CACHE DESIGN - Instructions may require one or more operands to be executed, which may be provided from a register file. In the context of a GPU, however, a register file may be a relatively large structure, and reading from the register file may be energy and/or time intensive. An operand cache may be used to store a subset of operands, and may use less power and have quicker access times than the register file. Selectors (e.g., multiplexers) may be used to read operands from the operand cache. Power savings may be achieved in some embodiments by activating only a subset of the selectors, which may be done by activators (e.g. flip-flops). Operands may also be concurrently provided to two or more locations via forwarding, which may be accomplished via a source selection unit in some embodiments. Operand forwarding may also reduce power and/or speed execution in one or more embodiments. | 02-26-2015 |
20150058574 | Increasing The Efficiency of Memory Resources In a Processor - Methods of increasing the efficiency of memory resources within a processor are described. In an embodiment, instead of including dedicated DSP indirect register resource for storing data associated with DSP instructions, this data is stored in an allocated and locked region within the cache. The state of any cache lines which are used to store DSP data is then set to prevent the data from being written to memory. The size of the allocated region within the cache may vary according to the amount of DSP data that needs to be stored and when no DSP instructions are being run, no cache resources are allocated for storage of DSP data. | 02-26-2015 |
20150067260 | OPTIMIZING MEMORY BANDWIDTH CONSUMPTION USING DATA SPLITTING WITH SOFTWARE CACHING - A computer processor collects information for a dominant data access loop and reference code patterns based on data reference pattern analysis, and for pointer aliasing and data shape based on pointer escape analysis. The computer processor selects a candidate array for data splitting wherein the candidate array is referenced by a dominant data access loop. The computer processor determines a data splitting mode by which to split the data of the candidate array, based on the reference code patterns, the pointer aliasing, and the data shape information, and splits the data into two or more split arrays. The computer processor creates a software cache that includes a portion of the data of the two or more split arrays in a transposed format, and maintains the portion of the transposed data within the software cache and consults the software cache during an access of the split arrays. | 03-05-2015 |
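A toy Python illustration of data splitting with a software cache as the abstract describes: a candidate array of records is split into per-field arrays so that a dominant loop touches only the field it needs, and a small software cache holds a transposed slice of the split arrays for accesses that need the record as a whole. All names are hypothetical; the real mechanism operates on compiler-analyzed arrays, not Python lists.

```python
# Hypothetical sketch of data splitting: an array of (hot, cold) records is
# split into two parallel arrays, and a "software cache" keeps a transposed
# slice of the split arrays for whole-record accesses.
def split_records(records):
    hot = [r[0] for r in records]      # field used by the dominant loop
    cold = [r[1] for r in records]     # rarely-used field
    return hot, cold

def make_software_cache(hot, cold, lo, hi):
    # Transposed (record-oriented) view of the split arrays for [lo, hi)
    return {i: (hot[i], cold[i]) for i in range(lo, hi)}
```

The dominant loop would iterate over `hot` alone, consuming far less memory bandwidth, and consult the software cache only when a full record is needed.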
20150067261 | Device and Method for Eliminating Complex Operations in Processing Systems based on Caching - The technology described in this application relates generally to computing processing systems and more specifically relates to systems that process data with resource intensive operations. Method and apparatus to lower the power consumption of the resource intensive operations are disclosed. Code analysis methods and run-time apparatus are presented that may eliminate the redundant operations (either complex calculations, memory fetches, or both). The techniques presented in this application are driven by special instructions inserted in the software code of the executing computer programs during the code generation process. Code analysis methods to insert the special instructions into the appropriate points in the source code of the target executing computer programs are presented. Run-time hardware mechanisms to support the potential elimination of redundant operations are also presented. Corresponding methods that might increase the number of eliminated operations by allowing limited errors to occur are also disclosed. | 03-05-2015 |
20150074353 | System and Method for an Asynchronous Processor with Multiple Threading - Embodiments are provided for an asynchronous processor with multiple threading. The asynchronous processor includes a program counter (PC) logic and instruction cache unit comprising a plurality of PC logics configured to perform branch prediction and loop predication for a plurality of threads of instructions, and determine target PC addresses for caching the plurality of threads. The processor further comprises an instruction memory configured to cache the plurality of threads in accordance with the target PC addresses from the PC logic and instruction cache unit. The processor further includes a multi-threading (MT) scheduling unit configured to schedule and merge instruction flows for the plurality of threads from the instruction memory into a single combined thread of instructions. Additionally, a MT register window register is included to map operands in the plurality of threads to a plurality of corresponding register windows in a register file. | 03-12-2015 |
20150074354 | DEVICE, SYSTEM AND METHOD FOR USING A MASK REGISTER TO TRACK PROGRESS OF GATHERING ELEMENTS FROM MEMORY - A device, system and method for assigning values to elements in a first register, where each data field in a first register corresponds to a data element to be written into a second register, and where for each data field in the first register, a first value may indicate that the corresponding data element has not been written into the second register and a second value indicates that the corresponding data element has been written into the second register, reading the values of each of the data fields in the first register, and for each data field in the first register having the first value, gathering the corresponding data element and writing the corresponding data element into the second register, and changing the value of the data field in the first register from the first value to the second value. Other embodiments are described and claimed. | 03-12-2015 |
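The mask-tracked gather above can be sketched in Python as follows; memory is modeled as a dict and the mask register as a list of bits, with 1 standing for the "first value" (element not yet written) and 0 for the "second value" (element written).

```python
# Sketch of mask-tracked gathering: each mask bit marks whether the
# corresponding element has been written into the destination register.
# Re-running the gather resumes only elements whose bit is still 1.
def gather_step(memory, indices, dest, mask):
    for i in range(len(indices)):
        if mask[i] == 1:                   # first value: not yet gathered
            dest[i] = memory[indices[i]]   # gather and write the element
            mask[i] = 0                    # second value: element complete
    return dest, mask
```

If a gather is interrupted partway (e.g., by a page fault), already-completed elements keep their cleared bits, so restarting the same loop touches only the remaining elements.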
20150089141 | MICROPROCESSOR AND METHOD FOR USING AN INSTRUCTION LOOP CACHE THEREOF - A microprocessor is provided, which includes a processor core and an instruction loop cache. The processor core provides a fetch address of an instruction stream. The fetch address includes a tag and an index. The instruction loop cache receives the fetch address from the processor core. The instruction loop cache includes a cache array and a tag storage. The cache array stores multiple cache entries. Each cache entry includes a tag identification (ID). The cache array outputs the tag ID of the cache entry indicated by the index of the fetch address. The tag storage stores multiple tag values and outputs the tag value indicated by the tag ID output by the cache array. The instruction loop cache determines whether a cache hit or a cache miss occurs based on a bitwise comparison between the tag of the fetch address and the tag value output by the tag storage. | 03-26-2015 |
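A small Python model of the two-level lookup just described: the cache array yields a tag ID for the indexed entry, the tag storage maps that ID to a full tag value, and hit/miss is decided by comparing that value with the fetch-address tag. The 4-bit index width and the use of `None` for an empty entry are assumptions.

```python
# Sketch of the loop-cache lookup: index selects a cache entry, whose small
# tag ID indirects into the tag storage; the full tag value is then compared
# with the fetch-address tag to decide hit or miss.
def loop_cache_lookup(cache_array, tag_storage, fetch_addr, index_bits=4):
    index = fetch_addr & ((1 << index_bits) - 1)   # low bits: index
    tag = fetch_addr >> index_bits                 # high bits: tag
    tag_id = cache_array[index]                    # per-entry tag ID
    if tag_id is None:                             # empty entry
        return "miss"
    return "hit" if tag_storage[tag_id] == tag else "miss"
```

Storing a short tag ID per entry instead of the full tag keeps the cache array narrow; only the tag storage holds full-width tag values.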
20150089142 | MICROPROCESSOR WITH INTEGRATED NOP SLIDE DETECTOR - A microprocessor includes an instruction cache and a hardware state machine configured to detect a continuous sequence of N no operation (NOP) instructions within a stream of instruction bytes fetched from the instruction cache, wherein N is greater than zero. The microprocessor is configured to suspend fetching and executing instructions from the instruction cache in response to detecting the continuous sequence of N NOP instructions. | 03-26-2015 |
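The NOP-slide state machine reduces to a run counter over the fetched byte stream; a Python sketch, assuming the one-byte x86 NOP opcode `0x90` (the abstract does not name an instruction set, so the opcode and function name are illustrative):

```python
# Sketch of the NOP-slide detector: count consecutive NOP opcodes in the
# fetched byte stream and report once the run reaches N, the point at which
# the hardware would suspend fetching and executing.
def detect_nop_slide(byte_stream, n, nop=0x90):
    run = 0
    for b in byte_stream:
        run = run + 1 if b == nop else 0   # extend or reset the run
        if run >= n:
            return True                    # slide detected: suspend
    return False
```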
20150095578 | INSTRUCTIONS AND LOGIC TO PROVIDE MEMORY FENCE AND STORE FUNCTIONALITY - Instructions and logic provide memory fence and store functionality. Some embodiments include a processor having a cache to store cache coherent data in cache lines for one or more memory addresses of a primary storage. A decode stage of the processor decodes an instruction specifying a source data operand, one or more memory addresses as destination operands, and a memory fence type. Responsive to the decoded instruction, one or more execution units may enforce the memory fence type, then store data from the source data operand to the one or more memory addresses, and ensure that the stored data has been committed to primary storage. For some embodiments, the primary storage may comprise persistent memory. For some embodiments, cache lines corresponding to the memory addresses may be flushed, or marked for persistent write back to primary storage. Alternatively the cache may be bypassed, e.g. by performing a streaming vector store. | 04-02-2015 |
20150100734 | SEMAPHORE METHOD AND SYSTEM WITH OUT OF ORDER LOADS IN A MEMORY CONSISTENCY MODEL THAT CONSTITUTES LOADS READING FROM MEMORY IN ORDER - In a processor, a method for using a semaphore with out of order loads in a memory consistency model that constitutes loads reading from memory in order. The method includes implementing a memory resource that can be accessed by a plurality of cores; implementing an access mask that functions by tracking which words of a cache line have pending loads, wherein the cache line includes the memory resource, wherein an out of order load sets a mask bit within the access mask when accessing a word of the cache line, and clears the mask bit when that out of order load retires. The method further includes checking the access mask upon execution of subsequent stores from the plurality of cores to the cache line; and causing a miss prediction when a subsequent store to the portion of the cache line sees a prior mark from a load in the access mask, wherein the subsequent store will signal a load queue entry corresponding to that load by using a tracker register. | 04-09-2015 |
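A Python sketch of the per-word access mask the abstract describes: an out-of-order load sets the bit for the word it reads and clears it when it retires, and a subsequent store checks the mask to decide whether to signal a misprediction. The word count and method names are illustrative; the tracker-register signaling is omitted.

```python
# Sketch of the access mask on a cache line: bit set = a pending out-of-order
# load has read that word; a store that sees a set bit must signal the
# corresponding load queue entry (a miss prediction in the abstract).
class AccessMask:
    def __init__(self, words=16):
        self.mask = [0] * words

    def load_access(self, word):
        self.mask[word] = 1            # out-of-order load marks the word

    def load_retire(self, word):
        self.mask[word] = 0            # retiring load clears its mark

    def store_check(self, word):
        # True: the store saw a prior load's mark -> signal mispredict
        return self.mask[word] == 1
```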
20150113221 | HYBRID INPUT/OUTPUT WRITE OPERATIONS - A first processor receives a write request from an input/output (I/O) device connected to the first processor. The first processor determines whether the write request satisfies an allocating write criterion. Responsive to determining that the write request satisfies the allocating write criterion, the first processor writes data associated with the write request to a cache of the first processor. | 04-23-2015 |
20150121010 | UNIFIED STORE QUEUE - Embodiments herein provide for improved store-to-load-forwarding (STLF) logic and linear aliasing effect reduction logic. In one embodiment, a load instruction to be executed is selected. Whether a first linear address associated with said load instruction matches a linear address of a store instruction of a plurality of store instructions in a queue is determined. Data associated with said store instruction for executing said load instruction is forwarded, in response to determining that the first linear address matches the linear address of the store instruction. | 04-30-2015 |
20150127912 | CACHE PARTITIONING IN A MULTICORE PROCESSOR - Techniques described herein generally include methods and systems related to cache partitioning in a chip multiprocessor. Cache-partitioning for a single thread or application between multiple data sources improves energy or latency efficiency of a chip multiprocessor by exploiting variations in energy cost and latency cost of the multiple data sources. Partition sizes for each data source may be selected using an optimization algorithm that minimizes or otherwise reduces latencies or energy consumption associated with cache misses. | 05-07-2015 |
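The partition-size selection can be illustrated with a brute-force Python sketch: given a miss-cost curve per data source, choose the split of the total cache capacity that minimizes combined cost. The cost curves are made-up inputs, and a real optimizer would also weigh energy versus latency per source as the abstract notes.

```python
# Toy version of the partitioning choice: cost_a[k] is the (latency or
# energy) miss cost for source A when granted k units of cache; pick the
# split of `total` units between sources A and B with minimum combined cost.
def best_partition(cost_a, cost_b, total):
    best = min(range(total + 1),
               key=lambda k: cost_a[k] + cost_b[total - k])
    return best, total - best          # (units for A, units for B)
```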
20150143049 | CACHE CONTROL APPARATUS AND METHOD - Provided are a cache control apparatus and method that maintain coherency of data and instructions held by a cache memory when a plurality of processors read a program from the same on-chip memory. The cache control apparatus includes a coherency controller client, included in an instruction cache, configured with an MESI register that stores at least one of a modified state, an exclusive state, a shared state, and an invalid state for each line of the instruction cache, and a coherency interface connected to the coherency controller client and configured to transmit and receive broadcast address information, read or write information, and hit or miss information of another cache to and from the instruction cache. | 05-21-2015 |
20150149725 | MULTI-THREADED SYSTEM FOR PERFORMING ATOMIC BINARY TRANSLATIONS - A multi-threaded binary translation system performs atomic operations by a thread, such operations include processing a load linked instruction and a store conditional instruction. The store conditional instruction updates data stored in a shared memory address only when at least three conditions are satisfied. The conditions are: a copy of a load linked shared memory address of the load linked instruction is the same as the store conditional shared memory address, a reservation flag indicates that the thread has a valid reservation, and the copy of data stored by the load linked instruction is the same as data stored in the store conditional shared memory address. | 05-28-2015 |
20150310914 | ELECTRONIC DEVICE - Provided are, among others, memory circuits or devices and their applications in electronic devices or systems and various implementations of an electronic device which includes a semiconductor memory unit including one or more columns, a data line, and a data line bar connected with a column selected among the one or more columns. Each of the one or more columns may include a plurality of storage cells each configured to store 1-bit data, each storage cell including a first and a second variable resistance elements; a bit line connected to one end of the first variable resistance element; a bit line bar connected to one end of the second variable resistance element; a source line connected to the other ends of the first and second variable resistance elements; and a driving block configured to latch data of the data line and the data line bar. | 10-29-2015 |
20150324203 | HAZARD PREDICTION FOR A GROUP OF MEMORY ACCESS INSTRUCTIONS USING A BUFFER ASSOCIATED WITH BRANCH PREDICTION - Various aspects provide for facilitating prediction of instruction pipeline hazards in a processor system. A system comprises a fetch component and an execution component. The fetch component is configured for storing a hazard prediction associated with a group of memory access instructions in a buffer associated with branch prediction. The execution component is configured for executing a memory access instruction associated with the group of memory access instructions as a function of the hazard prediction entry. In an aspect, the hazard prediction entry is configured for predicting whether the group of memory access instructions is associated with an instruction pipeline hazard. | 11-12-2015 |
20150331803 | SYSTEM AND METHOD FOR SLICE PROCESSING COMPUTER-RELATED TASKS - Methods and systems are provided for task processing in a computing environment. The method includes identifying a program to be executed by a processor for performing at least one task and determining that a memory requirement for the program does not exceed a capacity of a cache memory for the processor to perform the task. The method also includes, in response to determining that the memory requirement for the program does not exceed the capacity of the cache memory, instructing the processor to enter into a slice mode from a current mode and execute the program in the slice mode to perform the task. In the method, the slice mode comprises copying the program from a backing store memory to the cache memory and executing the program to perform the at least one task by utilizing the program copied to the cache memory. | 11-19-2015 |
20150332760 | ELECTRONIC DEVICE - Disclosed is an electronic device including a semiconductor device, wherein the semiconductor device includes: a word line driving unit; a first cell array arranged at one side of the word line driving unit; a second cell array arranged at the other side of the word line driving unit; a bias voltage generation unit; and a read control unit. | 11-19-2015 |
20150347304 | TRANSACTIONAL EXECUTION DIAGNOSTICS USING DIGESTS - A transactional memory environment for gathering diagnostics during transactional execution is provided. Included are: identifying a first indicator, by a computer system, signaling a beginning instruction of a transaction comprising a plurality of instructions; generating, by the computer system, a computed digest based on the execution of at least one of the plurality of instructions; accumulating, by the computer system, diagnostic data of the transaction based on the execution of the plurality of instructions; identifying, by the computer system, a second indicator associated with the plurality of instructions signaling an ending instruction of the transaction comprising the plurality of instructions; and based on an abort of the transaction, not saving the memory store data of the transaction to memory. | 12-03-2015 |
20150370717 | Computer Processor Employing Byte-Addressable Dedicated Memory For Operand Storage - A computer processor including a first memory structure that operates over multiple cycles to temporarily store operands referenced by at least one instruction. A plurality of functional units performs operations that produce and access operands stored in the first memory structure. A second memory structure is provided, separate from the first memory structure. The second memory structure is configured as a dedicated memory for storage of operands copied from the first memory structure. The second memory structure is organized with a byte-addressable memory space and each operand stored in the second memory structure is accessed by a given byte address into the byte-addressable memory space. | 12-24-2015 |
20150378922 | COLLECTING MEMORY OPERAND ACCESS CHARACTERISTICS DURING TRANSACTIONAL EXECUTION - A transactional execution of a set of instructions in a transaction of a program may be initiated to collect memory operand access characteristics of a set of instructions of a transaction during the transactional execution. The memory operand access characteristics may be stored upon a termination of the transactional execution of the set of instructions. The memory operand access characteristics may include an address of an accessed storage location; a count of a number of times the storage location is accessed; a purpose value indicating whether the storage location is accessed for a fetch, store, or update operation; a count of a number of times the storage location is accessed for one or more of a fetch, store, or update operation; a translation mode in which the storage location is accessed; and an addressing mode. | 12-31-2015 |
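A small Python sketch of the per-location record entry 20150378922 enumerates: address, per-purpose access counts, and a total. The dictionary layout and function names are assumptions for illustration only.

```python
from collections import defaultdict

def make_tracker():
    """Per-storage-location record of access characteristics collected
    during a transactional execution (software stand-in for the
    hardware collection the abstract describes)."""
    return defaultdict(lambda: {'fetch': 0, 'store': 0, 'update': 0})

def record_access(tracker, address, purpose):
    """Count one access to `address` for a given purpose value."""
    tracker[address][purpose] += 1

def characteristics(tracker, address):
    """Summarize what was collected for one storage location."""
    counts = tracker[address]
    return {'address': address, 'total': sum(counts.values()), **counts}
```

On termination of the transactional execution, the accumulated records would be stored for later analysis.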
20160026572 | CACHE LINE CROSSING LOAD TECHNIQUES - A technique for handling an unaligned load operation includes detecting a cache line crossing load operation that is associated with a first cache line and a second cache line. In response to a cache including the first cache line but not including the second cache line, the second cache line is reloaded into the cache in a same set as the first cache line. In response to reloading the second cache line in the cache, a cache line crossing link indicator associated with the first cache line is asserted to indicate that both the first and second cache lines include portions of a desired data element. | 01-28-2016 |
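A Python sketch of the handling in entry 20160026572: detect a crossing load, reload the second line into the same set as the first, and assert the link indicator. Line size, set count, and the dictionary cache model are assumptions.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 128   # number of cache sets (assumed)

def crosses_line(addr, size):
    """True when an access of `size` bytes at `addr` spans two cache lines."""
    return (addr % LINE_SIZE) + size > LINE_SIZE

def set_index(line_addr):
    return (line_addr // LINE_SIZE) % NUM_SETS

def handle_crossing_load(cache, addr, size):
    """Reload the second line into the SAME set as the first and assert a
    cache line crossing link indicator on the first line."""
    if not crosses_line(addr, size):
        return None
    first = addr - (addr % LINE_SIZE)
    second = first + LINE_SIZE
    s = set_index(first)                      # first line's set, not second's
    cache.setdefault(s, {})[first] = {'link': True}
    cache[s][second] = {'link': False}
    return s
```

Keeping both halves in one set lets a later crossing load find the pair with a single set lookup via the link indicator.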
20160026573 | USING A DECREMENTER INTERRUPT TO START LONG-RUNNING HARDWARE OPERATIONS BEFORE THE END OF A SHARED PROCESSOR DISPATCH CYCLE - Method to perform an operation, the operation comprising processing a first logical partition on a shared processor for the duration of a dispatch cycle, issuing, by a hypervisor, at a predefined time prior to completion of the dispatch cycle, a lightweight hypervisor decrementer (HDEC) interrupt specifying a cache line address buffer location in a virtual processor, and responsive to the lightweight HDEC interrupt, writing, by the shared processor, a set of cache line addresses used by the first logical partition to the cache line address buffer location in the virtual processor. | 01-28-2016 |
20160026574 | GENERAL PURPOSE DIGITAL DATA PROCESSOR, SYSTEMS AND METHODS - The invention provides improved data processing apparatus, systems and methods that include one or more nodes, e.g., processor modules or otherwise, that include or are otherwise coupled to cache, physical or other memory (e.g., attached flash drives or other mounted storage devices), collectively “system memory.” At least one of the nodes includes a cache memory system that stores data (and/or instructions) recently accessed (and/or expected to be accessed) by the respective node, along with tags specifying addresses and statuses (e.g., modified, reference count, etc.) for the respective data (and/or instructions). The tags facilitate translating system addresses to physical addresses, e.g., for purposes of moving data (and/or instructions) between system memory (and, specifically, for example, physical memory, such as attached drives or other mounted storage) and the cache memory system. | 01-28-2016 |
20160034399 | BUS-BASED CACHE ARCHITECTURE - Digital signal processors often operate on two operands per instruction, and it is desirable to retrieve both operands in one cycle. Some data caches connect to the processor over two busses and internally use two or more memory banks to store cache lines. The allocation of cache lines to specific banks is based on the address with which the cache line is associated. When two memory accesses map to the same memory bank, fetching the operands incurs extra latency because the accesses are serialized. An improved bank organization for providing conflict-free dual-data cache access is disclosed: a bus-based data cache system having two data buses and two memory banks. Each memory bank works as a default memory bank for the corresponding data bus. As long as the two values of data being accessed belong to two separate data sets assigned to the two respective data buses, memory bank conflicts are avoided. | 02-04-2016 |
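A Python sketch contrasting the two bank-allocation schemes entry 20160034399 discusses. The line size, bank count, and function names are assumptions; the point is that address-based banking can serialize two same-cycle operands, while bus-based banking cannot.

```python
LINE = 64   # bytes per cache line (assumed)

def address_based_bank(addr, num_banks=2):
    """Conventional allocation: the bank is chosen by address bits, so two
    same-cycle operands can land in one bank and must be serialized."""
    return (addr // LINE) % num_banks

def bus_based_bank(bus_id):
    """Bus-based allocation: each data bus has its own default bank, so two
    operands arriving on the two buses never collide."""
    return bus_id
```

Two operands whose addresses differ by an even number of lines collide under the address-based scheme but are conflict-free when each operand's data set is assigned to its own bus.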
20160034401 | Instruction Cache Management Based on Temporal Locality - The present disclosure relates to managing an instruction cache based on temporal locality of cached instructions. One example method includes receiving a request for a first instruction included in a software application; storing the first instruction in a cache structure; receiving a request for a second instruction included in the software application; determining that a cache entry must be removed from the cache structure to create space to store the second instruction; determining that the first instruction should be removed from the cache structure based on temporal locality attributes associated with at least one of the first instruction or the second instruction, the temporal locality attributes representing a likelihood that additional requests will be received for an associated instruction while the instruction is stored in the cache structure; removing the first instruction from the cache structure; and storing the second instruction in the cache structure. | 02-04-2016 |
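A Python sketch of the eviction decision in entry 20160034401: when the cache is full, remove the instruction whose temporal-locality attribute indicates the lowest likelihood of re-request. Representing that attribute as a single score in [0, 1] is an illustrative assumption.

```python
def evict_victim(cache):
    """Choose the entry with the lowest temporal-locality score, i.e. the
    one least likely to be requested again while cached."""
    return min(cache, key=lambda k: cache[k])

def insert_instruction(cache, capacity, instr, locality):
    """Insert `instr`; when the cache is full, evict based on the
    temporal-locality attributes rather than recency alone."""
    if instr not in cache and len(cache) >= capacity:
        del cache[evict_victim(cache)]
    cache[instr] = locality
```

Unlike plain LRU, a recently fetched instruction with a low locality score is still the preferred victim.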
20160034402 | STORAGE MEDIUM AND INFORMATION PROCESSING DEVICE FOR EXCHANGING DATA WITH SECONDARY STORAGE SECTION THROUGH CACHE - An information processing device includes a main control circuit including a central arithmetic processor that executes first processing through a first program, a sub-control circuit that executes second processing independently of the first processing, a primary storage circuit, and a secondary storage circuit. The secondary storage circuit has a slower access speed than the primary storage circuit. The secondary storage circuit stores a second program used for third processing executed once the first processing and the second processing are both complete. The main control circuit further includes a cache memory having a faster access speed than the secondary storage circuit and a cache controller. In a situation in which the second processing is not yet complete at a completion time of the first processing, the cache controller executes pre-reading of the second program from the secondary storage circuit and stores the second program into the cache memory. | 02-04-2016 |
20160043300 | ELECTRONIC DEVICE - An electronic device includes semiconductor memory, the semiconductor memory including an under layer; a first magnetic layer located over the under layer and having a variable magnetization direction; a tunnel barrier layer located over the first magnetic layer; and a second magnetic layer located over the tunnel barrier layer and having a pinned magnetization direction, wherein the under layer includes a first metal nitride layer having a NaCl crystal structure and a second metal nitride layer containing a light metal. | 02-11-2016 |
20160049582 | MEMORY DEVICE HAVING A TUNNEL BARRIER LAYER IN A MEMORY CELL, AND ELECTRONIC DEVICE INCLUDING THE SAME - An electronic device includes a semiconductor memory. The semiconductor memory includes a plurality of first lines extending in a first direction, a plurality of second lines extending in a second direction crossing the first direction, a resistance variable layer interposed between the first lines and the second lines, a tunnel barrier layer interposed between the resistance variable layer and the first lines, and an intermediate electrode layer interposed between the resistance variable layer and the tunnel barrier layer. The tunnel barrier layer and the intermediate electrode layer overlap with at least two neighboring intersection regions of the first lines and the second lines. | 02-18-2016 |
20160055083 | PROCESSORS AND METHODS FOR CACHE SPARING STORES - In one aspect, a processor has a register file, a private Level 1 (L1) cache, and an interface to a shared memory hierarchy (e.g., a Level 2 (L2) cache and so on). The processor has a Load Store Unit (LSU) that handles decoded load and store instructions. The processor may support out of order and multi-threaded execution. As store instructions arrive at the LSU for processing, the LSU determines whether a counter, from a set of counters, is allocated to a cache line affected by each store. If not, the LSU allocates a counter. If so, then the LSU updates the counter. Also, in response to a store instruction affecting a cache line neighboring a cache line whose counter meets a criterion, the LSU characterizes that store instruction as one to be effected without obtaining ownership of the affected cache line, and provides that store to be serviced by an element of the shared memory hierarchy. | 02-25-2016 |
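A Python sketch of the counter mechanism in entry 20160055083. The line size, the threshold standing in for the counter criterion, and the dictionary of counters are assumptions.

```python
LINE = 64       # bytes per cache line (assumed)
CRITERION = 4   # counter value suggesting a streaming write pattern (assumed)

def handle_store(counters, addr):
    """Allocate or update a per-line store counter; when a neighboring
    line's counter meets the criterion, service this store from the shared
    memory hierarchy without obtaining ownership (a cache-sparing store)."""
    line = addr // LINE
    counters[line] = counters.get(line, 0) + 1      # allocate or update
    hot_neighbor = any(counters.get(n, 0) >= CRITERION
                       for n in (line - 1, line + 1))
    return 'no-ownership' if hot_neighbor else 'normal'
```

The effect is that a sequential write stream stops pulling exclusive copies of each successive line into the private L1, sparing the cache for data that will actually be reused.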
20160056209 | THREE-DIMENSIONAL SEMICONDUCTOR DEVICE AND A SYSTEM HAVING THE SAME - A 3D semiconductor device and a system having the same are provided. The 3D semiconductor device includes a semiconductor substrate, a common source region formed on the semiconductor substrate and extending in a line shape, an active region formed on the common source region and including a lateral channel region, which is substantially in parallel to a surface of the semiconductor substrate, and source and drain regions that are branched from the lateral channel region to a direction substantially perpendicular to the surface of the semiconductor substrate, and a gate formed in a space between the source region and the drain region. | 02-25-2016 |
20160056211 | ELECTRONIC DEVICE INCLUDING MEMORY CELLS HAVING VARIABLE RESISTANCE CHARACTERISTICS - An electronic device includes a semiconductor memory. The semiconductor memory includes a stack structure including a first electrode, a second electrode, a third electrode, an insulating layer interposed between the first electrode and the second electrode, and a variable resistance layer interposed between the second electrode and the third electrode; and a selection element layer disposed over at least a part of a sidewall of the stack structure. | 02-25-2016 |
20160062900 | CACHE MANAGEMENT FOR MAP-REDUCE APPLICATIONS - A computer manages a cache for a MapReduce application based on a distributed file system that includes one or more storage media by receiving a map request and receiving parameters for processing the map request. The parameters include a total data size to be processed, a size of each data record, and a number of map requests executing simultaneously. The computer determines a cache size for processing the map request, wherein the cache size is determined based on the received parameters for processing the map request and a machine learning model for a map request cache size and reads, based on the determined cache size, data from the one or more storage media of the distributed file system into the cache. The computer processes the map request and writes an intermediate result data of the map request processing into the cache, based on the determined cache size. | 03-03-2016 |
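A Python sketch of the cache-sizing step in entry 20160062900. The even-split, record-aligned heuristic is an assumed stand-in for the machine learning model the abstract mentions; the `model` parameter shows where a learned predictor would plug in.

```python
def map_cache_size(total_bytes, record_bytes, concurrent_maps, model=None):
    """Derive a per-map cache size from the request parameters: total data
    size, record size, and the number of simultaneously executing maps.
    `model` stands in for the learned cache-size predictor."""
    share = total_bytes // concurrent_maps     # even split across maps
    aligned = share - share % record_bytes     # hold whole records only
    return model(aligned) if model else aligned
```

Aligning to the record size keeps partial records out of the cache, so each read from the distributed file system fills the cache with directly processable data.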
20160085550 | IMMEDIATE BRANCH RECODE THAT HANDLES ALIASING - A system and method for efficiently indicating branch target addresses. A semiconductor chip predecodes instructions of a computer program prior to installing the instructions in an instruction cache. In response to determining a particular instruction is a control flow instruction with a displacement relative to a program counter address (PC), the chip replaces a portion of the PC relative displacement in the particular instruction with a subset of a target address. The subset of the target address is an untranslated physical subset of the full target address. When the recoded particular instruction is fetched and decoded, the remaining portion of the PC relative displacement is added to a virtual portion of the PC used to fetch the particular instruction. The result is concatenated with the portion of the target address embedded in the fetched particular instruction to form a full target address. | 03-24-2016 |
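The recode/decode arithmetic in entry 20160085550 can be sketched in Python. A 12-bit untranslated page offset is an assumption (4 KiB pages); the essential property is that decoding round-trips to the original target address.

```python
PAGE_BITS = 12   # untranslated page-offset bits (assumed 4 KiB pages)

def recode(pc, disp):
    """Predecode: replace part of the PC-relative displacement with the
    target's untranslated low bits, keeping the remaining page-granular
    displacement."""
    target = pc + disp
    embedded = target & ((1 << PAGE_BITS) - 1)              # physical subset
    remaining = (target >> PAGE_BITS) - (pc >> PAGE_BITS)   # page delta
    return embedded, remaining

def decode(pc, embedded, remaining):
    """Fetch/decode: add the remaining displacement to the virtual portion
    of the PC, then concatenate with the embedded target bits."""
    return (((pc >> PAGE_BITS) + remaining) << PAGE_BITS) | embedded
```

Because the embedded low bits are identical in the virtual and physical target addresses, the branch predictor can use them before address translation completes.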
20160085555 | TECHNOLOGIES FOR EFFICIENT LZ77-BASED DATA DECOMPRESSION - Technologies for data decompression include a computing device that reads a symbol tag byte from an input stream. The computing device determines whether the symbol can be decoded using a fast-path routine, and if not, executes a slow-path routine to decompress the symbol. The slow-path routine may include data-dependent branch instructions that may be unpredictable using branch prediction hardware. For the fast-path routine, the computing device determines a next symbol increment value, a literal increment value, a data length, and an offset based on the tag byte, without executing an unpredictable branch instruction. The computing device sets a source pointer to either literal data or reference data as a function of the tag byte, without executing an unpredictable branch instruction. The computing device may set the source pointer using a conditional move instruction. The computing device copies the data and processes remaining symbols. Other embodiments are described and claimed. | 03-24-2016 |
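A Python sketch of the branch-free fast path in entry 20160085555. All per-tag values come from tables indexed by the tag byte, and the source is chosen with a conditional select rather than a data-dependent branch. The tag layout (symbol type in bits 0-1, length in bits 2-7) is an assumption, not the actual LZ77/Snappy encoding.

```python
# Precomputed per-tag tables: data length and symbol type for every tag byte.
LENGTH = [(t >> 2) + 1 for t in range(256)]
IS_LITERAL = [(t & 0x3) == 0 for t in range(256)]

def decode_symbol(tag, literals, history, offset, out):
    """Decode one fast-path symbol without an unpredictable branch."""
    n = LENGTH[tag]
    # cmov-style conditional select of the source pointer
    src = literals[:n] if IS_LITERAL[tag] else history[-offset:][:n]
    out.extend(src)
    return n                                     # bytes produced
```

In hardware or tuned C, the ternary would compile to a conditional move, so the pipeline never speculates on the unpredictable literal-versus-reference decision.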
20160092366 | METHOD AND APPARATUS FOR DISTRIBUTED SNOOP FILTERING - An apparatus and method are described for distributed snoop filtering. For example, one embodiment of a processor comprises: a plurality of cores to execute instructions and process data; first snoop logic to track a first plurality of cache lines stored in a mid-level cache (“MLC”) accessible by one or more of the cores, the first snoop logic to allocate entries for cache lines stored in the MLC and to deallocate entries for cache lines evicted from the MLC, wherein at least some of the cache lines evicted from the MLC are retained in a level 1 (L1) cache; and second snoop logic to track a second plurality of cache lines stored in a non-inclusive last level cache (NI LLC), the second snoop logic to allocate entries in the NI LLC for cache lines evicted from the MLC and to deallocate entries for cache lines stored in the MLC, wherein the second snoop logic is to store and maintain a first set of core valid bits to identify cores containing copies of the cache lines stored in the NI LLC. | 03-31-2016 |
20160092367 | HARDWARE APPARATUSES AND METHODS TO CONTROL ACCESS TO A MULTIPLE BANK DATA CACHE - Methods and apparatuses to control access to a multiple bank data cache are described. In one embodiment, a processor includes conflict resolution logic to detect multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle and to grant access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache. In another embodiment, a method includes detecting multiple instructions scheduled to access a same bank of a multiple bank data cache in a same clock cycle, and granting access priority to an instruction of the multiple instructions scheduled to access a highest total of banks of the multiple bank data cache. | 03-31-2016 |
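A Python sketch of the arbitration rule in entry 20160092367: among same-cycle instructions that collide on a bank, priority goes to the one touching the most banks. Representing each instruction as a name plus a set of bank indices is an illustrative assumption.

```python
def find_conflicts(instructions):
    """Return the instructions that share at least one bank with another
    instruction scheduled in the same clock cycle."""
    conflicted = set()
    for i, (name_a, banks_a) in enumerate(instructions):
        for name_b, banks_b in instructions[i + 1:]:
            if banks_a & banks_b:
                conflicted.update((name_a, name_b))
    return [(n, b) for n, b in instructions if n in conflicted]

def grant_priority(instructions):
    """Grant access priority to the colliding instruction that accesses
    the highest total of banks."""
    colliding = find_conflicts(instructions)
    if not colliding:
        return None
    return max(colliding, key=lambda item: len(item[1]))[0]
```

Favoring the widest access makes sense because the instruction touching the most banks is the hardest to reschedule into a later conflict-free cycle.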
20160103681 | LOAD AND STORE ORDERING FOR A STRONGLY ORDERED SIMULTANEOUS MULTITHREADING CORE - A mechanism for simultaneous multithreading is provided. Responsive to performing a store instruction for a given thread of threads on a processor core and responsive to the core having ownership of a cache line in a cache, an entry of the store instruction is placed in a given store queue belonging to the given thread. The entry for the store instruction has a starting memory address and an ending memory address on the cache line. The starting memory addresses through ending memory addresses of load queues of the threads are compared on a byte-per-byte basis against the starting through ending memory address of the store instruction. Responsive to one memory address byte in the starting through ending memory addresses in the load queues overlapping with a memory address byte in the starting through ending memory address of the store instruction, the threads having the one memory address byte are flushed. | 04-14-2016 |
20160103682 | LOAD AND STORE ORDERING FOR A STRONGLY ORDERED SIMULTANEOUS MULTITHREADING CORE - A mechanism for simultaneous multithreading is provided. Responsive to performing a store instruction for a given thread of threads on a processor core and responsive to the core having ownership of a cache line in a cache, an entry of the store instruction is placed in a given store queue belonging to the given thread. The entry for the store instruction has a starting memory address and an ending memory address on the cache line. The starting memory addresses through ending memory addresses of load queues of the threads are compared on a byte-per-byte basis against the starting through ending memory address of the store instruction. Responsive to one memory address byte in the starting through ending memory addresses in the load queues overlapping with a memory address byte in the starting through ending memory address of the store instruction, the threads having the one memory address byte are flushed. | 04-14-2016 |
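The byte-granular comparison described in entries 20160103681 and 20160103682 reduces to an inclusive range-overlap test, sketched below in Python. The mapping of thread id to a list of (start, end) load-queue ranges is an assumed representation.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Byte-granular overlap test for two inclusive address ranges."""
    return a_start <= b_end and b_start <= a_end

def threads_to_flush(store_start, store_end, load_queues):
    """Compare the store's starting-through-ending byte range against every
    thread's load-queue entries; any thread with an overlapping byte is
    flushed. `load_queues` maps thread id -> list of (start, end) ranges."""
    return {tid for tid, loads in load_queues.items()
            if any(overlaps(store_start, store_end, s, e) for s, e in loads)}
```

Flushing only the threads whose loads actually share a byte with the store preserves strong ordering without penalizing unrelated threads.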
20160124860 | METHODS AND SYSTEMS FOR HANDLING DATA RECEIVED BY A STATE MACHINE ENGINE - A data analysis system to analyze data. The data analysis system includes a data buffer configured to receive data to be analyzed. The data analysis system also includes a state machine lattice. The state machine lattice includes multiple data analysis elements and each data analysis element includes multiple memory cells configured to analyze at least a portion of the data and to output a result of the analysis. The data analysis system includes a buffer interface configured to receive the data from the data buffer and to provide the data to the state machine lattice. | 05-05-2016 |
20160148979 | ELECTRONIC DEVICE AND METHOD FOR FABRICATING THE SAME - An electronic device is provided. An electronic device according to an example of the disclosed technology includes a semiconductor memory, the semiconductor memory including: a substrate including a recess formed in the substrate; a gate including at least a portion that is buried in the substrate; a junction formed at both sides of the gate in the substrate; and a memory element electrically connected to the junction at one side of the gate, wherein the junction includes: a barrier layer formed over the recess such that a thickness of the barrier layer formed over a bottom surface of the recess is different from that of the barrier layer formed over a side surface of the recess; a contact pad formed over the barrier layer so as to fill the recess; and an impurity region formed in the substrate and located under the contact pad. | 05-26-2016 |
20160170894 | DATA PROCESSING METHOD, INFORMATION PROCESSING DEVICE, AND RECORDING MEDIUM | 06-16-2016 |
20160179409 | BUILDING FILE SYSTEM IMAGES USING CACHED LOGICAL VOLUME SNAPSHOTS | 06-23-2016 |
20160179532 | MANAGING ALLOCATION OF PHYSICAL REGISTERS IN A BLOCK-BASED INSTRUCTION SET ARCHITECTURE (ISA), AND RELATED APPARATUSES AND METHODS | 06-23-2016 |
20160179586 | LIGHTWEIGHT RESTRICTED TRANSACTIONAL MEMORY FOR SPECULATIVE COMPILER OPTIMIZATION | 06-23-2016 |
20160179682 | ALLOCATING CACHE MEMORY ON A PER DATA OBJECT BASIS | 06-23-2016 |
20160180905 | ELECTRONIC DEVICE AND METHOD FOR FABRICATING THE SAME | 06-23-2016 |
20160180928 | ELECTRONIC DEVICE AND OPERATING METHOD FOR THE SAME | 06-23-2016 |
20160180960 | FUSE ELEMENT, SEMICONDUCTOR MEMORY INCLUDING THE FUSE ELEMENT, AND ELECTRONIC DEVICE INCLUDING THE SEMICONDUCTOR MEMORY | 06-23-2016 |
20160181204 | ELECTRONIC DEVICE AND METHOD FOR FABRICATING THE SAME | 06-23-2016 |
20160181315 | ELECTRONIC DEVICE AND METHOD FOR FABRICATING THE SAME | 06-23-2016 |
20160181316 | ELECTRONIC DEVICE AND METHOD FOR FABRICATING THE SAME | 06-23-2016 |
20160181317 | ELECTRONIC DEVICE | 06-23-2016 |
20160188222 | Integrated Main Memory And Coprocessor With Low Latency - System, method, and apparatus for integrated main memory (MM) and configurable coprocessor (CP) chip for processing subset of network functions. Chip supports external accesses to MM without additional latency from on-chip CP. On-chip memory scheduler resolves all bank conflicts and configurably load balances MM accesses. Instruction set and data on which the CP executes instructions are all disposed on-chip with no on-chip cache memory, thereby avoiding latency and coherency issues. Multiple independent and orthogonal threading domains used: a FIFO-based scheduling domain (SD) for the I/O; a multi-threaded processing domain for the CP. The CP is an array of independent, autonomous, unsequenced processing engines processing on-chip data tracked by SD of external CMD and reordered per FIFO CMD sequence before transmission. Paired I/O ports tied to unique global on-chip SD allow multiple external processors to slave chip and its resources independently and autonomously without scheduling between the external processors. | 06-30-2016 |
20160188480 | EXTRACT TARGET CACHE ATTRIBUTE FACILITY AND INSTRUCTION THEREFOR - A facility and cache machine instruction of a computer architecture for specifying a target cache level and a target cache attribute of interest for obtaining a cache attribute of one or more target caches. The requested cache attribute of the target cache(s) is saved in a register. | 06-30-2016 |
20160203081 | DEVICE AND PROCESSING METHOD | 07-14-2016 |