Patent application number | Description | Published |
20120188259 | Mechanisms for Enabling Task Scheduling - Embodiments described herein provide a method including receiving a command to schedule a first process and selecting a command queue associated with the first process. The method also includes scheduling the first process to run on an accelerated processing device and preempting a second process running on the accelerated processing device to allow the first process to run on the accelerated processing device. | 07-26-2012 |
20120194524 | Preemptive Context Switching - Methods, systems, and computer readable media embodiments are disclosed for preemptive context-switching of processes running on a accelerated processing device. Embodiments include, detecting by an accelerated processing device a memory exception, and preempting a process from running on the accelerated processing device based upon the detected exception. | 08-02-2012 |
20120194525 | Managed Task Scheduling on a Graphics Processing Device (APD) - Provided herein is a method including receiving a run list including one or more processes to run on an accelerated processing device, wherein each of the one or more processes is associated with a corresponding independent job command queue. The method also includes scheduling each of the one or more processes to run on the accelerated processing device based on a criteria associated with each process. | 08-02-2012 |
20120194528 | Method and System for Context Switching - Embodiments of the present invention provide a method of preempting a task. The method includes removing the task from the parallel processors via a scheduling mechanism. Responsive to the removing, the method also includes ceasing (i) retrieval of commands from a buffer associated with the task, (ii) dispatch of groups of work-items associated with the task, (iii) dispatch of wavefronts associated with the task, and (iiii) execution of the wavefronts. State information related to the task is saved. | 08-02-2012 |
20120200576 | Preemptive context switching of processes on ac accelerated processing device (APD) based on time quanta - Methods, systems, and computer readable media for preemptive context-switching of processes on an accelerated processing device are based upon a comparison of the running time of the process and a threshold time quanta. A method includes preempting a process running on an accelerated processing device based upon a running time of the process and a threshold time quanta. | 08-09-2012 |
20120200579 | Process Device Context Switching - Methods, systems, and computer readable media embodiments are disclosed for preemptive context-switching of processes running on an accelerated processing device. A method includes, responsive to an exception upon access to a memory by a process running on a accelerated processing device, whether to preempt the process based on the exception, and preempting, based upon the determining, the process from running on the accelerated processing device. | 08-09-2012 |
20120311269 | NON-UNIFORM MEMORY-AWARE CACHE MANAGEMENT - An apparatus is disclosed for caching memory data in a computer system with multiple system memories. The apparatus comprises a data cache for caching memory data. The apparatus is configured to determine a retention priority for a cache block stored in the data cache. The retention priority is based on a performance characteristic of a system memory from which the cache block is cached. | 12-06-2012 |
20130024597 | TRACKING MEMORY ACCESS FREQUENCIES AND UTILIZATION - A method is provided including recording, in a counter of a set of counters, a number of cache accesses for a page corresponding to a translation lookaside buffer (TLB) page table entry, where the counters are physically grouped together and physically separate from the TLB. The method also includes recording the number of cache accesses from the corresponding counter to a field of the page table responsive to an event. An apparatus is provided that includes a memory unit and a set of counters coupled to the one memory unit, the set of counters comprises one or more counters that are physically grouped together and are adapted to store a value indicative of a number of memory page accesses. The apparatus includes a cache coupled to the set of counters. Also provided is a computer readable storage device encoded with data for adapting a manufacturing facility to create the apparatus. | 01-24-2013 |
20130135327 | Saving and Restoring Non-Shader State Using a Command Processor - Provided is a system including a command processor configured for interrupting processing of a first set of instructions executing within a shader core. | 05-30-2013 |
20130141446 | Method and Apparatus for Servicing Page Fault Exceptions - A method, apparatus and computer readable media for servicing page fault exceptions in a accelerated processing device (APD). A page fault related to a wavefront is detected. A fault handling request to a translation mechanism is sent when the page fault is detected. A fault handling response corresponding to the detected page fault from the translation mechanism is received. Confirmation that the detected page fault has been handled through performing page mapping based on the fault handling response is received. | 06-06-2013 |
20130155079 | Saving and Restoring Shader Context State - Provided is a method for processing a command in a computing system including an accelerated processing device (APD) having a command processor. The method includes executing an interrupt routine to save one or more contexts related to a first set of instructions on a shader core in response to an instruction to preempt processing of the first set of instructions. | 06-20-2013 |
20130160019 | Method for Resuming an APD Wavefront in Which a Subset of Elements Have Faulted - A method resumes an accelerated processing device (APD) wavefront in which a subset of elements have faulted. A restore command for a job including a wavefront is received. A list of context states for the wavefront is read from a memory associated with a APD. An empty shell wavefront is created for restoring the list of context states. A portion of not acknowledged data is masked over a portion of acknowledged data within the restored wavefronts. | 06-20-2013 |
20140156941 | Tracking Non-Native Content in Caches - The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode. | 06-05-2014 |
20140173225 | REDUCING MEMORY ACCESS TIME IN PARALLEL PROCESSORS - Apparatus, computer readable medium, and method of servicing memory requests are presented. A first plurality of memory requests are associated together, wherein each of the first plurality of memory requests is generated by a corresponding one of a first plurality of processors, and wherein each of the first plurality of processors is executing a first same instruction. A second plurality of memory requests are associated together, wherein each of the second plurality of memory requests is generated by a corresponding one of a second plurality of processors, and wherein each of the second plurality of processors is executing a second same instruction. A determination is made to service the first plurality of memory requests before the second plurality of memory requests and the first plurality of memory requests is serviced before the second plurality of memory requests. | 06-19-2014 |
20140177347 | INTER-ROW DATA TRANSFER IN MEMORY DEVICES - A method and apparatus for inter-row data transfer in memory devices is described. Data transfer from one physical location in a memory device to another is achieved without engaging the external input/output pins on the memory device. In an example method, a memory device is responsive to a row transfer (RT) command which includes a source row identifier and a target row identifier. The memory device activates a source row and storing source row data in a row buffer, latches the target row identifier into the memory device, activates a word line of a target row to prepare for a write operation, and stores the source row data from the row buffer into the target row. | 06-26-2014 |
20140181415 | PREFETCHING FUNCTIONALITY ON A LOGIC DIE STACKED WITH MEMORY - Prefetching functionality on a logic die stacked with memory is described herein. A device includes a logic chip stacked with a memory chip. The logic chip includes a control block, an in-stack prefetch request handler and a memory controller. The control block receives memory requests from an external source and determines availability of the requested data in the in-stack prefetch request handler. If the data is available, the control block sends the requested data to the external source. If the data is not available, the control block obtains the requested data via the memory controller. The in-stack prefetch request handler includes a prefetch controller, a prefetcher and a prefetch buffer. The prefetcher monitors the memory requests and based on observed patterns, issues additional prefetch requests to the memory controller. | 06-26-2014 |
20140181460 | PROCESSING DEVICE WITH ADDRESS TRANSLATION PROBING AND METHODS - A data processing device is provided that employs multiple translation look-aside buffers (TLBs) associated with respective processors that are configured to store selected address translations of a page table of a memory shared by the processors. The processing device is configured such that when an address translation is requested by a processor and is not found in the TLB associated with that processor, another TLB is probed for the requested address translation. The probe across to the other TLB may occur in advance of a walk of the page table for the requested address or alternatively a walk can be initiated concurrently with the probe. Where the probe successfully finds the requested address translation, the page table walk can be avoided or discontinued. | 06-26-2014 |
20140380329 | CONTROLLING SPRINTING FOR THERMAL CAPACITY BOOSTED SYSTEMS - A method and apparatus are described for performing sprinting in a processor. An analyzer in the processor may monitor thermal capacity remaining in the processor while not sprinting. When the remaining thermal capacity is sufficient to support sprinting, the analyzer may perform sprinting of a new workload when a benefit derived by sprinting the new workload exceeds a threshold and does not cause the remaining thermal capacity in the processor to be exhausted. The analyzer may perform sprinting of the new workload in accordance with sprinting parameters determined for the new workload. The analyzer may continue to monitor the remaining thermal capacity while not sprinting when the benefit derived by sprinting the new workload does not exceed the threshold. | 12-25-2014 |
20150061150 | STACKED SEMICONDUCTOR CHIP DEVICE WITH PHASE CHANGE MATERIAL - Various stacked semiconductor chip arrangements and methods of manufacturing the same are disclosed. In one aspect, an apparatus is provided that includes a first semiconductor chip, a second semiconductor chip mounted on the first semiconductor chip, and a first portion of a phase change material positioned in a first pocket associated with the first semiconductor chip or the second semiconductor chip to store heat generated by one or both of the first and second semiconductor chips. | 03-05-2015 |
Patent application number | Description | Published |
20120198458 | Methods and Systems for Synchronous Operation of a Processing Device - Embodiments of the present invention provide a method of synchronous operation of a first processing device and a second processing device. The method includes executing a process on the first processing device, responsive to a determination that execution of the process on the first device has reached a serial-parallel boundary, passing an execution thread of the process from the first processing device to the second processing device, and executing the process on the second processing device. | 08-02-2012 |
20130159812 | MEMORY ARCHITECTURE FOR READ-MODIFY-WRITE OPERATIONS - According to one embodiment, a memory architecture implemented method is provided, where the memory architecture includes a logic chip and one or more memory chips on a single die, and where the method comprises: reading values of data from the one or more memory chips to the logic chip, where the one or more memory chips and the logic chip are on a single die; modifying, via the logic chip on the single die, the values of data; and writing, from the logic chip to the one or more memory chips, the modified values of data. | 06-20-2013 |
20130160016 | Allocating Compute Kernels to Processors in a Heterogeneous System - A system and method embodiments for optimally allocating compute kernels to different types of processors, such as CPUs and GPUs, in a heterogeneous computer system are disclosed. These include comparing a kernel profile of a compute kernel to respective processor profiles of a plurality of processors in a heterogeneous computer system, selecting at least one processor from the plurality of processors based upon the comparing, and scheduling the compute kernel for execution in the selected at least one processor. | 06-20-2013 |
20130160017 | Software Mechanisms for Managing Task Scheduling on an Accelerated Processing Device (APD) - Embodiments describe herein provide a method of for managing task scheduling on a accelerated processing device. The method includes executing a first task within the accelerated processing device (APD), monitoring for an interruption of the execution of the first task, and switching to a second task when an interruption is detected. | 06-20-2013 |
20140022263 | METHOD FOR URGENCY-BASED PREEMPTION OF A PROCESS - The desire to use an Accelerated Processing Device (APD) for general computation has increased due to the APD's exemplary performance characteristics. However, current systems incur high overhead when dispatching work to the APD because a process cannot be efficiently identified or preempted. The occupying of the APD by a rogue process for arbitrary amounts of time can prevent the effective utilization of the available system capacity and can reduce the processing progress of the system. Embodiments described herein can overcome this deficiency by enabling the system software to pre-empt a process executing on the APD for any reason. The APD provides an interface for initiating such a pre-emption. This interface exposes an urgency of the request which determines whether the process being preempted is allowed a grace period to complete its issued work before being forced off the hardware. | 01-23-2014 |
20140040532 | STACKED MEMORY DEVICE WITH HELPER PROCESSOR - A processing system comprises one or more processor devices and other system components coupled to a stacked memory device having a set of stacked memory layers and a set of one or more logic layers. The set of logic layers implements a helper processor that executes instructions to perform tasks in response to a task request from the processor devices or otherwise on behalf of the other processor devices. The set of logic layers also includes a memory interface coupled to memory cell circuitry implemented in the set of stacked memory layers and coupleable to the processor devices. The memory interface operates to perform memory accesses for the processor devices and for the helper processor. By virtue of the helper processor's tight integration with the stacked memory layers, the helper processor may perform certain memory-intensive operations more efficiently than could be performed by the external processor devices. | 02-06-2014 |
20140101405 | REDUCING COLD TLB MISSES IN A HETEROGENEOUS COMPUTING SYSTEM - Methods and apparatuses are provided for avoiding cold translation lookaside buffer (TLB) misses in a computer system. A typical system is configured as a heterogeneous computing system having at least one central processing unit (CPU) and one or more graphic processing units (GPUs) that share a common memory address space. Each processing unit (CPU and GPU) has an independent TLB. When offloading a task from a particular CPU to a particular GPU, translation information is sent along with the task assignment. The translation information allows the GPU to load the address translation data into the TLB associated with the one or more GPUs prior to executing the task. Preloading the TLB of the GPUs reduces or avoids cold TLB misses that could otherwise occur without the benefits offered by the present disclosure. | 04-10-2014 |
20140149677 | Prefetch Kernels on Data-Parallel Processors - Embodiments include methods, systems and computer readable media configured to execute a first kernel (e.g. compute or graphics kernel) with reduced intermediate state storage resource requirements. These include executing a first and second (e.g. prefetch) kernel on a data-parallel processor, such that the second kernel begins executing before the first kernel. The second kernel performs memory operations that are based upon at least a subset of memory operations in the first kernel. | 05-29-2014 |
20140149772 | Using a Linear Prediction to Configure an Idle State of an Entity in a Computing Device - The described embodiments include a computing device with one or more entities (processor cores, processors, etc.). In some embodiments, during operation, a thermal power management unit in the computing device uses a linear prediction to compute a predicted duration of a next idle period for an entity based on the duration of one or more previous idle periods for the entity. Based on the predicted duration of the next idle period, the thermal power management unit configures the entity to operate in a corresponding idle state. | 05-29-2014 |
20140156975 | Redundant Threading for Improved Reliability - In some embodiments, a method for improving reliability in a processor is provided. The method can include replicating input data for first and second lanes of a processor, the first and second lanes being located in a same cluster of the processor and the first and second lanes each generating a respective value associated with an instruction to be executed in the respective lane, and responsive to a determination that the generated values do not match, providing an indication that the generated values do not match. | 06-05-2014 |
20140173216 | Invalidation of Dead Transient Data in Caches - Embodiments include methods, systems, and articles of manufacture directed to identifying transient data upon storing the transient data in a cache memory, and invalidating the identified transient data in the cache memory. | 06-19-2014 |
20140176187 | DIE-STACKED MEMORY DEVICE WITH RECONFIGURABLE LOGIC - A die-stacked memory device incorporates a reconfigurable logic device to provide implementation flexibility in performing various data manipulation operations and other memory operations that use data stored in the die-stacked memory device or that result in data that is to be stored in the die-stacked memory device. One or more configuration files representing corresponding logic configurations for the reconfigurable logic device can be stored in a configuration store at the die-stacked memory device, and a configuration controller can program a reconfigurable logic fabric of the reconfigurable logic device using a selected one of the configuration files. Due to the integration of the logic dies and the memory dies, the reconfigurable logic device can perform various data manipulation operations with higher bandwidth and lower latency and power consumption compared to devices external to the die-stacked memory device. | 06-26-2014 |
20140181414 | MECHANISMS TO BOUND THE PRESENCE OF CACHE BLOCKS WITH SPECIFIC PROPERTIES IN CACHES - A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response a write request to a second bank for data indicated to be stored in the powered down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases. | 06-26-2014 |
20140181421 | PROCESSING ENGINE FOR COMPLEX ATOMIC OPERATIONS - A system includes an atomic processing engine (APE) coupled to an interconnect. The interconnect is to couple to one or more processor cores. The APE receives a plurality of commands from the one or more processor cores through the interconnect. In response to a first command, the APE performs a first plurality of operations associated with the first command. The first plurality of operations references multiple memory locations, at least one of which is shared between two or more threads executed by the one or more processor cores. | 06-26-2014 |
20140181427 | Compound Memory Operations in a Logic Layer of a Stacked Memory - Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various data movement and address calculation operations. This functionality would allow compound memory operations—a single request communicated to the memory that characterizes the accesses and movement of many data items. This eliminates the performance and power overheads associated with communicating address and control information on a fine-grain, per-data-item basis from a host processor (or other device) to the memory. This approach also provides better visibility of macro-level memory access patterns to the memory system and may enable additional optimizations in scheduling memory accesses. | 06-26-2014 |
20140181453 | Processor with Host and Slave Operating Modes Stacked with Memory - A system, method, and computer program product are provided for a memory device system. One or more memory dies and at least one logic die are disposed in a package and communicatively coupled. The logic die comprises a processing device configurable to manage virtual memory and operate in an operating mode. The operating mode is selected from a set of operating modes comprising a slave operating mode and a host operating mode. | 06-26-2014 |
20140181457 | Write Endurance Management Techniques in the Logic Layer of a Stacked Memory - A system, method, and memory device embodying some aspects of the present invention for remapping external memory addresses and internal memory locations in stacked memory are provided. The stacked memory includes one or more memory layers configured to store data. The stacked memory also includes a logic layer connected to the memory layer. The logic layer has an Input/Output (I/O) port configured to receive read and write commands from external devices, a memory map configured to maintain an association between external memory addresses and internal memory locations, and a controller coupled to the I/O port, memory map, and memory layers, configured to store data received from external devices to internal memory locations. | 06-26-2014 |
20140181458 | DIE-STACKED MEMORY DEVICE PROVIDING DATA TRANSLATION - A die-stacked memory device incorporates a data translation controller at one or more logic dies of the device to provide data translation services for data to be stored at, or retrieved from, the die-stacked memory device. The data translation operations implemented by the data translation controller can include compression/decompression operations, encryption/decryption operations, format translations, wear-leveling translations, data ordering operations, and the like. Due to the tight integration of the logic dies and the memory dies, the data translation controller can perform data translation operations with higher bandwidth and lower latency and power consumption compared to operations performed by devices external to the die-stacked memory device. | 06-26-2014 |
20140181483 | Computation Memory Operations in a Logic Layer of a Stacked Memory - Some die-stacked memories will contain a logic layer in addition to one or more layers of DRAM (or other memory technology). This logic layer may be a discrete logic die or logic on a silicon interposer associated with a stack of memory dies. Additional circuitry/functionality is placed on the logic layer to implement functionality to perform various computation operations. This functionality would be desired where performing the operations locally near the memory devices would allow increased performance and/or power efficiency by avoiding transmission of data across the interface to the host processor. | 06-26-2014 |
20140372711 | SCHEDULING MEMORY ACCESSES USING AN EFFICIENT ROW BURST VALUE - A memory accessing agent includes a memory access generating circuit and a memory controller. The memory access generating circuit is adapted to generate multiple memory accesses in a first ordered arrangement. The memory controller is coupled to the memory access generating circuit and has an output port, for providing the multiple memory accesses to the output port in a second ordered arrangement based on the memory accesses and characteristics of an external memory. The memory controller determines the second ordered arrangement by calculating an efficient row burst value and interrupting multiple row-hit requests to schedule a row-miss request based on the efficient row burst value. | 12-18-2014 |
20140380003 | Method and System for Asymmetrical Processing With Managed Data Affinity - Methods, systems and computer readable storage mediums for more efficient and flexible scheduling of tasks on an asymmetric processing system having at least one host processor and one or more slave processors, are disclosed. An example embodiment includes, determining a data access requirement of a task, comparing the data access requirement to respective local memories of the one or more slave processors selecting a slave processor from the one or more slave processors based upon the comparing, and running the task on the selected slave processor. | 12-25-2014 |
20150016172 | QUERY OPERATIONS FOR STACKED-DIE MEMORY DEVICE - An integrated circuit (IC) package includes a stacked-die memory device. The stacked-die memory device includes a set of one or more stacked memory dies implementing memory cell circuitry. The stacked-die memory device further includes a set of one or more logic dies electrically coupled to the memory cell circuitry. The set of one or more logic dies includes a query controller and a memory controller. The memory controller is coupleable to at least one device external to the stacked-die memory device. The query controller is to perform a query operation on data stored in the memory cell circuitry responsive to a query command received from the external device. | 01-15-2015 |
20150067357 | PREDICTION FOR POWER GATING - The present application describes embodiments of methods for tournament prediction of power gating in processing devices. Some embodiments of the method include selecting one of a plurality of predictions of a duration of a time to a power state transition of a component in a processing device. The plurality of predictions are generated using a corresponding plurality of prediction algorithms. Some embodiments of the method also include deciding whether to transition the component from a first power state to a second power state based on the selected prediction. | 03-05-2015 |