Patents - stay tuned to the technology

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees


Advanced Micro Devices, Inc.

Advanced Micro Devices, Inc. Patent applications
Patent application numberTitlePublished
20160140084EFFICIENT SPARSE MATRIX-VECTOR MULTIPLICATION ON PARALLEL PROCESSORS - A method of multiplication of a sparse matrix and a vector to obtain a new vector and a system for implementing the method are claimed. Embodiments of the method are intended to optimize the performance of sparse matrix-vector multiplication in highly parallel processors, such as GPUs. The sparse matrix is stored in compressed sparse row (CSR) format.05-19-2016
20160139624PROCESSOR AND METHODS FOR REMOTE SCOPED SYNCHRONIZATION - Described herein is an apparatus and method for remote scoped synchronization, which is a new semantic that allows a work-item to order memory accesses with a scope instance outside of its scope hierarchy. More precisely, remote synchronization expands visibility at a particular scope to all scope-instances encompassed by that scope. Remote scoped synchronization operation allows smaller scopes to be used more frequently and defers added cost to only when larger scoped synchronization is required. This enables programmers to optimize the scope that memory operations are performed at for important communication patterns like work stealing. Executing memory operations at the optimum scope reduces both execution time and energy. In particular, remote synchronization allows a work-item to communicate with a scope that it otherwise would not be able to access. Specifically, work-items can pull valid data from and push updates to scopes that do not (hierarchically) contain them.05-19-2016
20160124871EXTERNAL DATA PROCESSING FOR A DISPLAY DEVICE - A method, a device, and a non-transitory computer readable medium for performing external processing on a display device are presented. An application is executed on the display device. Data is sent from the application to an external processor in direct communication with the display device, if the application requires additional processing capabilities than is available on the display device. Data is received from the external processor and the processed data is displayed on the display device.05-05-2016
20160124852MEMORY MANAGEMENT FOR GRAPHICS PROCESSING UNIT WORKLOADS - A method, a device, and a non-transitory computer readable medium for performing memory management in a graphics processing unit are presented. Hints about the memory usage of an application are provided to a page manager. At least one runtime memory usage pattern of the application is sent to the page manager. Data is swapped into and out of a memory by analyzing the hints and the at least one runtime memory usage pattern.05-05-2016
20160117206METHOD AND SYSTEM FOR BLOCK SCHEDULING CONTROL IN A PROCESSOR BY REMAPPING - A method and a system for block scheduling are disclosed. The method includes retrieving an original block ID, determining a corresponding new block ID from a mapping, executing a new block corresponding to the new block ID, and repeating the retrieving, determining, and executing for each original block ID. The system includes a program memory configured to store multi-block computer programs, an identifier memory configured to store block identifiers (ID's), management hardware configured to retrieve an original block ID from the program memory, scheduling hardware configured to receive the original block ID from the management hardware and determine a new block ID corresponding to the original block ID using a stored mapping, and processing hardware configured to receive the new block ID from the scheduling hardware and execute a new block corresponding to the new block ID.04-28-2016
20160116970POWER MANAGEMENT - Processor power may be managed by executing state storage and power gating instructions after receiving an idle indication. The idle indication may be received while the processor is executing instructions in a first mode, and the processor may execute the state storage and power gating instructions in a second mode. The state storage and power gating instructions may be inaccessible to the processor when operating in the first mode.04-28-2016
20160098275THERMAL-AWARE COMPILER FOR PARALLEL INSTRUCTION EXECUTION IN PROCESSORS - Embodiments are described for a method for compiling instruction code for execution in a processor having a number of functional units by determining a thermal constraint of the processor, and defining instruction words comprising both real instructions and one or more no operation (NOP) instructions to be executed by the functional units within a single clock cycle, wherein a number of NOP instructions executed over a number of consecutive clock cycles is configured to prevent exceeding the thermal constraint during execution of the instruction code.04-07-2016
20160085551HETEROGENEOUS FUNCTION UNIT DISPATCH IN A GRAPHICS PROCESSING UNIT - A compute unit configured to execute multiple threads in parallel is presented. The compute unit includes one or more single instruction multiple data (SIMD) units and a fetch and decode logic. The SIMD units have differing numbers of arithmetic logic units (ALUs), such that each SIMD unit can execute a different number of threads. The fetch and decode logic is in communication with each of the SIMD units, and is configured to assign the threads to the SIMD units for execution based on such differing numbers of ALUs.03-24-2016
20160055609GRAPHICS PROCESSING METHOD, SYSTEM, AND APPARATUS - A graphics processing apparatus, system, and method is provided. The graphics processing method includes separating a graphics context and graphics object from a packet; calculating a magic number of the graphics context; comparing the magic number of the graphics context with magic numbers stored in a context table, wherein each of the magic numbers corresponds to a specific graphics context; and, if the magic number of the graphics context is not found among the magic numbers in the context table, adding the graphics context to a graphics context slot of a graphics context storage, adding the graphics object to a graphics object list separate from the graphics context storage, and associating the graphics context slot with the listed graphics object.02-25-2016
20160055033RUNTIME FOR AUTOMATICALLY LOAD-BALANCING AND SYNCHRONIZING HETEROGENEOUS COMPUTER SYSTEMS WITH SCOPED SYNCHRONIZATION - Sharing tasks among compute units in a processor can increase the efficiency of the processor. When a compute unit does not have a task in its task memory to perform, donating tasks from other compute units can prevent the compute unit from being idle while there is task in other parts of the processor. It is desirable to share tasks among compute units that are within defined scopes of the processor. Compute units may share tasks by allowing other compute units to access their private memory, or by donating tasks to a shared memory.02-25-2016
20160055005System and Method for Page-Conscious GPU Instruction - Embodiments disclose a system and method for reducing virtual address translation latency in a wide execution engine that implements virtual memory. One example method describes a method comprising receiving a wavefront, classifying the wavefront into a subset based on classification criteria selected to reduce virtual address translation latency associated with a memory support structure, and scheduling the wavefront for processing based on the classifying.02-25-2016
20160048376PORTABLE BINARY IMAGE FORMAT (PBIF) FOR PRE-COMPILED KERNELS - Embodiments include methods, systems, and computer-readable medium directed to a compiler for compiling a portable binary image. The compiler compiles a program source code into a first executable specific to a first instruction set architecture (ISA). The compiler then compiles the program source code into a code generator output. Additionally the compiler combines the executable and the code generator output into a portable binary image. At runtime on a target device, the code generator output can be compiled into a second executable in accordance to a second ISA specific to the target device if the originally compiled first executable specific to the first ISA is not executable on the target device.02-18-2016
20160042488METHOD AND SYSTEM FOR FRAME PACING - A frame pacing method, computer program product, and computing system are provided for graphics processing.02-11-2016
20160041914Cache Bypassing Policy Based on Prefetch Streams - Embodiments include methods, systems, and computer readable medium directed to cache bypassing based on prefetch streams. A first cache receives a memory access request. The request references data in the memory. The data comprises non-reuse data. After a determination of a miss in the first cache, the first cache forwards the memory access request to a cache control logic. The detection of the non-reuse data instructs the cache control logic to allocate a block only in a second cache and bypass allocating a block in the first cache. The first cache is closer to the memory than the second cache.02-11-2016
20160041909MOVING DATA BETWEEN CACHES IN A HETEROGENEOUS PROCESSOR SYSTEM - Apparatus, computer readable medium, integrated circuit, and method of moving a plurality of data items to a first cache or a second cache are presented. The method includes receiving an indication that the first cache requested the plurality of data items. The method includes storing information indicating that the first cache requested the plurality of data items. The information may include an address for each of the plurality of data items. The method includes determining based at least on the stored information to move the plurality of data items to the second cache. The method includes moving the plurality of data items to the second cache. The method may include determining a time interval between receiving the indication that the first cache requested the plurality of data items and moving the plurality of data items to the second cache. A scratch pad memory is disclosed.02-11-2016
20160004445DEVICES AND METHODS FOR INTERCONNECTING SERVER NODES - Described are aggregation devices and methods for interconnecting server nodes. The aggregation device can include an input region, an output region, and a memory switch. The input region includes a plurality of input ports. The memory switch has a shared through silicon via (TSV) memory coupled to the input ports for temporarily storing data received at the input ports from a plurality of source devices. The output region includes a plurality of output ports coupled to the TSV memory. The output ports provide the data to a plurality of destination devices. A memory allocation system coordinates a transfer of the data from the source devices to the TSV memory. The output ports receive and process the data from the TSV memory independently of a communication from the input ports.01-07-2016
20150382007VIDEO AND IMAGE COMPRESSION BASED ON POSITION OF THE IMAGE GENERATING DEVICE - An apparatus, computer readable medium, and method of compressing images generated on an image generating device, the method including responsive to a generated image and position and orientation data associated with an image generating device which generated the image, selecting a previously generated image having a similar position and a similar orientation as the generated image; and if a comparison between the selected previously generated image and the generated image indicates the difference between one of the previously generated images and the generated image is less than a threshold difference, then compressing the generated image using the previously generated image. The method may include generating the generated image from light incident to the image generating device, and generating the position and orientation associated with the image generating device.12-31-2015
20150372677DUAL-RAIL ENCODING - Embodiments may include a method, system and apparatus for providing for encoded dual-rail signal communications in asynchronous circuitry. A dual rail signal pair is received. The dual rail signal pair comprises a first value indicative of a first wait state, a second value indicative of a logic value of a first bit, a third value indicative of a second wait state and a first logic value of a second bit, and/or a fourth value indicative of second wait state and a second logic value of said second bit.12-24-2015
20150363310MEMORY HEAPS IN A MEMORY MODEL FOR A UNIFIED COMPUTING SYSTEM - A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a first processor configured for unified operation with a second processor. The method includes receiving a memory operation from a processor and mapping the memory operation to one of a plurality of memory heaps. The mapping produces a mapping result. The method also includes providing the mapping result to the processor.12-17-2015
20150346798SYSTEM AND METHOD FOR ADJUSTING PERFORMANCE BASED ON THERMAL CONDITIONS WITHIN A PROCESSOR - A system and method for efficient management of operating modes within an integrated circuit (IC) for optimal power and performance targets. A semiconductor chip includes processing units each of which operates with respective operating parameters. Temperature sensors are included to measure a temperature of the one or more processing units during operation. A power manager determines a calculated power value independent of thermal conditions and current draw. The power manager reads each of a first thermal design power (TDP) value for the processing units and a second TDP value for a platform housing the semiconductor chip. The power manager determines a ratio of the first TDP value to the second TDP value. Additionally, the power manager determines another ratio of the first TDP value to the calculated power value. Using the measured temperature, the ratios and the calculated power value, the power manager determines a manner to adjust the operating parameters.12-03-2015
20150341032LOCALLY ASYNCHRONOUS LOGIC CIRCUIT AND METHOD THEREFOR - A locally asynchronous logic circuit includes an input latch; a synchronous-to-asynchronous control circuit having an input for receiving a first clock signal, a first output coupled to the latch enable input of the input latch, and a second output for providing a start signal; a predetermined number of stages coupled between the output of the input latch and an output of the locally asynchronous logic circuit, each stage having an asynchronous functional circuit and an associated completion circuit having an input for receiving a corresponding start signal and an output for providing a corresponding done signal; and an asynchronous-to-synchronous control circuit having a first input for receiving a done signal of a preceding stage, and an output for providing a valid signal. The asynchronous-to-synchronous control circuit activates said first valid signal to indicate said output of the locally asynchronous logic circuit is valid.11-26-2015
20150339192CHANNEL ROTATING ERROR CORRECTION CODE - A write or read method for use in a computer having multiple channels of memory includes writing or reading data to or from one channel in the memory, and simultaneously in parallel writing or reading an error correction code corresponding to the data to or from a different channel in the memory.11-26-2015
20150332427REDUNDANCY METHOD AND APPARATUS FOR SHADER COLUMN REPAIR - Methods, systems and non-transitory computer readable media are described. A system includes a shader pipe array, a redundant shader pipe array, a sequencer and a redundant shader switch. The shader pipe array includes multiple shader pipes, each of which perform rendering calculations on data provided thereto. The redundant shader pipe array also performs rendering calculations on data provided thereto. The sequencer identifies at least one defective shader pipe in the shader pipe array, and, in response, generates a signal. The redundant shader switch receives the generated signal, and, in response, transfers the data destined for each shader pipe identified as being defective independently to the redundant shader pipe array.11-19-2015
20150331433HYBRID SYSTEM AND METHOD FOR DETERMINING PERFORMANCE LEVELS BASED ON THERMAL CONDITIONS WITHIN A PROCESSOR - A system and method for efficient management of operating modes within an integrated circuit (IC) for optimal power and performance targets. A semiconductor chip includes one or more processing units each of which operates with respective operating parameters. One or more temperature sensors are included to measure a temperature of the one or more processing units during operation. When the measured temperature exceeds a threshold, a power manager on the chip determines a temperature headroom utilizing temperature values based on worst-case ambient temperature. When the measured temperature does not exceed the threshold, the power manager determines the temperature headroom utilizing at least one temperature value based on room ambient temperature. Following, the power manager adjusts the respective operating parameters based on at least the temperature headroom.11-19-2015
20150324131SYSTEM AND METHOD FOR MEMORY ALLOCATION IN A MULTICLASS MEMORY SYSTEM - A system for memory allocation in a multiclass memory system includes a processor coupleable to a plurality of memories sharing a unified memory address space, and a library store to store a library of software functions. The processor identifies a type of a data structure in response to a memory allocation function call to the library for allocating memory to the data structure. Using the library, the processor allocates portions of the data structure among multiple memories of the multiclass memory system based on the type of the data structure.11-12-2015
20150318056MEMORY ARRAY TEST LOGIC - A test circuit for a static random access memory (SRAM) array includes a plurality of stages coupled in a ring. Each stage includes a plurality of bit cells to store information, a bit line and a complementary bit line coupled to the plurality of bit cells, and a plurality of word lines coupled to the plurality of bit cells. Subsets of the plurality of word lines of each of the plurality of stages are selectively enabled based on signals asserted on the complementary bit line of another one of the plurality of stages. The test circuit also includes inversion logic deployed between two of the plurality of stages.11-05-2015
20150317269SWITCHING A COMPUTER SYSTEM FROM A HIGH PERFORMANCE MODE TO A LOW POWER MODE - A computer system includes a first processor, a second processor, and a common memory connected to the second processor. The computer system is switched from a high performance mode, in which at least a portion of the first processor and at least a portion of components on the second processor are active, to a low power mode, in which at least a portion of the first processor is active and the components on the second processor are inactive. All central processing unit (CPU) cores on the second processor are quiesced. Traffic from the second processor to the common memory is quiesced. Paths used by the first processor to access the common memory are switched from a first path across the second processor to a second path across the second processor.11-05-2015
20150304177PROCESSOR MANAGEMENT BASED ON APPLICATION PERFORMANCE DATA - Application performance data that indicates a level of service provided in executing one or more applications is determined in software running on one or more processor cores in a computing system that executes the one or more applications. The application performance data is provided to a controller in the computing system that is distinct from the one or more processor cores.10-22-2015
20150302937METHODS AND SYSTEMS FOR MITIGATING MEMORY DRIFT - A memory cell is read by measuring a parameter associated with the memory cell with a first resolution to determine a value stored in the memory cell. The parameter is also measured with a second resolution that is finer than the first resolution. The memory cell is reprogrammed to mitigate an offset between the parameter as measured with the second resolution and the parameter as measured with the first resolution.10-22-2015
20150293854DYNAMIC REMAPPING OF CACHE LINES - A method of managing cache memory includes accessing a cache memory at a primary index that corresponds to an address specified in an access request. A determination is made that accessing the cache memory at the primary index does not result in a cache hit on a cache line with an error-free status. In response to this determination, the primary index is mapped to a secondary index and data for the address is written to a cache line at the secondary index.10-15-2015
20150293845MULTI-LEVEL MEMORY HIERARCHY - Described is a system and method for a multi-level memory hierarchy. Each level is based on different attributes including, for example, power, capacity, bandwidth, reliability, and volatility. In some embodiments, the different levels of the memory hierarchy may use an on-chip stacked dynamic random access memory, (providing fast, high-bandwidth, low-energy access to data) and an off-chip non-volatile random access memory, (providing low-power, high-capacity storage), in order to provide higher-capacity, lower power, and higher-bandwidth performance. The multi-level memory may present a unified interface to a processor so that specific memory hardware and software implementation details are hidden. The multi-level memory enables the illusion of a single-level memory that satisfies multiple conflicting constraints. A comparator receives a memory address from the processor, processes the address and reads from or writes to the appropriate memory level. In some embodiments, the memory architecture is visible to the software stack to optimize memory utilization.10-15-2015
20150293812ERROR-CORRECTION CODING FOR HOT-SWAPPING SEMICONDUCTOR DEVICES - A memory read operation is directed at a group of semiconductor devices from which a first semiconductor device has been removed. An error in data for the memory read operation is detected based on error-correction coding (ECC). The error is caused at least in part by the first semiconductor device having been removed. ECC is used to determine corrected data for the memory read operation.10-15-2015
20150286573SYSTEM AND METHOD OF TESTING PROCESSOR UNITS USING CACHE RESIDENT TESTING - Apparatuses, computer readable mediums, and methods of processor unit testing using cache resident testing are disclosed. The method may include loading a test program in a cache on a chip comprising one or more processor units. The method may include the one or more processor units executing the test program to generate one or more results. The method may include redirecting a first memory reference to the cache, wherein the first memory reference is generated during the execution of the test program. The method may include determining whether the one or more generated results match one or more test results. The method may include redirecting a memory request to a memory location resident in the cache if the memory request includes a memory location not resident in the cache. The method may include redirecting a memory request to the cache if the memory request is not directed to the cache.10-08-2015
20150278016METHOD AND APPARATUS FOR ENCODING ERRONEOUS DATA IN AN ERROR CORRECTION CODE PROTECTED MEMORY - A method and device are described for encoding erroneous data in an error correction code (ECC) protected memory. In one embodiment, incoming data including a plurality of data symbols and a data integrity marker is received. At least one extra symbol is used to mark the incoming data as error-free data or erroneous data (i.e., poison) based on the data integrity marker. ECC may be created to protect the data symbols. The ECC may include a plurality of check symbols, a plurality of unused symbols and the at least one extra symbol. In another embodiment, an error marker may be propagated from a single ECC word to all ECC words of data block (e.g., a cache line, a page, and the like) to prevent errors due to corruption of the error marker caused by faulty memory in the erroneous ECC word.10-01-2015
20150277521DYNAMIC POWER ALLOCATION BASED ON PHY POWER ESTIMATION - A system has a plurality of electronic components including a memory, a PHY coupled to the memory, and one or more other electronic components. Power consumed by the PHY is estimated during operation of the system. Estimating the power consumed by the PHY includes modeling the power consumed by the PHY as a linear function with respect to memory bandwidth. Available power for the PHY is determined based at least in part on the estimated power consumed by the PHY. At least a portion of the available power for the PHY is allocated to at least one of the one or more other components.10-01-2015
20150276868CHIP DEBUG DURING POWER GATING EVENTS - A system, method, and tangible computer readable medium for chip debug is disclosed. For example, the system can include a plurality of functional blocks, a debug path, and a debug bus steering module. The debug path couples the plurality of functional blocks in a daisy chain configuration, where an end functional block from the plurality of functional blocks is at an end of the daisy chain configuration. The debug bus steering module is configured to pass one or more debug signals associated with a first functional block from the plurality of functional blocks along the debug path to the end functional block while a second functional block from the plurality of functional blocks performs one or more power gating cycles.10-01-2015
20150268713ENERGY-AWARE BOOSTING OF PROCESSOR OPERATING POINTS FOR LIMITED DURATION WORKLOADS - The operating point of a processing unit is controlled based on the power consumption (i.e., the rate of energy consumption) associated with a workload, wherein low power consumption may indicate short-duration workloads with idle phases and high power consumption may indicate long, sustained workloads. Energy credits are accumulated while a drain rate of a battery is lower than a threshold drain rate and the energy credits are consumed while the drain rate is higher than the threshold drain rate. The operating point of the processing unit may be increased from a first operating point to a second operating point in response to the energy credits exceeding a first threshold. The operating point of the processing unit may be decreased from the second operating point to the first operating point in response to the energy credits falling below a second threshold.09-24-2015
20150268707USING TEMPERATURE MARGIN TO BALANCE PERFORMANCE WITH POWER ALLOCATION - A method and apparatus using temperature margin to balance performance with power allocation. Nominal, middle and high power levels are determined for compute elements. A set of temperature thresholds are determined that drive the power allocation of the compute elements towards a balanced temperature profile. For a given workload, temperature differentials are determined for each of the compute elements relative the other compute elements, where the temperature differentials correspond to workload utilization of the compute element. If temperature overhead is available, and a compute element is below a temperature threshold, then particular compute elements are allocated power to match or drive toward the balanced temperature profile.09-24-2015
20150261662ADDRESS-PARTITIONED MULTI-CLASS PHYSICAL MEMORY SYSTEM - A multilevel memory system includes a plurality of memories and a processor having a memory controller. The memory controller classifies each memory in accordance with a plurality of memory classes based on its level, its type, or both. The memory controller partitions a unified memory address space into contiguous address blocks and allocates the address blocks among the memory classes. In some implementations, the memory controller then can partition the address blocks assigned to each given memory class into address subblocks and interleave the address subblocks among the memories of the memory class.09-17-2015
20150261472MEMORY INTERFACE SUPPORTING BOTH ECC AND PER-BYTE DATA MASKING - A memory and a method of storing data in a memory are provided. The memory comprises a memory block comprising data bits and additional bits. The memory includes logic which, when receiving a first command, writes data into the data bits of the memory block, wherein the data is masked according to a first input. The logic, in response to a second command, writes data into the data bits of the memory block and writes a second input into the additional bits of the memory block.09-17-2015
20150261457Mechanisms to Save User/Kernel Copy for Cross Device Communications - Central processing units (CPUs) in computing systems manage graphics processing units (GPUs), network processors, security co-processors, and other data heavy devices as buffered peripherals using device drivers. Unfortunately, as a result of large and latency-sensitive data transfers between CPUs and these external devices, and memory partitioned into kernel-access and user-access spaces, these schemes to manage peripherals may introduce latency and memory use inefficiencies. Proposed are schemes to reduce latency and redundant memory copies using virtual to physical page remapping while maintaining user/kernel level access abstractions.09-17-2015
20150241955ADAPTIVE VOLTAGE SCALING - Some embodiments of a processing device include one or more power supply monitors to provide one or more counts representative of one or more operating frequencies of one or more circuit blocks based on a voltage supplied to the circuit block(s). Some embodiments of the processing device also include a system management unit to determine an initial voltage supplied to the circuit block(s) based on a target count and to reduce the voltage supplied to the circuit block(s) from the initial voltage in response to the count(s) generated by the power supply monitor(s) exceeding the target count.08-27-2015
20150234450CONTROL OF PERFORMANCE LEVELS OF DIFFERENT TYPES OF PROCESSORS VIA A USER INTERFACE - An apparatus and a method for controlling power consumption associated with a computing device having first and second processors configured to perform different types of operations includes providing a user interface that allows, during normal operation of the computing device, at least one of: (i) a user selection of desired performance levels of the first and second processors relative to one another, such that higher desired performance levels of one processor correspond to lower desired performance levels of the other processor, and (ii) a user selection of a desired performance level of the first processor and a user selection of a desired performance level of the second processor, the two user selections being made independently of one another. The apparatus and method control, during normal operation of the computing device, performance levels of the processors in response to the one or more user selections of the desired performance levels.08-20-2015
20150227391THERMALLY-AWARE PROCESS SCHEDULING - A scheduler is presented that can adjust, responsive to a thermal condition at the processing device, a scheduling of process threads for compute units of the processing device so as to increase resource contentions between the process threads.08-13-2015
20150227387METHOD AND APPARATUS OF ADAPTIVE APPLICATION PERFORMANCE - A method and apparatus of adaptive application performance includes a determination of at least one criteria for implementing adaptive application performance measures. Based upon the determination, adaptive application performance measures are implemented.08-13-2015
20150222277SELF-ADJUSTING CLOCK DOUBLER AND INTEGRATED CIRCUIT CLOCK DISTRIBUTION SYSTEM USING SAME - In one form, a clock doubler includes a switched inverter, an exclusive logic circuit, and a control signal generation circuit. The switched inverter has first and second control inputs for respectively receiving first and second control signals, a signal input for receiving a clock input signal, and an output. The exclusive logic circuit has a first input for receiving the clock input signal, a second input coupled to the output of the switched inverter, and an output for providing a clock output signal. A control signal generation circuit provides the first and second control signals in response to the clock output signal. The clock doubler may be used in a clock distribution circuit for an integrated circuit that also includes a phase locked loop for providing the input clock signals, and a plurality of clock sub-domains each having one of the clock doublers.08-06-2015
20150221358MEMORY AND MEMORY CONTROLLER FOR HIGH RELIABILITY OPERATION AND METHOD - In one form, a memory includes a memory bank, a page buffer, and an access circuit. The memory bank has a plurality of rows and a plurality of columns with volatile memory cells at intersections of the plurality of row and the plurality of columns. The page buffer is coupled to the plurality of columns and stores contents of a selected one of the plurality of rows. The access circuit is responsive to an adjacent command and a row address to perform a predetermined operation on the row address, and to refresh first and second addresses adjacent to the row address. In another form, a memory controller is adapted to interface with such a memory to select either a normal command or an adjacent command based on a number of activate commands sent to the row in a predetermined time window.08-06-2015
20150215622REGION-BASED IMAGE DECOMPRESSION - A method and a non-transitory computer readable medium for decompressing an image including one or more regions are presented. A region of the image is selected to be decoded. The region and metadata associated with the region are decoded, the metadata including transformation and quantization settings used to compress the region. A reconstruction transformation is applied to the region using the transformation and quantization settings.07-30-2015
20150212737MODE-DEPENDENT ACCESS TO EMBEDDED MEMORY ELEMENTS - A system has a plurality of functional modules including a first functional module and one or more other functional modules. The first functional module includes an embedded memory element and is configurable in a plurality of modes including a first mode and a second mode. When the first functional module is in the first mode, access to the embedded memory element is limited to the first functional module. At least one of the one or more other functional modules is provided with access to the embedded memory element based at least in part on the first functional module being in the second mode.07-30-2015
20150207246CONNECTOR ADAPTOR TO FACILITATE COUPLING OF A MATING CARD EDGE WITH A FEMALE CARD-EDGE CONNECTOR - A connector adaptor facilitates coupling of a mating card edge with a corresponding female card-edge connector disposed on a circuit board. A gathering bevel at the face of the connector adaptor guides the mating card edge from the face of the connector adaptor to the slot at the connection surface of the female card-edge connector, thereby reducing the risk of damage to the female card-edge connector or the mating card edge during a blind mating attempt. In some embodiments the connector adaptor includes at least one connector tab such that the adaptor can be securely inserted into a cage preceding the slot opening of the female card-edge connector. In another embodiment the connector adaptor overlies the female card-edge connector to protect at least the connection surface of the female card-edge connector and can be fastened to the circuit board with any variety of fastening components.07-23-2015
20150206574RELOCATING INFREQUENTLY-ACCESSED DYNAMIC RANDOM ACCESS MEMORY (DRAM) DATA TO NON-VOLATILE STORAGE - A method includes emptying a first region of a dynamic random access memory of data by moving data from the first region to a non-volatile memory and reducing a refresh rate of the dynamic random access memory responsive to emptying the first region of data. A system includes a memory controller to refresh a dynamic random access memory based on a configurable refresh rate, the dynamic random access memory having a plurality of regions, each region having an associated minimum refresh rate, and a processing unit to empty a first region of the plurality of regions of the dynamic random access memory by moving data from the first region to a non-volatile memory and to reduce the configurable refresh rate responsive to emptying the first region.07-23-2015
20150205721Handling Reads Following Transactional Writes during Transactions in a Computing Device - The described embodiments include a computing device that handles cache blocks during a transaction. In the described embodiments, after an entity has written to a cache block in a cache during the transaction, the computing device responds to a read request for the cache block from another entity with a copy of the cache block in a pre-transactional state. In these embodiments, the entity executing the transaction continues the transaction after the computing device responds to the read request from the other entity.07-23-2015
20150205323LOW INSERTION DELAY CLOCK DOUBLER AND INTEGRATED CIRCUIT CLOCK DISTRIBUTION SYSTEM USING SAME - A clock doubler includes a first NAND gate having a first input for receiving a clock input signal and a second input, a second NAND gate having a first input and a second input for receiving a complement of the clock input signal, an output NAND gate having a first and second inputs coupled to outputs of the first and second NAND gates, respectively, and an output for providing a clock output signal, an inverter chain having an input for receiving the clock input signal and responsive to first and second control signals to selectively provide a first true output to the first input of the second NAND gate, and a second complementary output to the second input of the first NAND gate, and a control signal generation circuit providing the first and second control signals in response to the outputs of the first and second NAND gates.07-23-2015
20150199150Performing Logical Operations in a Memory - The described embodiments include a memory with a memory array and logic circuits. In these embodiments, logical operations are performed on data from the memory array by reading the data from the memory array, performing a logical operation on the data in the logic circuits, and writing the data back to the memory array. In these embodiments, the logic circuit is located in the memory so that the data read from the memory array need not be sent to another circuit (e.g., a processor coupled to the memory, etc.) to have the logical operation performed.07-16-2015
20150199126PAGE MIGRATION IN A 3D STACKED HYBRID MEMORY - A die-stacked hybrid memory device implements a first set of one or more memory dies implementing first memory cell circuitry of a first memory architecture type and a second set of one or more memory dies implementing second memory cell circuitry of a second memory architecture type different than the first memory architecture type. The die-stacked hybrid memory device further includes a set of one or more logic dies electrically coupled to the first and second sets of one or more memory dies, the set of one or more logic dies comprising a memory interface and a page migration manager, the memory interface coupleable to a device external to the die-stacked hybrid memory device, and the page migration manager to transfer memory pages between the first set of one or more memory dies and the second set of one or more memory dies.07-16-2015
20150198991PREDICTING POWER MANAGEMENT STATE DURATIONS ON A PER-PROCESS BASIS - Durations of power management states are predicted on a per-process basis. Some embodiments include storing, in one or more data structures associated with one or more processes, information indicating previous durations of a power management state associated with the process(es). Some embodiments also include predicting a subsequent duration of the power management state for the process(es) using information stored in the data structure(s).07-16-2015
20150195943LEVER MECHANISM TO FACILITATE EDGE COUPLING OF CIRCUIT BOARD - A lever mechanism facilitates coupling of a sliding board and a connector. A sliding board is partially enclosed by a sliding board enclosure, such that the sliding board is slidable relative to the enclosure. A pair of pivot levers is disposed between the sliding board and the enclosure. Each pivot lever is connected at a proximal end to the enclosure and at a pivot point to the sliding board. A distal end of the pivot lever is removably engageable with a second circuit board. The lever mechanism translates a pushing force on the enclosure to a pulling force on the sliding board, aligning and eventually coupling the sliding board with the connector.07-09-2015
20150193259BOOSTING THE OPERATING POINT OF A PROCESSING DEVICE FOR NEW USER ACTIVITIES - A processing system detects user activities on one or more processing units. In response, an operating point (operating frequency or an operating voltage) of the processing unit handing the user activity is increased at the processing unit. Battery power may be conserved in some processing systems by limiting the increase in the operating point to a time interval and reducing the operating frequency or the operating voltage to a previous value after the time interval has elapsed.07-09-2015
20150188649METHODS AND SYSTEMS OF SYNCHRONIZER SELECTION - A circuit includes a plurality of synchronizers to adapt a signal from a first clock domain to a second clock domain. Each synchronizer of the plurality of synchronizers includes a synchronizer input to receive the signal from the first clock domain and a synchronizer output to provide the signal as adapted to the second clock domain. The circuit also includes a multiplexer (mux) that includes a plurality of mux inputs and a mux output. Each mux input is coupled to the synchronizer output of a respective synchronizer of the plurality of synchronizers. The mux output provides the signal, as adapted to the second clock domain, from the synchronizer output of a selected synchronizer of the plurality of synchronizers.07-02-2015
20150187437CIRCUIT AND DATA PROCESSOR WITH HEADROOM MONITORING AND METHOD THEREFOR - A circuit with headroom monitoring includes a memory array having memory cells, a replica array, and a built-in self test circuit. The replica array has a plurality of word lines, a plurality of bit line pairs, and memory cells located at intersections of the plurality of word lines and the plurality of bit line pairs. The memory cells are of a same type as memory cells in the memory array. The built-in self test circuit is coupled to the replica array for adding a capacitance to at least one bit line of the plurality of bit line pairs, for sensing a read time of memory cells of the replica array with the capacitance so added, and for providing a headroom signal in response to the read time.07-02-2015
20150186240EXTENSIBLE I/O ACTIVITY LOGS - A method of managing peripherals is performed in a device coupled to a processor in a computer system. In the method, information associated with I/O activity for one or more peripherals is recorded in a first segment of a log. A second segment of the log is identified based on a next-segment pointer associated with the first segment of the log. In response to detecting a lack of available capacity in the first segment of the log, information associated with further I/O activity for the one or more peripherals is recorded in the second segment of the log.07-02-2015
20150186160CONFIGURING PROCESSOR POLICIES BASED ON PREDICTED DURATIONS OF ACTIVE PERFORMANCE STATES - Durations of active performance states of components of a processing system can be predicted based on one or more previous durations of an active state of the components. One or more entities in the processing system such as processor cores or caches can be configured based on the predicted durations of the active state of the components. Some embodiments configure a first component in a processing system based on a predicted duration of an active state of a second component of the processing system. The predicted duration is predicted based on one or more previous durations of an active state of the second component.07-02-2015
20150186075PARTITIONABLE MEMORY INTERFACES - A memory device receives a plurality of read commands and/or write commands in parallel. The memory device transmits data corresponding to respective read commands on respective portions of a data bus and receives data corresponding to respective write commands on respective portions of the data bus. The memory device includes I/O logic to receive the plurality of read commands in parallel, to transmit the data corresponding to the respective read commands on respective portions of the data bus, and to receive the data corresponding to the respective write commands on respective portions of the data bus.07-02-2015
20150185801POWER GATING BASED ON CACHE DIRTINESS - Power gating decisions can be made based on measures of cache dirtiness. Analyzer logic can selectively power gate a component of a processor system based on a cache dirtiness of one or more caches associated with the component. The analyzer logic may power gate the component when the cache dirtiness exceeds a threshold and may maintains the component in an idle state when the cache dirtiness does not exceed the threshold. Idle time prediction logic may be used to predict a duration of an idle time of the component. The analyzer logic may then selectively power gates the component based on the cache dirtiness and the predicted idle time.07-02-2015
20150160975VOLTAGE DROOP MITIGATION IN 3D CHIP SYSTEM - The present invention relates to a multichip system and a method for scheduling threads in 3D stacked chip. The multichip system comprises a plurality of dies stacked vertically and electrically coupled together; each of the plurality of dies comprising one or more cores, each of the plurality of dies further comprising: at least one voltage violation sensing unit, the at least one voltage violation sensing unit being connected with the one or more cores of each die, the at least one voltage sensing unit being configured to independently sense voltage violation in each core of each die; and at least one frequency tuning unit, the at least one frequency tuning unit being configured to tune the frequency of each core of each die, the at least one frequency tuning unit being connected with the at least one voltage violation sensing unit. The multichip system and method described in present invention have many advantages, such as reducing voltage violation, mitigating voltage droop and saving power.06-11-2015
20150121106Dynamic and Adaptive Sleep State Management - An approach is described herein that includes a method for power management of a device. In one example, the method includes sampling duration characteristics for a plurality of past idle events for a predetermined interval of time and determining whether to transition a device to a powered-down state based on the sampled duration characteristics. In another example, the method includes determining whether an average idle time for a plurality of past idle events exceeds an energy break-even point threshold. If the average idle time for the plurality of past idle events exceeds the energy break-even point threshold, a device is immediately transitioned to a powered-down state upon receipt of a next idle event. If the average idle time for the plurality of past idle events does not exceed the energy break-even point threshold, transition of the device to the powered-down state is delayed.04-30-2015
20150121057Using an Idle Duration History to Configure an Idle State of an Entity in a Computing Device - The described embodiments include a computing device with an entity (a processor, a processor core, etc.) and a controller. In these embodiments, the controller, using an idle duration history, predicts a duration of a next idle period for the entity. Based on the predicted duration of the next idle period, the controller configures the entity to operate in a corresponding idle state.04-30-2015
20150121054Platform Secure Boot - A system and method for securing a boot process on the electronic device using a hardware-based secure processor are provided. The hardware-based secure processor receives a boot instruction. In response to the received boot instruction, the hardware-based secure processor authenticates the boot code in hardware while stalling the processor. Once the boot code is authenticated, the processor is released from the stall and processes the boot code.04-30-2015
20150121050BANDWIDTH INCREASE IN BRANCH PREDICTION UNIT AND LEVEL 1 INSTRUCTION CACHE - A processor, a device, and a non-transitory computer readable medium for performing branch prediction in a processor are presented. The processor includes a front end unit. The front end unit includes a level 1 branch target buffer (BTB), a BTB index predictor (BIP), and a level 1 hash perceptron (HP). The BTB is configured to predict a target address. The BIP is configured to generate a prediction based on a program counter and a global history, wherein the prediction includes a speculative partial target address, a global history value, a global history shift value, and a way prediction. The HP is configured to predict whether a branch instruction is taken or not taken.04-30-2015
20150121046ORDERING AND BANDWIDTH IMPROVEMENTS FOR LOAD AND STORE UNIT AND DATA CACHE - The present invention provides a method and apparatus for supporting embodiments of an out-of-order load to load queue structure. One embodiment of the apparatus includes a load queue for storing memory operations adapted to be executed out-of-order with respect to other memory operations. The apparatus also includes a load order queue for cacheable operations that ordered for a particular address.04-30-2015
20150121041PROCESSOR AND METHODS FOR IMMEDIATE HANDLING AND FLAG HANDLING - Described herein are methods and processors for flag renaming in groups to eliminate dependencies of instructions. Decoder and execution units in the processor may be configured to rename flags into groups that allow each group to be treated separately as appropriate. This flag renaming eliminates flag dependencies with respect to instructions. This allows an instruction to write exactly the flags that the instruction wants without having to create merge dependencies. Methods and processors are provided for handling immediate values embedded in instructions. A 16 bit immediate bus and a 4 bit encoding/control bus are added at the interface between decode and execution units. For an 8 or 12 bit immediate, the upper 4 bits of the immediate bus contain the encoding bits. For a 16 bit immediate, the encoding/control bus contains the encoding bits. The encoding/control bus indicates when to look at the top four bits of the immediate bus.04-30-2015
20150121040PROCESSOR AND METHODS FOR FLOATING POINT REGISTER ALIASING - Methods, devices, and systems for accessing packed registers are presented. A state of the packed registers may be tracked and it may be determined whether the register is directly accessible based on the state. If the register is not directly accessible, an action may be performed which allows the register to be accessed directly. The action may include injecting at least one uop for reorganizing the physical storage of the register such that it is directly accessible. The action may include aligning the data with the least significant bit of a physical register or otherwise aligning the data with the datapath. The action may also include changing the state of the packed registers.04-30-2015
20150120978INPUT/OUTPUT MEMORY MAP UNIT AND NORTHBRIDGE - The present invention provides for page table access and dirty bit management in hardware via a new atomic test[0] and OR and Mask. The present invention also provides for a gasket that enables ACE to CCI translations. This gasket further provides request translation between ACE and CCI, deadlock avoidance for victim and probe collision, ARM barrier handling, and power management interactions. The present invention also provides a solution for ARM victim/probe collision handling which deadlocks the unified northbridge. These solutions includes a dedicated writeback virtual channel, probes for IO requests using 4-hop protocol, and a WrBack Reorder Ability in MCT where victims update older requests with data as they pass the requests.04-30-2015
20150120976METHOD AND APPARATUS FOR PERFORMING A BUS LOCK AND TRANSLATION LOOKASIDE BUFFER INVALIDATION - A method and apparatus for performing a bus lock and a translation lookaside buffer invalidate transaction includes receiving, by a lock master, a lock request from a first processor in a system. The lock master sends a quiesce request to all processors in the system, and upon receipt of the quiesce request from the lock master, all processors cease issuing any new transactions and issue a quiesce granted transaction. Upon receipt of the quiesce granted transactions from all processors, the lock master issues a lock granted message that includes an identifier of the first processor. The first processor performs an atomic transaction sequence and sends a first lock release message to the lock master upon completion of the atomic transaction sequence. The lock master sends a second lock release message to all processors upon receiving the first lock release message from the first processor.04-30-2015
20150117582PREDICTIVE PERIODIC SYNCHRONIZATION USING PHASE-LOCKED LOOP DIGITAL RATIO UPDATES - Embodiments are described for a method and system of enabling updates from a clock controller to be sent directly to a predictive synchronizer to manage instant changes in frequency between transmit and receive clock domains, comprising receiving receive and transmit reference frequencies from a phase-locked loop circuit, receiving receive and transmit constant codes from a controller coupled to the phase-locked loop circuit, obtaining a time delay factor to accommodate phase detection between the transmit and receive clock domains, and calculating new detection interval and frequency information using the time delay factor, the reference frequencies, and the constant codes.04-30-2015
20150113268Virtualized AES Computational Engine - A computational engine may include an input configured to receive a first data packet and a second data packet, a context memory configured to store one or more contexts, and a set of computational elements coupled with the input and coupled with the context memory. The set of computational elements may be configured to generate a first output data packet by executing a first sequence of cryptographic operations on the first data packet, and generate a second output data packet by executing a second sequence of cryptographic operations on the second data packet and on a selected context of the one of the one or more contexts. The selected context may be associated with the second packet of data, and the context may be stored in the context memory prior to the execution of the first sequence of cryptographic operations.04-23-2015
20150110267Unified Key Schedule Engine - A key generator may comprise a first set of word registers each configured to store at least one word of a prior key, a set of computational elements coupled with the first set of word registers, one or more path selection elements coupled with the set of computational elements, wherein the one or more path selection elements are configured to select as a selected computational pathway a first computational pathway including a first subset of computational elements when a mode selection signal indicates a first mode, and select as the selected computational pathway a second computational pathway including a second subset of computational elements when the mode selection signal indicates a second mode, and a second set of word registers coupled with the set of computational elements, wherein each of the second set of word registers is configured to store at least one word of a new key generated by the selected computational pathway.04-23-2015
20150110264Virtualized SHA Computational Engine - A computational engine may comprise a working memory configured to receive a first input message and a second input message, a context memory coupled with the working memory, wherein the context memory is configured to simultaneously store a first context corresponding to the first input message and a second context corresponding to the second input message, and a set of computational elements coupled with the working memory and coupled with the context memory, wherein the set of computational elements is configured to finish generating a first output digest based on the first input message and a first context after starting generation of a second output digest based the second input message and a second context and before finishing the generation of the second output digest.04-23-2015
20150109153Efficient Deflate Decompression - A decompression engine may include an input configured to receive an input code comprises one or more bits from a bitstream of encoded data, a symbol decoder coupled with the input, where the symbol decoder is configured to calculate, based on the input code, a plurality of candidate addresses each corresponding to a code group. The symbol decoder may further include a group identifier module coupled with the symbol decoder, wherein the group identifier module is configured to identify one of the plurality of code groups corresponding to the input code, and a multiplexer coupled with the group identifier module, wherein the multiplexer is configured to select as a final address one of the plurality of candidate addresses corresponding to the identified code group.04-23-2015
20150106916LEVERAGING A PERIPHERAL DEVICE TO EXECUTE A MACHINE INSTRUCTION - A method includes executing microcode in a processing unit of a processor to implement a machine instruction, wherein the microcode is to manipulate the processing unit to access a peripheral device on a public communication bus at a private address not visible to other devices on the public communication bus and not specified in the machine instruction. A processor includes a public communication bus, a peripheral device coupled to the public communication bus, and a processing unit. The processing unit is to execute microcode to implement a machine instruction. The microcode is to manipulate the processing unit to access a peripheral device on a public communication bus at a private address not visible to other devices on the public communication bus and not specified in the machine instruction.04-16-2015
20150106642PERFORMANCE STATE BOOST FOR MULTI-CORE INTEGRATED CIRCUIT - An integrated circuit includes a multiple number of processor cores and a system management unit. The multiple number of processor cores each operate at one of a multiple number of performance states. The system management unit is coupled to the multiple number of processor cores, for setting performance states of the multiple number of processor cores. The system management unit boosts a first performance state of a first processor core of the multiple number of processor cores based on both a first temperature calculated from an estimated power consumption, and a second temperature based on a temperature measurement.04-16-2015
20150106604RANDOMLY BRANCHING USING PERFORMANCE COUNTERS - A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. When the processor processes a given instruction of a given instruction type, the processor updates a corresponding performance counter. When the performance counter reaches a threshold, the processor generates an interrupt and compares a location of the given instruction with stored locations in a given list. If a match is not found, then the processor processes an instruction following the given instruction in the computer program without processing intermediate instrumentation code. If a match is found, then the processor processes instrumentation code. Regardless of whether or not the instrumentation code is processed, when control flow returns to the computer program, the corresponding performance counter is initialized with a random value.04-16-2015
20150106602RANDOMLY BRANCHING USING HARDWARE WATCHPOINTS - A system and method for efficiently performing program instrumentation. A processor processes instructions stored in a memory. The processor allocates a memory region for the purpose of creating “random branches” in the computer code utilizing existing memory access instructions. When the processor processes a given instruction, the processor both accesses a first location in the memory region and may determine a condition is satisfied. In response, the processor generates an interrupt. The corresponding interrupt handler may transfer control flow from the computer program to instrumentation code. The condition may include a pointer storing an address pointing to locations within the memory region equals a given address after the point is updated. Alternatively, the condition may include an updated data value stored in a location pointed to by the given address equals a threshold value.04-16-2015
20150106587DATA REMAPPING FOR HETEROGENEOUS PROCESSOR - A processor remaps stored data and the corresponding memory addresses of the data for different processing units of a heterogeneous processor. The processor includes a data remap engine that changes the format of the data (that is, how the data is physically arranged in segments of memory) in response to a transfer of the data from system memory to a local memory hierarchy of an accelerated processing module (APM) of the processor. The APM's local memory hierarchy includes an address remap engine that remaps the memory addresses of the data at the local memory hierarchy so that the data can be accessed by routines at the APM that are unaware of the data remapping. By remapping the data, and the corresponding memory addresses, the APM can perform operations on the data more efficiently.04-16-2015
20150106574Performing Processing Operations for Memory Circuits using a Hierarchical Arrangement of Processing Circuits - The described embodiments include a computing device that comprises at least one memory die having memory circuits and memory die processing circuits, and a logic die coupled to the at least one memory die, the logic die having logic die processing circuits. In the described embodiments, the memory die processing circuits are configured to perform memory die processing operations on data retrieved from or destined for the memory circuits and the logic die processing circuits are configured to perform logic die processing operations on data retrieved from or destined for the memory circuits.04-16-2015
20150102856PROGRAMMABLE BANDGAP REFERENCE VOLTAGE - Embodiments may include a method, system and apparatus for providing a reference voltage supply. A series resistor is provided between a power supply and a bandgap circuit coupled to an amplifier. A shunt transistor circuit is operatively coupled to the series resistor. A programmable output voltage is provided based upon the shunt transistor circuit and a first value of the series resistor.04-16-2015
20150100848DETECTING AND CORRECTING HARD ERRORS IN A MEMORY ARRAY - Hard errors in the memory array can be detected and corrected in real-time using reusable entries in an error status buffer. Data may be rewritten to a portion of a memory array and a register in response to a first error in data read from the portion of the memory array. The rewritten data may then be written from the register to an entry of an error status buffer in response to the rewritten data read from the register differing from the rewritten data read from the portion of the memory array.04-09-2015
20150100818BACK-OFF MECHANISM FOR A PERIPHERAL PAGE REQUEST LOG - A system and method of managing requests from peripherals in a computer system are provided. In the system and method, an input/output memory management unit (IOMMU) receives a peripheral page request (PPR) from a peripheral. In response to a determination that a criterion regarding an available capacity of a PPR log is satisfied, a completion message is sent to the peripheral indicating that the PPR is complete and the PPR is discarded without queuing the PPR in the PPR log.04-09-2015
20150100758DATA PROCESSOR AND METHOD OF LANE REALIGNMENT - A data processor includes a register file divided into at least a first portion and a second portion for storing data. A single instruction, multiple data (SIMD) unit is also divided into at least a first lane and a second lane. The first and second lanes of the SIMD unit correspond respectively to the first and second portions of the register file. Furthermore, each lane of the SIMD unit is capable of data processing. The data processor also includes a realignment element in communication with the register file and the SIMD unit. The realignment element is configured to selectively realign conveyance of data between the first portion of the register file and the first lane of the SIMD unit to the second lane of the SIMD unit.04-09-2015
20150100739Enhancing Lifetime of Non-Volatile Cache by Injecting Random Replacement Policy - A method, a system and a computer-readable medium for writing to a non-volatile cache memory are provided. The method maintains a write count associated with a set of memory locations. The method then selects a cache replacement policy based on the value of the write count and selecting a block within the set for writing data using the selected cache replacement policy. The selected cache replacement policy can introduce a randomized selection.04-09-2015
20150100735Enhancing Lifetime of Non-Volatile Cache by Reducing Intra-Block Write Variation - A method, a system and a computer-readable medium for writing to a cache memory are provided. The method comprises maintaining a write count associated with a set, the set containing a memory block associated with a physical block address. A mapping from a logical address to the physical address of the block is also maintained. The method shifts the mapping based on the value of the write count and writes data to the block based on the mapping.04-09-2015
20150100723DATA PROCESSOR WITH MEMORY CONTROLLER FOR HIGH RELIABILITY OPERATION AND METHOD - A data processor includes a memory accessing agent and a memory controller. The memory accessing agent generates a plurality of accesses to a memory. The memory controller is coupled to the memory accessing agent and schedules the plurality of memory accesses in an order based on characteristics of the memory. The characteristics of the memory include a row cycle page time (t04-09-2015
20150100708METHODS AND SYSTEMS FOR MOVING AND RESIZING I/O ACTIVITY LOGS - A method of managing peripherals is performed in a device coupled to a processor in a computer system. For example, the method is performed in an input/output memory management unit (IOMMU) or a peripheral. The method includes recording information associated with I/O activity for one or more peripherals in a log that has a first base address. The method also includes, without pausing the I/O activity, specifying a second base address for the log and setting a head pointer and a tail pointer for the log to indicate that the log is empty. The second base address is distinct from the first base address.04-09-2015
20150095605Latency-Aware Memory Control - A system, method and computer-readable storage device for accessing heterogeneous memory system, are provided. A memory controller schedules access of a command to a memory region in a set of memory regions based on an access priority associated with the command and where the set of memory regions have corresponding access latencies. The memory controller also defers access of the command to the set of memory regions using at least two queues and the access priority.04-02-2015
20150095586STORING NON-TEMPORAL CACHE DATA - Embodiments herein provide for using one or more cache memory to facilitate non-temporal transaction. A request to store data into a cache associated with a processor is received. In response to receiving the request, a determination is made as to whether the data to be stored is non-temporal data. A predetermined location of the cache is selected; the location to which storing of the non-temporal data is restricted to a predetermined location, in response to determining the data to be stored is non-temporal data. The non-temporal data is data that is not accessed within a predetermined period of time. The non-temporal data is stored into the predetermined location.04-02-2015
20150092856Exploiting Camera Depth Information for Video Encoding - The present disclosure is directed a system and method for exploiting camera and depth information associated with rendered video frames, such as those rendered by a server operating as part of a cloud gaming service, to more efficiently encode the rendered video frames for transmission over a network. The method and system of the present disclosure can be used in a server operating in a cloud gaming service to improve, for example, the amount of latency, downstream bandwidth, and/or computational processing power associated with playing a video game over its service. The method and system of the present disclosure can be further used in other applications where camera and depth information of a rendered or captured video frame is available.04-02-2015
20150089168NESTED CHANNEL ADDRESS INTERLEAVING - A system and method for mapping an address space to a non-power-of-two number of memory channels. Addresses are translated and interleaved to the memory channels such that each memory channel has an equal amount of mapped address space. The address space is partitioned into two regions, and a first translation function is used for memory requests targeting the first region and a second translation function is used for memory requests targeting the second region. The first translation function is based on a first set of address bits and the second translation function is based on a second set of address bits.03-26-2015
20150082093DEBUG APPARATUS AND METHODS FOR DYNAMICALLY SWITCHING POWER DOMAINS - Methods and apparatus are provided that facilitate debugging operations for components in dynamic power domains. In an embodiment, an integrated circuit includes hardware sectors associated with observability circuits served by a debug data bus of a debug circuit. A controlled sector residing in a dynamically-controlled power domain may be turned off while the power domain of another sector remains on. To continue to have debug observability all the way through and after these power events, a debug data register is configured to provide data, such as configuration and/or programming data, to the observability circuit of the controlled sector via the debug data bus. A shadow register is configured to capture the data provided to the controlled sector's observability circuit. The shadow register data is used upon restoring power to the controlled sector to restore the controlled sector's observability circuit to a state when the controlled sector was previously powered on.03-19-2015
20150082092DEBUG APPARATUS AND METHODS FOR DYNAMICALLY SWITCHING POWER DOMAINS - Methods and apparatus are provided that facilitate debugging operations for components that may include different power domains. In an embodiment, an integrated circuit (IC) includes a plurality of hardware sectors, each hardware sector associated with a debug observability circuit that is served by a debug data bus of a debug circuit. The plurality of hardware sectors includes a controlled sector residing in a dynamically-controlled power domain that may be turned off while the power domain of another sector remains on. A selectively switchable data bus component is configured to couple the debug observability circuit associated with the controlled sector to the debug data bus when the power to the controlled sector is on and to switch to bypass the debug observability circuit associated with the controlled sector when the power to the controlled sector is not on.03-19-2015
20150081980METHOD AND APPARATUS FOR STORING A PROCESSOR ARCHITECTURAL STATE IN CACHE MEMORY - A method includes storing architectural state data associated with a processing unit in a cache memory using an allocate without fill mode. A system includes a processing unit, a cache memory, and a cache controller. The cache controller is to receive architectural state data associated with the processing unit and store at least a first portion of the architectural state data in the cache memory using a first fill mode responsive to a first value of a fill mode flag and store at least a second portion of the architectural state data in the cache memory using a second fill mode responsive to a second value of a fill mode flag, wherein the first fill mode differs from the second fill mode with respect to whether previous values of the architectural state data are retrieved prior to storing the first or second portions in the cache memory.03-19-2015
20150074372Apparatus and Method for Hash Table Access - A system and method for accessing a hash table are provided. A hash table includes buckets where each bucket includes multiple chains. When a single instruction multiple data (SIMD) processor receives a group of threads configured to execute a key look-up instruction that accesses an element in the hash table, the threads executing on the SIMD processor identify a bucket that stores a key in the key look-up instruction. Once identified, the threads in the group traverse the multiple chains in the bucket, such that the elements at a chain level in the multiple chains are traversed in parallel. The traversal continues until a key look-up succeeds or fails.03-12-2015
20150074313INTERNAL BUS ARCHITECTURE AND METHOD IN MULTI-PROCESSOR SYSTEMS - An internal bus architecture and method is described. Embodiments include a system with multiple bus endpoints coupled to a bus. In addition, the bus endpoints are directly coupled to each other. Embodiments are usable with known bus protocols.03-12-2015
20150073611ESTIMATING LEAKAGE CURRENTS BASED ON RATES OF TEMPERATURE OVERAGES OR POWER OVERAGES - An operating point of one or more components in a processing device may be set using a leakage current estimated based on at least one of a rate of temperature overages or a rate of power overages. In some embodiments, a power management controller may be used to set an operating point of one or more components in the processing device based on at least one of a rate of temperature overages or a rate of power overages for the component(s).03-12-2015
20150067389Programmable Substitutions for Microcode - The apparatuses, systems, and methods in accordance with the embodiments disclosed herein may facilitate modifying post silicon instruction behavior. Embodiments herein may provide registers in predetermined locations in an integrated circuit. These registers may be mapped to generic instructions, which can modify an operation of the integrated circuit. In some embodiments, these registers may be used to implement a patch routine to change the behavior of at least a portion of the integrated circuit. In this manner, the original design of the integrated circuit may be altered.03-05-2015
20150067357PREDICTION FOR POWER GATING - The present application describes embodiments of methods for tournament prediction of power gating in processing devices. Some embodiments of the method include selecting one of a plurality of predictions of a duration of a time to a power state transition of a component in a processing device. The plurality of predictions are generated using a corresponding plurality of prediction algorithms. Some embodiments of the method also include deciding whether to transition the component from a first power state to a second power state based on the selected prediction.03-05-2015
20150067356POWER MANAGER FOR MULTI-THREADED DATA PROCESSOR - A data processing system includes a plurality of processor resources, a manager, and a power distributor. Each of the plurality of data processor cores is operable at a selected one of a plurality of performance states. The manager assigns each of a plurality of program elements to one of the plurality of processor resources, and synchronizing the program elements using barriers. The power distributor is coupled to the manager and to the plurality of processor resources, and assigns a performance state to each of the plurality of processor resources within an overall power budget, and in response to detecting that a program element assigned to a first processor resource is at a barrier, increases the performance state of a second processor resource that is not at the barrier within the overall power budget.03-05-2015
20150067305SPECIALIZED MEMORY DISAMBIGUATION MECHANISMS FOR DIFFERENT MEMORY READ ACCESS TYPES - A system and method for efficient predicting and processing of memory access dependencies. A computing system includes control logic that marks a detected load instruction as a first type responsive to predicting the load instruction has high locality and is a candidate for store-to-load (STL) data forwarding. The control logic marks the detected load instruction as a second type responsive to predicting the load instruction has low locality and is not a candidate for STL data forwarding. The control logic processes a load instruction marked as the first type as if the load instruction is dependent on an older store operation. The control logic processes a load instruction marked as the second type as if the load instruction is independent on any older store operation.03-05-2015
20150067278Using Redundant Transactions to Verify the Correctness of Program Code Execution - In the described embodiments, a processor core (e.g., a GPU core) receives a section of program code to be executed in a transaction from another entity in a computing device. The processor core sends the section of program code to one or more compute units in the processor core to be executed in a first transaction and concurrently executed in a second transaction, thereby creating a “redundant transaction pair.” When the first transaction and the second transaction are completed, the processor core compares a read-set of the first transaction to a read-set of the second transaction and compares a write-set of the first transaction to a write-set of the second transaction. When the read-sets and the write-sets match and no transactional error condition has occurred, the processor core allows results from the first transaction to be committed to an architectural state of the computing device.03-05-2015
20150067266EARLY WRITE-BACK OF MODIFIED DATA IN A CACHE MEMORY - A level of cache memory receives modified data from a higher level of cache memory. A set of cache lines with an index associated with the modified data is identified. The modified data is stored in the set in a cache line with an eviction priority that is at least as high as an eviction priority, before the modified data is stored, of an unmodified cache line with a highest eviction priority among unmodified cache lines in the set.03-05-2015
20150067264METHOD AND APPARATUS FOR MEMORY MANAGEMENT - In some embodiments, a method of managing cache memory includes identifying a group of cache lines in a cache memory, based on a correlation between the cache lines. The method also includes tracking evictions of cache lines in the group from the cache memory and, in response to a determination that a criterion regarding eviction of cache lines in the group from the cache memory is satisfied, selecting one or more (e.g., all) remaining cache lines in the group for eviction.03-05-2015
20150067209METHOD AND CIRCUIT FOR DETECTING USB 3.0 LFPS SIGNAL - A system and method for efficient detection of Low Frequency Periodic Signaling (LFPS) input signals. A receiver receives two input differential signals that are LFPS input signals. The receiver increases the common-mode voltage for each of the two input differential signals and determines two polarity opposite differences between the level shifted intermediate differential signals. The differences are used to generate two series of relatively narrow pulses by comparisons with a given threshold. A wide continuous pulse is asserted when an initial pulse among the two series of pulses is detected. The wide continuous pulse is deasserted when a final pulse among the two series of pulses is detected. While the wide continuous pulse is asserted, control logic is awakend and performs a Universal Serial Bus (USB) protocol for processing data on the input differential signals.03-05-2015
20150066872Efficient Duplicate Elimination - Methods and systems for identifying unique values in an input list are provided. A method in a processor is provided. The method includes generating a hash value list based on an input list of items using a respective work hem of the processor for each item in the input list and identify lag unique items in the input list based at least on the hash value list.03-05-2015
20150061737PHASE LOCKED LOOP WITH BANDWIDTH CONTROL - A phase locked loop (PLL) includes a first loop, a second loop, and a lock detector. The first loop locks a feedback signal having a frequency equal to a fraction of a frequency of an output signal to a reference signal in phase. The first loop has a first bandwidth. The second loop locks the feedback signal to the reference signal in frequency and has a second bandwidth. The first bandwidth is higher than the second bandwidth. The lock detector is coupled to the second loop and increases the second bandwidth in response to detecting that the feedback signal is not locked to the reference signal.03-05-2015
20150058575PRECHARGE DISABLE USING PREDECODED ADDRESS - A memory can be a sum addressed memory (SAM) that receives, for each read access, two address values (e.g. a base address and an offset) having a sum that indicates the entry of the memory to be read (the read entry). A decoder adds the two address value to identify the read entry. Concurrently, a predecode module predecodes the two address values to identify a set of entries (e.g. two different entries) at the memory, whereby the set includes the entry to be read. The predecode module generates a precharge disable signal to terminate precharging at the set of entries which includes the entry to he read. Because the precharge disable signal is based on predecoded address information, it can be generated without waiting for a full decode of the read address entry.02-26-2015
20150058567HIERARCHICAL WRITE-COMBINING CACHE COHERENCE - A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read-only cache and write-only combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events and reduces overhead in maintaining write-only combining buffers.02-26-2015
20150046652WRITE COMBINING CACHE MICROARCHITECTURE FOR SYNCHRONIZATION EVENTS - A method, computer program product, and system is described that enforces a release consistency with special accesses sequentially consistent (RCsc) memory model and executes release synchronization instructions such as a StRel event without tracking an outstanding store event through a memory hierarchy, while efficiently using bandwidth resources. What is also described is the decoupling of a store event from an ordering of the store event with respect to a RCsc memory model. The description also includes a set of hierarchical read/write combining buffers that coalesce stores from different parts of the system. In addition, a pool component maintains partial order of received store events and release synchronization events to avoid content addressable memory (CAM) structures, full cache flushes, as well as direct write-throughs to memory. The approach improves the performance of both global and local synchronization events since a store event may not need to reach main memory to complete.02-12-2015
20150039836METHODS AND APPARATUS RELATED TO DATA PROCESSORS AND CACHES INCORPORATED IN DATA PROCESSORS - A cache includes a cache array and a cache controller. The cache array has a multiple number of entries. The cache controller is coupled to the cache array, for storing new entries in the cache array in response to accesses by a data processor, and evicts entries from the cache array according to a cache replacement policy. The cache controller includes a frequent writes predictor for storing frequency information indicating a write back frequency for the multiple number of entries. The cache controller selects a candidate entry for eviction based on both recency information and the frequency information.02-05-2015
20150039833Management of caches - A system and method for efficiently powering down banks in a cache memory for reducing power consumption. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, each comprising multiple cache sets. In response to a request to power down a first bank of the multiple banks in the cache array, the cache controller selects a cache line of a given type in the first bank and determines whether a respective locality of reference for the selected cache line exceeds a threshold. If the threshold is exceeded, then the selected cache line is migrated to a second bank in the cache array. If the threshold is not exceeded, then the selected cache line is written back to lower-level memory.02-05-2015
20150036681PASS-THROUGH ROUTING AT INPUT/OUTPUT NODES OF A CLUSTER SERVER - Node locations in the topology of a cluster computer server are designated as input/output (I/O) nodes that provide input and output for the cluster computer server. Examples of I/O nodes include network nodes that provide an interface for the cluster computer server to an external network, and storage nodes that provide access to storage devices for the cluster compute server. The I/O nodes are configured to analyze received messages and identify whether the message is targeted to the receiving I/O node or to another node of the cluster compute server. Those messages targeted to the I/O node are provided to a processing module of the I/O node for processing.02-05-2015
20150029799CANARY CIRCUIT WITH PASSGATE TRANSISTOR VARIATION - A canary circuit with passgate transistor variation is described herein. The canary circuit includes a memory canary circuit that has a plurality of bitcells. Each bitcell has at least a passgate transistor that is driven by a wordline voltage. The canary circuit further includes a regulator circuit that outputs a wordline voltage that accounts for a predetermined offset of a threshold voltage of the passgate transistor. In an embodiment, the regulator circuit is a subtractor circuit that generates the wordline voltage from a reference voltage based in part on the threshold voltage variation of the passgate transistor.01-29-2015
20150026686DEPENDENT INSTRUCTION SUPPRESSION IN A LOAD-OPERATION INSTRUCTION - A method includes suppressing execution of an operation portion of a load-operation instruction in a processor responsive to an invalid status of a load portion of load-operation instruction. A processor includes an instruction pipeline including an execution unit operable to execute instructions and a scheduler unit. The scheduler unit includes a scheduler queue and is operable to store a load-operation in the scheduler queue. The load-operation instruction includes a load portion and an operation portion. The scheduler unit schedules the load portion for execution in the execution unit, marks the operation portion in the scheduler queue as eligible for execution responsive to scheduling the load portion, receives an indication of an invalid status of the load portion, and suppresses execution of the operation portion responsive to the indication of the invalid status.01-22-2015
20150026685DEPENDENT INSTRUCTION SUPPRESSION - A method includes suppressing execution of at least one dependent instruction of a first instruction by a processor responsive to an invalid status of an ancestor load instruction associated with the first instruction. A processor includes an instruction pipeline having an execution unit to execute instructions, a load store unit for retrieving data from a memory hierarchy, and a scheduler unit. The scheduler unit selects for execution in the execution unit a first load instruction having at least one dependent instruction linked to the first load instruction for data forwarding from the load store unit and suppresses execution of a second dependent instruction of the first dependent instruction responsive to an invalid status of the first load instruction.01-22-2015
20150026532METHOD AND APPARATUS FOR PROVIDING CLOCK SIGNALS FOR A SCAN CHAIN - An integrated circuit device includes a plurality of flip flops configured into a scan chain. The plurality of flip flops includes at least flip flop of a first type and at least one flip flop of a second type. A method includes generating a first scan clock signal for loading scan data into at least one flip flop of a first type, generating a second scan clock signal and a third scan clock signal for loading the scan data into at least one flip flop of a second type, and loading a test pattern into a scan chain defined by the at least flip flop of the first type and the at least one flip flop of the second type responsive to the first, second, and third scan clock signals.01-22-2015
20150026511PARTITIONABLE DATA BUS - A method and a system are provided for partitioning a system data bus. The method can include partitioning off a portion of a system data bus that includes one or more faulty bits to form a partitioned data bus. Further, the method includes transferring data over the partitioned data bus to compensate for data loss due to the one or more faulty bits in the system data bus.01-22-2015
20150026436HYBRID TAG SCHEDULER - The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.01-22-2015
20150026414STRIDE PREFETCHING ACROSS MEMORY PAGES - A prefetcher maintains the state of stored prefetch information, such as a prefetch confidence level, when a prefetch would cross a memory page boundary. The maintained prefetch information can be used both to identify whether the stride pattern for a particular sequence of demand requests persists after the memory page boundary has been crossed, and to continue to issue prefetch requests according to the identified pattern. The prefetcher therefore does not have re-identify a stride pattern each time a page boundary is crossed by a sequence of demand requests, thereby improving the efficiency and accuracy of the prefetcher.01-22-2015
20150026407SIZE ADJUSTING CACHES BASED ON PROCESSOR POWER MODE - As a processor enters selected low-power modes, a cache is flushed of data by writing data stored at the cache to other levels of a memory hierarchy. The flushing of the cache allows the size of the cache to be reduced without suffering an additional performance penalty of writing the data at the reduced cache locations to the memory hierarchy. Accordingly, when the cache exits the selected low-power modes, it is sized to a minimum size by setting the number of ways of the cache to a minimum number. In response to defined events at the processing system, a cache controller changes the number of ways of each set of the cache.01-22-2015
20150026406SIZE ADJUSTING CACHES BY WAY - A size of a cache of a processing system is adjusted by ways, such that each set of the cache has the same number of ways. The cache is a set-associative cache, whereby each set includes a number of ways. In response to defined events at the processing system, a cache controller changes the number of ways of each set of the cache. For example, in response to a processor core indicating that it is entering a period of reduced activity, the cache controller can reduce the number of ways available in each set of the cache.01-22-2015
20150022218TECHNIQUES AND CIRCUITS FOR TESTING A VIRTUAL POWER SUPPLY AT AN INTEGRATED CIRCUIT DEVICE - A power grid provides power to one or more modules of an integrated circuit device via a virtual power supply signal. A test module is configured to respond to assertion of a test signal so that, when the power grid is working properly and is not power gated, an output of the test module matches the virtual power supply. When the power grid is not working properly, the output of the test module is a fixed logic signal that does not vary based on the power gated state of the one or more modules.01-22-2015
20140375634HYBRID CLIENT-SERVER RENDERING WITH LOW LATENCY IN VIEW - A method, system and computer-readable medium for rendering images are provided. The method includes rendering a first image based on a model. The method further includes receiving additional image information rendered by the server and incorporating the additional image information into the first image to create a second image for display. The second image is displayed in an output device.12-25-2014
20140359386SCAN OR JTAG CONTROLLABLE CAPTURE CLOCK GENERATION - A capture clock generation control mechanism is provided. The capture clock generation control mechanism controls the number of at-speed clocks generated and supplied to one or more scan chains during scan testing of a microcircuit based on control data stored in a JTAG or scan test register. The scan test register may be formed out of scan cells and comprise part of a scan chain. Automatic Test Pattern Generation (ATPG) tools may generate the data that is loaded into the scan test register to automatically configure the clock generation control mechanism. The clock control mechanism may include the ability to adjust the position of the at-speed clocks within a capture cycle, thereby facilitating transition fault detection.12-04-2014
20140359250TYPE INFERENCE FOR INFERRING SCALAR/VECTOR COMPONENTS - Methods and systems are provided for inferring types in a computer program. In one example, a method comprises: identifying a type of at least one expression of the computer program; and annotating the at least one expression in the computer program when the type of the at least one expression is at least one of a varying type and a uniform type.12-04-2014
20140351562TECHNIQUES FOR SCHEDULING OPERATIONS AT AN INSTRUCTION PIPELINE - A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.11-27-2014
20140344947METHOD AND APPARATUS FOR HANDLING STORAGE OF CONTEXT INFORMATION - A method and apparatus is provided for improving security of context information of processing circuitry of a processing device. In one example, the method and apparatus stores context information of the processing circuitry on an external storage medium at a first location as part of the processing circuitry entering a first power state, and stores the context information of the processing circuitry on the storage medium at a second location as part of the processing circuitry entering a second, later and different power state.11-20-2014
20140344919DEBUG FUNCTIONALITY IN A SECURE COMPUTING ENVIRONMENT - A computer system includes a security processor, a first scan chain coupled to the security processor, a non-secure element, and a second scan chain coupled to the non-secure element. The computer system also includes one or more test access port controllers to control operation of the first and second scan chains, and further includes debug control logic, coupled to the one or more test access port controllers, to enable the one or more test access port controllers to activate debug functionality on the second scan chain but not the first scan chain in response to a predefined condition being satisfied.11-20-2014
20140344830EFFICIENT PROCESSOR LOAD BALANCING USING PREDICATION FLAGS - A system and methods embodying some aspects of the present embodiments for efficient load balancing using predication flags are provided. The load balancing system includes a first processing unit, a second processing unit, and a shared queue. The first processing unit is in communication with a first queue. The second processing unit is in communication with a second queue. The first and second queues are each configured to hold a packet. The shared queue is configured to maintain a work assignment, wherein the work assignment is to be processed by either the first or second processing unit.11-20-2014
20140344826Architecture for Efficient Computation of Heterogeneous Workloads - Embodiments of a workload management architecture may include an input configured to receive workload data for a plurality of commands, a DMA block configured to divide the workload data for each command of the plurality of commands into a plurality of job packets, a job packet manager configured to assign one of the plurality of job packets to one of a plurality of fixed function engines (FFEs) coupled with the job packet manager, where each of the plurality of FFEs is configured to receive one or more of the plurality of job packets and generate one or more output packets based on the workload data in the received one or more job packets.11-20-2014
20140344599Method and System for Power Management - Embodiments described herein include a method for power management. In an embodiment, the method includes responsive to a determination that an idle time has exceeded a threshold, transitioning a device to an intermediate power state in which a predetermined processing module of the device is powered down, the idle time being a time since a last wakeup event, and determining whether to transition the device from the intermediate power state to a substantially powered down state.11-20-2014
20140344592METHODS AND APPARATUS FOR POWERING UP AN INTEGRATED CIRCUIT - A power supply (11-20-2014
20140344555Scalable Partial Vectorization - A system, method and computer program product to compute latencies of a plurality of expression trees in a basic block and to select a first and a second expression tree from the plurality of expression trees based on the computed latencies. The first expression tree is isomorphic to the second expression tree and the first and second expression trees are selected in order of largest to smallest latency. This selection ensures that the largest isomorphic expression trees are vectorized first. By vectorizing the largest isomorphic expression trees first, a basic block containing hundreds of statements can be vectorized without significant compile time. Moreover, vectorization of the largest isomorphic expression trees results in a significant improvement in system performance on SIMD processors.11-20-2014
20140344489VARIABLE-SIZED BUFFERS MAPPED TO HARDWARE REGISTERS - An interface includes a first hardware register field to store respective chunks of a command directed to a device and respective chunks of a response to the command from the device. The interface also includes a second hardware register field to store a size of the command and a size of the response. The first and second hardware register fields are accessible by the device and by a processor external to the device that generates the command, in response to memory not being available to buffer the command and the response.11-20-2014
20140344486METHODS AND APPARATUS FOR STORING AND DELIVERING COMPRESSED DATA - Methods and apparatus for storing and delivering compressed data are disclosed. In one embodiment, a direct memory access (DMA) unit with a lossless coder/decoder (CODEC) receives uncompressed data. The direct memory access unit then compresses the uncompressed data to produce lossless compressed data, and stores the lossless compressed data in a memory, wherein the compressing operation and the storing operation are each part of a direct memory access (DMA) write operation. In another embodiment, the direct memory access (DMA) unit receives lossless compressed data. The direct memory access unit then decompresses the compressed data to produce lossless decompressed data, and delivers the decompressed data to an output device, wherein the decompressing operation and the receiving operation are each part of a direct memory access (DMA) read operation.11-20-2014
20140340527METHODS AND APPARATUS FOR STORING AND DELIVERING COMPRESSED DATA - A video device having data lanes and a method of operating the video device includes generating performance monitoring and/or debug data in response to the operation of the video device. The generated data is sampled from component of the video device operating in various clocking domain. The data sampled from the components is combined into a unified stream which is independent of the various clocking domain. The unified stream is transmitted across one more data lanes of a video link along with corresponding audio and/or video data in real time.11-20-2014
20140340246Efficient Processing of Huffman Encoded Data - A method of decoding Huffman-encoded data may comprise receiving a symbol associated with the Huffman encoded data, selecting a target group for the symbol based on a bit length value associated with the symbol, associating the symbol with the target group, associating the symbol with a code, and incrementing a starting code for each of a plurality of groups associated with a starting code that is equal to or greater than the starting code of the target group.11-20-2014
20140340114FAULT DETECTION FOR A DISTRIBUTED SIGNAL LINE - An integrated circuit device includes a first signal line for distributing a first signal. The first signal line includes a plurality of branch lines, and a leaf node is defined at an end of each branch line. First logic is coupled to the leaf nodes and operable to generate a first status signal indicative of a collective first logic state of the leaf nodes of the signal line corresponding to the first signal.11-20-2014
20140337587METHOD FOR MEMORY CONSISTENCY AMONG HETEROGENEOUS COMPUTER COMPONENTS - A method, computer program product, and system is described that determines the correctness of using memory operations in a computing device with heterogeneous computer components. Embodiments include an optimizer based on the characteristics of a Sequential Consistency for Heterogeneous-Race-Free (SC for HRF) model that analyzes a program and determines the correctness of the ordering of events in the program. HRF models include combinations of the properties: scope order, scope inclusion, and scope transitivity. The optimizer can determine when a program is heterogeneous-race-free in accordance with an SC for HRF memory consistency model . For example, the optimizer can analyze a portion of program code, respect the properties of the SC for HRF model, and determine whether a value produced by a store memory event will be a candidate for a value observed by a load memory event. In addition, the optimizer can determine whether reordering of events is possible.11-13-2014
20140337496Embedded Management Controller for High-Density Servers - The described embodiments comprise an embedded management controller for managing a server on a sled device. The embedded management controller on the sled device comprises a processing mechanism; a plurality of internal interfaces coupled to the processing mechanism, each internal interface coupled to a corresponding interface of the server; and an external interface coupled to the processing mechanism, the external interface configured to be coupled to a system management controller separate from the sled device. In these embodiments, the processing mechanism is configured to communicate with the server using the internal interfaces, and is configured to selectively communicate information based on communications with the server to the system management controller using the external interface and commands to the server based on communications received from the system management controller using the external interface.11-13-2014
20140334084SERVER SYSTEM WITH INTERLOCKING CELLS - A server system includes an array of server cells. Some or all of the server cells include a set of at least three side panels forming an enclosure and a compute component comprising a processor core. At least one side panel of each server cell is removably mechanically coupled and removably electrically coupled to a facing side panel of an adjacent server cell. The enclosure may form a triangular prism enclosure, a cuboid enclosure, a hexagonal prism enclosure, etc. The enclosure can be formed from a rigid flex printed circuit board (PCB) assembly, whereby the side panels are implemented as rigid PCB sections that are interconnected via flexible PCB sections, with the flexible PCB sections forming corners between the rigid PCB sections when the rigid-flex PCB assembly is folded into the enclosure shape. The compute component and other circuit components are disposed at the interior surfaces of the rigid PCB sections.11-13-2014
20140333638POWER-EFFICIENT NESTED MAP-REDUCE EXECUTION ON A CLOUD OF HETEROGENEOUS ACCELERATED PROCESSING UNITS - An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics result in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map reduce framework execution.11-13-2014
20140333621IMPLICIT TEXTURE MAP PARAMETERIZATION FOR GPU RENDERING - Embodiments are described for a method for processing textures for a mesh comprising quadrilateral polygons for real-time rendering of an object or model in a graphics processing unit (GPU), comprising associating an independent texture map with each face of the mesh to produce a plurality of face textures, packing the plurality of face textures into a single texture atlas, wherein the atlas is divided into a plurality of blocks based on a resolution of the face textures, adding a border to the texture map for each face comprising additional texels including at least border texels from an adjacent face texture map, and performing linear interpolation of a trilinear filtering operation on the face textures to resolve resolution discrepancies caused when crossing an edge of a polygon.11-13-2014
20140331230Remote Task Queuing by Networked Computing Devices - The described embodiments include a networking subsystem in a second computing device that is configured to receive a task message from a first computing device. Based on the task message, the networking subsystem updates an entry in a task queue with task information from the task message. A processing subsystem in the second computing device subsequently retrieves the task information from the task queue and performs the corresponding task. In these embodiments, the networking subsystem processes the task message (e.g., stores the task information in the task queue) without causing the processing subsystem to perform operations for processing the task message.11-06-2014
20140331207Determining the Vulnerability of Multi-Threaded Program Code to Soft Errors - The described embodiments include a program code testing system that determines the vulnerability of multi-threaded program code to soft errors. For multi-threaded program code, two to more threads from the program code may access shared architectural structures while the program code is being executed. The program code testing system determines accesses of architectural structures made by the two or more threads of the multi-threaded program code and uses the determined accesses to determine a time for which the program code is exposed to soft errors. From this time, the program code testing system determines a vulnerability of the program code to soft errors.11-06-2014
20140331069POWER MANAGEMENT FOR MULTIPLE COMPUTE UNITS - An interface couples a plurality of compute units to a power management controller. The interface conveys a power report for the plurality of compute units to the power management controller. The power management controller receives the power report, determines a power action for the plurality of compute units based at least in part on the power report, and transmits a message specifying the power action through the interface. The power action is performed.11-06-2014
20140328112MEMORY CELL SUPPLY VOLTAGE REDUCTION PRIOR TO WRITE CYCLE - An integrated circuit device includes a memory cell coupled to a supply voltage line to receive a supply voltage and a voltage control circuit operable to reduce a magnitude of the supply voltage prior to a write cycle to the memory cell. The voltage control circuit includes a first capacitor that is selectively coupled between a supply voltage line and a first reference supply voltage line of the integrated circuit device in anticipation of a write cycle to the memory cell.11-06-2014
20140327696VARIABLE ACUITY RENDERING USING MULTISAMPLE ANTI-ALIASING - Embodiments are described for a method for using anti-aliasing hardware to generate a higher resolution image at the processing of a lower resolution image with anti-aliasing. A graphics image comprising allocating a buffer used in a multisample anti-aliasing process, wherein the allocated buffer has a dimension comprising a reduction in at least one of the width or height of an original dimension of an original buffer provided by the anti-aliasing hardware; rendering sampled image data to the allocated buffer at a sampling rate proportional to the reduction; and expanding the allocated buffer back to the dimension of the original buffer.11-06-2014
20140327477PHASE LOCKED LOOP SYSTEM WITH BANDWIDTH MEASUREMENT AND CALIBRATION - A phase locked loop (PLL) system includes a PLL and a calibration circuit. The PLL has a reference clock input, a voltage controlled oscillator (VCO) clock output, and a feedback clock output. The calibration circuit provides a reference clock signal to the reference clock input of the PLL, induces first and second phase disturbances between the reference clock signal and a feedback clock signal, measures respective first and second zero crossing times of a phase error between the reference clock signal and the feedback clock signal, and estimates a bandwidth of the PLL in response to an average of the first and second zero crossing times.11-06-2014
20140325187SINGLE-CYCLE INSTRUCTION PIPELINE SCHEDULING - A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible.10-30-2014
20140325135TERMINATION IMPEDANCE APPARATUS WITH CALIBRATION CIRCUIT AND METHOD THEREFOR - A termination impedance apparatus includes a variable pull-up resistor, a variable pull-down resistor, and a small-signal calibration circuit. The variable pull-up resistor is coupled between a first power supply voltage terminal and an output terminal. The variable pull-down resistor is coupled between the output terminal and a second power supply voltage terminal. The small-signal calibration circuit is for calibrating the variable pull-up resistor and the variable pull-down resistor to achieve a desired small-signal impedance.10-30-2014
20140325105MEMORY SYSTEM COMPONENTS FOR SPLIT CHANNEL ARCHITECTURE - In one form, a memory module includes a first plurality of memory devices comprising a first rank and having a first group and a second group, and first and second chip select conductors. The first chip select conductor interconnects chip select input terminals of each memory device of the first group, and the second chip select conductor interconnects chip select input terminals of each memory device of the second group. In another form, a system includes a memory controller that performs a first burst access using both first and second portions of a data bus and first and second chip select signals in response to a first access request, and a second burst access using a selected one of the first and second portions of the data bus and a corresponding one of the first and second chip select signals in response to a second access request.10-30-2014
20140317357PROMOTING TRANSACTIONS HITTING CRITICAL BEAT OF CACHE LINE LOAD REQUESTS - A processor includes a cache memory, a first core including an instruction execution unit, and a memory bus coupling the cache memory to the first core. The memory bus is operable to receive a first portion of a cache line of data for the cache memory, the first core is operable to identify a plurality of data requests targeting the cache line and the first portion and select one of the identified plurality of data requests for execution, and the memory bus is operable to forward the first portion to the instruction execution unit and to the cache memory in parallel.10-23-2014
20140317356MERGING DEMAND LOAD REQUESTS WITH PREFETCH LOAD REQUESTS - A processor includes a processing unit, a cache memory, and a central request queue. The central request queue is operable to receive a prefetch load request for a cache line to be loaded into the cache memory, receive a demand load request for the cache line from the processing unit, merge the prefetch load request and the demand load request to generate a promoted load request specifying the processing unit as a requestor, receive the cache line associated with the promoted load request, and forward the cache line to the processing unit.10-23-2014
20140317316MINIMIZING LATENCY FROM PERIPHERAL DEVICES TO COMPUTE ENGINES - Methods, systems, and computer program products are provided for minimizing latency in a implementation where a peripheral device is used as a capture device and a compute device such as a GPU processes the captured data in a computing environment. In embodiments, a peripheral device and GPU are tightly integrated and communicate at a hardware/firmware level. Peripheral device firmware can determine and store compute instructions specifically for the GPU, in a command queue. The compute instructions in the command queue are understood and consumed by firmware of the GPU. The compute instructions include but are not limited to generating low latency visual feedback for presentation to a display screen, and detecting the presence of gestures to be converted to OS messages that can be utilized by any application.10-23-2014
20140317267High-Density Server Management Controller - The described embodiments include a system management controller for managing servers on a plurality of sled devices. The system management controller includes a processing mechanism and a plurality of internal interfaces coupled to the processing mechanism. Each internal interface in the system management controller is coupled to at least one embedded management controller on each of the sled devices, the at least one embedded management controller on each sled device facilitating communications between the processing mechanism in the system management controller and a corresponding server on the sled device. In these embodiments, the processing mechanism in the system management controller is configured to manage one or more operations of the servers on the plurality of sled devices.10-23-2014
20140310552REDUCED-POWER SLEEP STATE S3 - Current computer systems support sleep states such as sleep state S10-16-2014
20140310541Power Gating Shared Logic - We report methods, integrated circuit devices, and fabrication processes relating to power management transitions of multiple compute units sharing a resource. One method include, in response to an indication that a first compute unit of a plurality of compute units is attempting to enter a normal power state and in response to no other compute units being in a low power state, causing a resource to enter the normal power state, wherein the plurality of compute units share the resource; and causing the first compute unit to enter the normal power state.10-16-2014
20140310506ALLOCATING STORE QUEUE ENTRIES TO STORE INSTRUCTIONS FOR EARLY STORE-TO-LOAD FORWARDING - The present invention provides a method and apparatus for allocating store queue entries to store instructions for early store-to-load forwarding. Some embodiments of the method include allocating an entry in a store queue to a store instruction in response to the store instruction being dispatched and prior to receiving a translation of a virtual address to a physical address associated with the store instruction. The entry includes storage for data to be written to the physical address by the store instruction.10-16-2014
20140310500PAGE CROSS MISALIGN BUFFER - The present application describes embodiments of a method and apparatus including a page cross misalign buffer. Some embodiments of the apparatus include a store queue for a plurality of entries configured to store information associated with store instructions. A respective entry in the store queue can store a first portion of information associated with a page crossing store instruction. Some embodiments of the apparatus also include one or more buffers configured to store a second portion of information associated with the page crossing store instruction.10-16-2014
20140306746DYNAMIC CLOCK SKEW CONTROL - A device includes a dock generator operable to generate a clock signal. A first module includes a first clock network coupled to the clock generator for distributing the clock signal. A second module includes a second clock network coupled to the clock generator for distributing the clock signal. A clock skew control circuit is operable to receive a first instance of the clock signal from the first clock network and a second instance of the clock signal from the second clock network and to control skew between the first and second instances of the clock signal.10-16-2014
20140304474Conditional Notification Mechanism - The described embodiments comprise a computing device with a first processor core and a second processor core. In some embodiments, during operations, the first processor core receives, from the second processor core, an indication of a memory location and a flag. The first processor core then stores the flag in a first cache line in a cache in the first processor core and stores the indication of the memory location separately in a second cache line in the cache. Upon encountering a predetermined result when evaluating a condition for the indicated memory location, the first processor core updates the flag in the first cache line. Based on the update of the flag, the first processor core causes the second processor core to perform an operation.10-09-2014
20140300395SAFE RESET CONFIGURATION OF FUSES AND FLOPS - Methods, apparatus, and fabrication techniques relating to improved propagation of fuse data through an integrated circuit device during scan shift reset. In some embodiments, the methods comprise loading a first value of at least one fuse bit to an integrated circuit device, during a time period when a clock signal having a first frequency is provided to at least one component of the integrated circuit device; disabling a scan shift after the loading of the first value; inactivating the clock signal after the loading of the first value; propagating the first value of the at least one fuse bit to the at least one component of the integrated circuit device; and reactivating the clock signal after the propagation of the first value.10-09-2014
20140298068DISTRIBUTION OF POWER GATING CONTROLS FOR HIERARCHICAL POWER DOMAINS - An integrated circuit device includes a first module disposed within a first power domain, a second module disposed in a second power domain that is a sub-domain of the first power domain, first power gating logic, and second power gating logic. The first power gating logic generates a first virtual power supply for the first module. The second power gating logic is powered by the first virtual power supply for generating a second virtual power supply for the second power domain.10-02-2014
20140297996MULTIPLE HASH TABLE INDEXING - A processor includes storage elements to store a first and second value, as well as a plurality of hash units coupled to the storage elements. Each hash unit performs a hash operation using the first value and the second value to generate a corresponding hash result value. The processor further includes selection logic to select a hash result value from the hash result values generated by the plurality of hash units responsive to a selection input generated from another hash operation performed using the first value and the second value. A method includes predicting whether a branch instruction is taken based on a prediction value stored at an entry of a branch prediction table indexed by an index value selected from a plurality of values concurrently generated from an address value of the branch instruction and a branch history value representing a history of branch directions at the processor.10-02-2014
20140297965CACHE ACCESS ARBITRATION FOR PREFETCH REQUESTS - A processor employs a prefetch prediction module that predicts, for each prefetch request, whether the prefetch request is likely to be satisfied from (“hit”) the cache. The arbitration priority of prefetch requests that are predicted to hit the cache is reduced relative to demand requests or other prefetch requests that are predicted to miss in the cache. Accordingly, an arbiter for the cache is less likely to select prefetch requests that hit the cache, thereby improving processor throughput.10-02-2014
20140297961SELECTIVE CACHE FILLS IN RESPONSE TO WRITE MISSES - A cache memory receives a request to perform a write operation. The request specifies an address. A first determination is made that the cache memory does not include a cache line corresponding to the address. A second determination is made that the address is between a previous value of a stack pointer and a current value of the stack pointer. A third determination is made that a write history indicator is set to a specified value. The write operation is performed in the cache memory without waiting for a cache fill corresponding to the address to be performed, in response to the first, second, and third determinations.10-02-2014
20140292756Hybrid Render with Deferred Primitive Batch Binning - A system, method and a computer program product are provided for hybrid rendering with deferred primitive batch binning. A primitive batch is generated from a sequence of primitives. Initial bin intercepts are identified for primitives in the primitive batch. A bin for processing is identified. The bin corresponds to a region of a screen space. Pixels of the primitives intercepting the identified bin are processed. Next bin intercepts are identified while the primitives intercepting the identified bin are processed.10-02-2014
20140281592Global Efficient Application Power Management - A method, system and computer-readable medium for allocating power among computing resources are provided. The method calculates an activity level of a first computer resource. When the activity level is less than a threshold value, the method increases the power allocation to a second computing resource. When the activity level exceeds the threshold value, the method decreases the power allocation to the second computing resource.09-18-2014
20140281234SERVING MEMORY REQUESTS IN CACHE COHERENT HETEROGENEOUS SYSTEMS - Apparatus, computer readable medium, and method of servicing memory requests are presented. A read request for a memory block from a requester processing having a processor type may be serviced by providing exclusive access to the requested memory block to the requester processor when the requested memory block was modified a last time it was accessed by a previous requester processor having a same processor type as the processor type of the requester processor. Exclusive access to the requested memory block may be provided to the requester processor based on whether the requested memory block was modified by a previous processor having a same type as the requester processor at least once in the last several times the memory block was in a cache of the previous processor. Exclusive access to the requested memory block may be provided to the requester processor based on a region of the memory block.09-18-2014
20140258688BENCHMARK GENERATION USING INSTRUCTION EXECUTION INFORMATION - Methods and systems are provided for generating a benchmark representative of a reference process. One method involves obtaining execution information for a subset of the plurality of instructions of the reference process from a pipeline of a processing module during execution of those instructions by the processing module, determining performance characteristics quantifying the execution behavior of the reference process based on the execution information, and generating the benchmark process that mimics the quantified execution behavior of the reference process based on the performance characteristics.09-11-2014
20140253189Control Circuits for Asynchronous Circuits - The described embodiments include a computing device with one or more asynchronous circuits and control circuits that control the operation of the asynchronous circuits. In some embodiments, the control circuits are arranged in a hierarchy with a top-level control circuit atop the hierarchy and one or more local control circuits lower in the hierarchy. In these embodiments, the top-level control circuit processes operating information for the one or more asynchronous circuits and/or other functional blocks in the computing device to determine an operating state for the computing device. Based on the operating state, the top-level control circuit communicates commands to the local control circuits to cause the local control circuits to operate in corresponding operating modes. Based on a corresponding operating mode command, each local control circuit sets one or more operating parameters for corresponding asynchronous circuits (and/or one or more other functional blocks).09-11-2014
20140250442Conditional Notification Mechanism - The described embodiments include a computing device. In these embodiments, an entity in the computing device receives an identification of a memory location and a condition to be met by a value in the memory location. Upon a predetermined event occurring, the entity causes an operation to be performed when the value in the memory location meets the condition.09-04-2014
20140250312Conditional Notification Mechanism - The described embodiments comprise a first hardware context. The first hardware context receives, from a second hardware context, an indication of a memory location and a condition to be met by the memory location. The first hardware context then sends a signal to the second hardware context when the memory location meets the condition.09-04-2014
20140244984ELIGIBLE STORE MAPS FOR STORE-TO-LOAD FORWARDING - The present invention provides a method and apparatus for generating eligible store maps for store-to-load forwarding. Some embodiments of the method include generating information associated with a load instruction in a load queue. The information indicates whether one or more store instructions in a store queue is older than the load instruction and whether the store instruction(s) overlap with any younger store instructions in the store queue that are older than the load instruction. Some embodiments of the method also include determining whether to forward data associated with a store instruction to the load instruction based on the information. Some embodiments of the apparatus include a load-store unit that implements embodiments of the method.08-28-2014
20140244978CHECKPOINTING REGISTERS FOR TRANSACTIONAL MEMORY - The present invention provides a method and apparatus for checkpointing registers for transactional memory. Some embodiments of the apparatus include first rename logic configured to map up to a predetermined number of architectural registers to corresponding first physical registers that hold first values associated with the architectural registers. The mapping is responsive to a transaction modifying one or more of the first values associated with the architectural registers. Some embodiments of the apparatus also include microcode configured to write contents of the first physical registers to a memory in response to the transaction modifying first values associated with a number of the architectural registers that is larger than the predetermined number.08-28-2014
20140244932METHOD AND APPARATUS FOR CACHING AND INDEXING VICTIM PRE-DECODE INFORMATION - The present invention provides a method and apparatus for caching pre-decode information. Some embodiments of the apparatus include a first pre-decode array configured to store pre-decode information for an instruction cache line that is resident in a first cache in response to the instruction cache line being evicted from one or more second cache(s). Some embodiments of the apparatus also include a second array configured to store a plurality of bits associated with the first cache. Subsets of the bits are configured to store pointers to the pre-decode information associated with the instruction cache line.08-28-2014
20140240009State Machine for Low-Noise Clocking of High Frequency Clock - Methods, apparatus, and fabrication techniques relating to management of noise arising from capacitance in a clock tree of an integrated circuit. In some embodiments, the methods comprise receiving a signal to adjust a clock having a first rate to a second rate; and ramping, in response to receiving the signal, the clock from the first rate to the second rate, wherein the ramping comprises changing the frequency of the clock to at least one third rate between the first and second rates.08-28-2014
20140237815STIFFENER FRAME FIXTURE - Methods and apparatus for coupling a stiffener frame to a circuit board are disclosed. In one aspect, an apparatus for engaging a stiffener frame and a circuit board positioned in a fixture is provided. The stiffener frame includes an edge. The apparatus includes an alignment plate that has a shoulder to engage the edge of the stiffener frame. The alignment plate includes a first opening with a peripheral wall to restrain movement of a circuit board relative to the stiffener frame.08-28-2014
20140237312Scan Warmup Scheme for Mitigating DI/DT During Scan Test - We report methods relating to scan warmup of integrated circuit devices. One such method may comprise loading a scan test stimulus to and unloading a scan test response from a first set of logic elements of an integrated circuit device at a scan clock first frequency equal to a test clock frequency; adjusting the scan clock from the first frequency to a second frequency by a scan warmup unit, wherein the scan clock second frequency is equal to a system clock frequency; and capturing the scan test response by a shift logic at the scan clock second frequency. We also report processors containing components configured to implement the method, and fabrication of such processors. The methods and their implementation may reduce di/dt events otherwise commonly occurring when testing logic elements of integrated circuit devices.08-21-2014
20140237272POWER CONTROL FOR DATA PROCESSOR - A data processor includes a data processor core, and a power controller. The data processor core is adapted to control an external memory system and to perform a task by accessing the external memory system, where the task has an associated computation rate, and the data processor is adapted to control the external memory system by powering up the external memory system when needed. The power controller is coupled to the data processor core for controlling a power consumption of the data processor core and the external memory system by issuing control signals to change an activation time and an activation frequency of the data processor core and the memory system.08-21-2014
20140237212TRACKING AND ELIMINATING BAD PREFETCHES GENERATED BY A STRIDE PREFETCHER - A method, an apparatus, and a non-transitory computer readable medium for tracking prefetches generated by a stride prefetcher are presented. Responsive to a prefetcher table entry for an address stream locking on a stride, prefetch suppression logic is updated and prefetches from the prefetcher table entry are suppressed when suppression is enabled for that prefetcher table entry. A stride is a difference between consecutive addresses in the address stream. A prefetch request is issued from the prefetcher table entry when suppression is not enabled for that prefetcher table entry.08-21-2014
20140232431Clock Gater with Independently Programmable Delay - An integrated circuit device comprising first circuitry including first logic devices and a clock tree for distributing a clock signal to the first logic devices and second circuitry comprising second logic devices, a first clock gater and a second clock gater. The first and second clock gaters comprise a programmable delay circuit.08-21-2014
20140223445Selecting a Resource from a Set of Resources for Performing an Operation - The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.08-07-2014
20140223199Adaptive Temperature and Power Calculation for Integrated Circuits - Methods, apparatus, and fabrication processes relating to thermal calculations of an integrated circuit device are reported. The methods may comprise determining a power consumption by a power entity of an integrated circuit, the power entity comprising at least one functional element of the integrated circuit; determining a temperature of a thermal entity, the thermal entity comprising a subset of the power entity; and adjusting at least one of a voltage or an operating frequency of at least one functional element of the power entity, based upon the temperature of the thermal entity being greater than or equal to a predetermined threshold temperature for the thermal entity.08-07-2014
20140223066Multi-Node Management Mechanism - The described embodiments include a multi-node management mechanism for managing a plurality of server nodes. These embodiments further comprise a separate set of busses coupled between each of the server nodes and the multi-node management mechanism and a controller in the multi-node management mechanism, the controller being coupled to each bus in the sets of busses. In these embodiments, the controller is configured to handle communications on each bus so that the multi-node management mechanism appears to a corresponding server node to be a separate endpoint for the bus.08-07-2014
20140217997ASYMMETRIC TOPOLOGY TO BOOST LOW LOAD EFFICIENCY IN MULTI-PHASE SWITCH-MODE POWER CONVERSION - Techniques for performing DC to DC power conversion in switch-mode converter circuits include combinations of dynamic switch shedding, phase shedding, symmetric phase circuit topologies, and asymmetric phase circuit topologies. In at least one embodiment of the invention, a method of operating a power converter circuit includes operating a first phase switch circuit portion using a first number of switch devices when the power converter circuit is configured in a first mode of operation. The first number is greater than zero. The method includes operating the first phase switch circuit portion using the first number of switch devices when the power converter circuit is configured in a second mode of operation. The method includes operating a second phase switch circuit portion using a second number of switch devices when the power converter circuit is configured in the second mode of operation. The second number is greater than the first number.08-07-2014
20140215187SOLUTION TO DIVERGENT BRANCHES IN A SIMD CORE USING HARDWARE POINTERS - A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler generates code wherein when executed determines a size of a next very large instruction world (VLIW) to process and determine multiple pointer values to store in multiple corresponding PC registers in a target processor. The updated PC registers point to instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on branch direction found at runtime for the given lane at the given divergent point. The processor includes a vector register for mapping PC registers to execution lanes.07-31-2014
20140215183HARDWARE AND SOFTWARE SOLUTIONS TO DIVERGENT BRANCHES IN A PARALLEL PIPELINE - A system and method for efficiently processing instructions in hardware parallel execution lanes within a processor. In response to a given divergent point within an identified loop, a compiler arranges instructions within the identified loop into very large instruction words (VLIW's). At least one VLIW includes instructions intermingled from different basic blocks between the given divergence point and a corresponding convergence point. The compiler generates code wherein when executed assigns at runtime instructions within a given VLIW to multiple parallel execution lanes within a target processor. The target processor includes a single instruction multiple data (SIMD) micro-architecture. The assignment for a given lane is based on branch direction found at runtime for the given lane at the given divergent point. The target processor includes a vector register for storing indications indicating which given instruction within a fetched VLIW for an associated lane to execute.07-31-2014
20140211571Adjustment of Write Timing in a Memory Device - A method and system are provided for adjusting a write timing in a memory device. For instance, the method can include receiving a data signal, a write clock signal, and a reference signal. The method can also include detecting a phase shift in the reference signal over time. The phase shift of the reference signal can be used to adjust a phase difference between the data signal and the write clock signal, where the memory device recovers data from the data signal based on an adjusted write timing of the data signal and the write clock signal.07-31-2014
20140204658Memory Cell Flipping for Mitigating SRAM BTI - An apparatus may comprise a memory cell configured to operate according to a voltage mode, a voltage controller coupled with the memory cell, wherein the voltage controller is configured to change the voltage mode of the memory cell between a low voltage mode and a high voltage mode, and a memory controller module coupled with the memory cell, wherein the memory controller is configured to invert a logic state stored in the memory cell based on the voltage mode.07-24-2014
20140201542ADAPTIVE PERFORMANCE OPTIMIZATION OF SYSTEM-ON-CHIP COMPONENTS - Methods, apparatus, and fabrication relating to adaptive performance optimization of a plurality of components in view of power consumption and demand, component activity, and thermal events. A method may comprise allocating a first power budget to a first component of an apparatus, wherein the first power budget is less than a maximum power required by the first component; applying at least a portion of a borrowable power budget, wherein the borrowable power budget equals the maximum power required by the first component minus the first power budget, to a second component of the apparatus; and increasing the first power budget of the first component, in response to a first number or more of thermal events occurring in a first time period.07-17-2014
20140201507THREAD SELECTION AT A PROCESSOR BASED ON BRANCH PREDICTION CONFIDENCE - A processor employs one or more branch predictors to issue branch predictions for each thread executing at an instruction pipeline. Based on the branch predictions, the processor determines a branch prediction confidence for each of the executing threads, whereby a lower confidence level indicates a smaller likelihood that the corresponding thread will actually take the predicted branch. Because speculative execution of an untaken branch wastes resources of the instruction pipeline, the processor prioritizes threads associated with a higher confidence level for selection at the stages of the instruction pipeline.07-17-2014

Patent applications by Advanced Micro Devices, Inc.

Website © 2016 Advameg, Inc.