Patent application number | Description | Published |
20080201620 | METHOD AND SYSTEM FOR UNCORRECTABLE ERROR DETECTION - A system, method and program product for utilizing error correction code (ECC) logic to detect multi-bit errors. In one embodiment, a first test pattern and a second test pattern are applied to a set of hardware bit positions. The first and second patterns are multiple logic level patterns and the second test pattern is the logical complement of the first test pattern. The first and second test patterns are utilized by the ECC logic to detect correctable errors having n or fewer bits. One or more bit positions of a first correctable error occurring responsive to applying the first test pattern are determined and one or more bit positions of a second correctable error occurring responsive to applying the second test pattern are determined. The determined bit positions of the first and second correctable errors are processed to identify a multiple-bit error within the set of hardware bit positions. | 08-21-2008 |
20090300290 | Memory Metadata Used to Handle Memory Errors Without Process Termination - Embodiments of the invention provide an interrupt handler configured to distinguish between critical and non-critical unrecoverable memory errors, yielding different actions for each. Doing so may allow a system to recover from certain memory errors without having to terminate a running process. In addition, when an operating system critical task experiences an unrecoverable error, such a task may be acting on behalf of a non-critical process (e.g., when swapping out a virtual memory page). When this occurs, an interrupt handler may respond to a memory error with the same response that would result had the process itself performed the memory operation. Further, firmware may be configured to perform diagnostics to identify potential memory errors and alert the operating system before a memory region state change occurs, such that the memory error would become critical. | 12-03-2009 |
20090300425 | Resilience to Memory Errors with Firmware Assistance - Embodiments of the invention provide an interrupt handler configured to distinguish between critical and non-critical unrecoverable memory errors, yielding different actions for each. Doing so may allow a system to recover from certain memory errors without having to terminate a running process. In addition, when an operating system critical task experiences an unrecoverable error, such a task may be acting on behalf of a non-critical process (e.g., when swapping out a virtual memory page). When this occurs, an interrupt handler may respond to a memory error with the same response that would result had the process itself performed the memory operation. Further, firmware may be configured to perform diagnostics to identify potential memory errors and alert the operating system before a memory region state change occurs, such that the memory error would become critical. | 12-03-2009 |
20090300434 | Clearing Interrupts Raised While Performing Operating System Critical Tasks - Embodiments of the invention provide an interrupt handler configured to distinguish between critical and non-critical unrecoverable memory errors, yielding different actions for each. Doing so may allow a system to recover from certain memory errors without having to terminate a running process. In addition, when an operating system critical task experiences an unrecoverable error, such a task may be acting on behalf of a non-critical process (e.g., when swapping out a virtual memory page). When this occurs, an interrupt handler may respond to a memory error with the same response that would result had the process itself performed the memory operation. Further, firmware may be configured to perform diagnostics to identify potential memory errors and alert the operating system before a memory region state change occurs, such that the memory error would become critical. | 12-03-2009 |
20100293437 | System to Improve Memory Failure Management and Associated Methods - A system to improve memory failure management may include memory, and an error control decoder to determine failures in the memory. The system may also include an agent that may monitor failures in the memory. The system may further include a table where the error control decoder may record the failures, and where the agent can read and write to. | 11-18-2010 |
20110029807 | IMPLEMENTING ENHANCED MEMORY RELIABILITY USING MEMORY SCRUB OPERATIONS - A method and circuit for implementing enhanced memory reliability using memory scrub operations to determine a frequency of intermittent correctable errors, and a design structure on which the subject circuit resides are provided. A memory scrub for intermittent performs at least two reads before moving to a next memory scrub address. A number of intermittent errors is tracked where an intermittent error is identified, responsive to identifying one failing read and one passing read of the at least two reads. | 02-03-2011 |
20110320911 | Computer System and Method of Protection for the System's Marking Store - A method and apparatus for controlling marking store updates in a central electronic complex with a plurality of core processors and eDRAM cache and interconnect bus to a service processor for loading memory controller firmware to dual-channel DDR3 memory controllers with an internal marking store. Loaded firmware of the memory controllers is responsible for tracking of ECC errors using a ECC decoder control whereby said marking store is written by a slow ECC decoder, and read by a fast ECC decoder for every read operation of said memory controllers to provide a blocking mechanism for notifying marking store firmware when the marking store has been updated and which guarantees that marking store firmware cannot write to the marking store until the marking store firmware has seen updates without causing the marking store hardware to time out. | 12-29-2011 |
20120260139 | Firmware Monitoring of Memory Scrub Coverage - Mechanisms are provided in which firmware verifies he entire system's memory scrub coverage through some additional memory controller (MC) registers/attentions and builds up a processor runtime diagnostic (PRD) scrub coverage table during every scrub cycle. Firmware may go through the scrub coverage table rank-by-rank on a periodic basis to determine whether any ranks had not been covered by hardware scrubbing. Firmware may initiate a targeted scrub and diagnostic for all of the ranks that did not have adequate scrub coverage. If for some reason the system still has some memory ranks that have not been covered by the initial hardware scrub and the targeted scrub, then the firmware may perform some course of action for fault isolation. | 10-11-2012 |
20130007541 | PREEMPTIVE MEMORY REPAIR BASED ON MULTI-SYMBOL, MULTI-SCRUB CYCLE ANALYSIS - An apparatus includes a processor, a memory, and an error module operable on the processor. The error module is configured to perform a memory scrub of the memory across a scrub cycle of multiple scrub cycles. The error module is configured to identify correctable errors of symbols in the memory that are a result of accesses from a section of the memory in response to the memory scrub. The error module is configured to perform an analysis across the multiple scrub cycles, wherein the analysis comprises a determination whether at least two symbols across the multiple scrub cycles have at least one correctable error. The error module is configured to responsive to a determination that at least two symbols across the multiple scrub cycles have at least one correctable error, execute at least one repair of the memory that includes the section of memory. | 01-03-2013 |
20130007542 | PREEMPTIVE MEMORY REPAIR BASED ON MULTI-SYMBOL, MULTI-SCRUB CYCLE ANALYSIS - In some example embodiments, a method includes performing a memory scrub of a memory across a scrub cycle of multiple scrub cycles. The method includes identifying correctable errors of symbols in the memory that are a result of accesses from a section of the memory in response to the memory scrub. The method also includes performing an analysis across the multiple scrub cycles, wherein the performing of the analysis comprises determining whether at least two symbols across the multiple scrub cycles have at least one correctable error. The method includes responsive to determining that at least two symbols across the multiple scrub cycles have at least one correctable error, executing at least one repair of the memory that includes the section of memory. | 01-03-2013 |
20140143612 | SELECTIVE POSTED DATA ERROR DETECTION BASED ON HISTORY - In a data processing system, a selection is made, based at least on addresses of previously detected errors in a memory subsystem, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing on a target memory block of the memory access request. In response to receipt of the memory access request and selection of the first timing, data from the target memory block is transmitted to a requestor prior to completion of error detection processing on the target memory block. In response to receipt of the memory access request and selection of the second timing, data from the target memory block is transmitted to the requestor after and in response to completion of error detection processing on the target memory block. | 05-22-2014 |
20140143614 | SELECTIVE POSTED DATA ERROR DETECTION BASED ON HISTORY - In a data processing system, a selection is made, based at least on addresses of previously detected errors in a memory subsystem, between at least a first timing and a second timing of data transmission with respect to completion of error detection processing on a target memory block of the memory access request. In response to receipt of the memory access request and selection of the first timing, data from the target memory block is transmitted to a requestor prior to completion of error detection processing on the target memory block. In response to receipt of the memory access request and selection of the second timing, data from the target memory block is transmitted to the requestor after and in response to completion of error detection processing on the target memory block. | 05-22-2014 |
20140195852 | MEMORY TESTING OF THREE DIMENSIONAL (3D) STACKED MEMORY - A method includes reading, at a memory controller, data from a first dynamic random-access memory (DRAM) die layer of a DRAM stack. The method also includes writing the data to a second DRAM die layer of the DRAM stack. The method further includes sending a request to a test engine to test the first DRAM die layer after writing the data to the second DRAM die layer. | 07-10-2014 |
20140195867 | MEMORY TESTING WITH SELECTIVE USE OF AN ERROR CORRECTION CODE DECODER - A method includes directing an access of a memory location of a memory device to an error correction code (ECC) decoder in response to receiving a test activation request indicating the memory location. The method also includes writing a test pattern to the memory location and reading a value from the memory location. The method further includes determining whether a fault is detected at the memory location based on a comparison of the test pattern and the value. | 07-10-2014 |
20140281681 | ERROR CORRECTION FOR MEMORY SYSTEMS - According to one embodiment, a method for error correction in a memory module having ranks is provided where each rank has memory devices. The method includes determining a first mark condition for a first rank of the memory module, the first mark condition based on one or more uncorrectable error occurring in a first memory device in the first rank, placing a first mark in the first memory device, determining a second mark condition for the first rank, the second mark condition based on one or more uncorrectable error occurring in a second memory device in the first rank, placing a second mark in a third memory device in a second rank of the memory module and configuring the first memory device to respond to commands directed to the second rank, wherein configuring the first memory device is based on placing of the first mark and the second mark. | 09-18-2014 |