Class / Patent application number | Description | Number of patent applications / Date published |
714013000 | Prepared backup processor (e.g., initializing cold backup) or updating backup processor (e.g., by checkpoint message) | 25 |
20080201605 | Dead man timer detecting method, multiprocessor switching method and processor hot plug support method - A Dead man timer detecting method, a multiprocessor switching method, and a processor hot plug support method are provided. A hot spare boot control register that communicates with the Dead man timer is used to detect functions of the Dead man timer, such as enabling, timing, disabling, and responding. After an operating system is booted, the Dead man timer is used to achieve automatic switching among multiple processors and support for processor hot plug. The method can detect various functions of the Dead man timer, switch among multiple processors automatically and periodically without being limited by the type of operating system or processor, and support processor hot plug, thereby improving the safety of the hot plug operation. | 08-21-2008 |
20080229146 | High Availability Multi-Processor System - A method and system are provided for enabling replacement of a failed processor without requiring redundancy of hardware. The system is a multiprocessing computer system that includes one or more processor chips. Each processor chip may include one or more logical processors. During system initialization, one or more logical processors may be reserved in an inactive state. In the event an error is detected on a logical or physical processor, one or more reserved logical processors may have execution context transferred from the processor experiencing the error. Thereafter, the active processor is designated as inactive and replaced by the inactive processor to which the execution context has been transferred. | 09-18-2008 |
20080307255 | FAILURE RECOVERY AND ERROR CORRECTION TECHNIQUES FOR DATA LOADING IN INFORMATION WAREHOUSES - A method of data loading for large information warehouses includes performing checkpointing concurrently with data loading into an information warehouse, the checkpointing ensuring consistency among multiple tables; and recovering from a failure in the data loading using the checkpointing. A method is also disclosed for performing versioning concurrently with data loading into an information warehouse. The versioning method enables processing undo and redo operations of the data loading between a later version and a previous version. Data load failure recovery is performed without starting a data load from the beginning but rather from a latest checkpoint for data loading at an information warehouse level using a checkpoint process characterized by a state transition diagram having a multiplicity of states; and tracking state transitions among the states using a system state table. | 12-11-2008 |
20090083575 | Replacing A Failing Physical Processor - Replacing a failing physical processor in a computer supporting multiple logical partitions, where the logical partitions include dedicated partitions and shared processor partitions, the dedicated partitions are supported by virtual processors having assigned physical processors, and the shared processor partitions are supported by pools of virtual processors. The pools of virtual processors have assigned physical processors. Embodiments operate generally by assigning priorities to the dedicated partitions and to the pools of virtual processors; detecting a checkstop of a failing physical processor; retrieving the failing physical processor's state; replacing by a hypervisor the failing physical processor with a replacement physical processor assigned to a dedicated partition or pool, which dedicated partition or pool has the lowest priority among the priorities of the dedicated partitions and pools; and assigning the retrieved state of the failing physical processor as the state of the replacement physical processor. | 03-26-2009 |
20090100292 | Method and Device for Monitoring the Functionality of an Automation System of a Plant - There is described a method for monitoring the functionality of an automation system of a plant comprising at least one main processor, parts of the plant being monitored and controlled using user software, which is constructed of a number of program modules and which is run on the main processor. A co-processor is assigned to the main processor, and a message is transmitted from the main processor to the co-processor. When received, this message is used by the co-processor to start a monitoring time. If a subsequent message is received before the monitoring time has elapsed, the monitoring time is reset; otherwise, a fault is identified once the monitoring time has elapsed. | 04-16-2009 |
20090106586 | Assigning A Processor To A Logical Partition - Assigning a processor to a logical partition in a computer supporting multiple logical partitions, including assigning priorities to partitions, detecting a checkstop of a failing processor of a partition, retrieving the failing processor's state, replacing by a hypervisor the failing processor with a replacement processor from a partition having a priority lower than the priority of the partition of the failing processor, and assigning the retrieved state of the failing processor as the state of the replacement processor. | 04-23-2009 |
20090183027 | CHECKPOINTING AND RESTORING USER SPACE DATA STRUCTURES USED BY AN APPLICATION - Provided are a method, system, and article of manufacture for checkpointing and restoring user space data structures used by an application accessing a data structure maintained by an operating system for an executing application. Information in the accessed data structure is saved with checkpoint information for the application. An operation to restore the application from the checkpoint information is initialized. A restored data structure is generated to include the saved information in the accessed data structure saved in the checkpoint information in response to restoring the application. An initialization routine of the application is modified to bypass initializing the data structure as part of the application initialization routine to restore the application. | 07-16-2009 |
20090217087 | COMPUTER DEVICE, CONTINUING OPERATION METHOD FOR COMPUTER DEVICE, AND PROGRAM - A computer device that includes a plurality of processor boards each provided with a processor, a memory, and a chipset, includes a first processor board that makes data in a cache, which have become unfixed as a result of an uncorrectable failure, invalid when the uncorrectable failure occurs on the first processor board in operation, and switches from the first processor board to a second processor board for replacement, and the second processor board that re-executes an instruction that was being executed in the first processor board when the failure occurred. | 08-27-2009 |
20090240981 | BOOTSTRAP DEVICE AND METHODS THEREOF - A method of booting a multi-processor data processing device includes establishing a link between a first processor and a memory. The link is monitored to determine if, in response to a request from the processor, expected initialization data is communicated between the memory and the first processor. If unexpected data is detected on the link, the link is severed and a new link established between a second processor and the memory to allow the second processor to initiate the boot process. This ensures that, in the event of an error in the boot process at the first processor, the device can complete the boot process, thereby reducing device downtime. | 09-24-2009 |
20100031084 | CHECKPOINTING IN A PROCESSOR THAT SUPPORTS SIMULTANEOUS SPECULATIVE THREADING - Embodiments of the present invention provide a system for executing program code on a processor. In these embodiments, the processor is configured to start by using a primary strand to execute program code. Upon detecting a predetermined condition, the processor is configured to instantaneously checkpoint an architectural state of the primary strand and then use the subordinate strand to copy the checkpointed state to memory while using the primary strand to continue executing the program code without interruption. | 02-04-2010 |
20110083040 | LOG-BASED ROLLBACK-RECOVERY - Log-Based Rollback Recovery for system failures. The system includes a storage medium, and a component configured to transition through a series of states. The component is further configured to record in the storage medium the state of the component every time the component communicates with another component in the system, the system being configured to recover the most recent state recorded in the storage medium following a failure of the component. | 04-07-2011 |
20110161729 | PROCESSOR REPLACEMENT - Techniques for transparently replacing a processor that receives interrupts in a partitioned computing device with a replacement processor are disclosed. In at least some embodiments, methods are discussed for directing the interrupts to an unchangeable identifier mapped to the processor's identifier and replacing the processor with the replacement processor. An intermediary, such as an I/O APIC, is used for storing the unchangeable identifier. The mapping may use logical mode delivery, physical mode delivery, or interrupt mapping. | 06-30-2011 |
20120042205 | System and Method for Completeness of TCP Data in TCP HA - A system and method for completeness of transmission control protocol (TCP) high availability (HA) are disclosed. The system includes an active processor, having an application and a TCP, and a standby processor, having another application and another TCP, wherein communications among the active application, the active TCP, the standby application and the standby TCP quickly and efficiently enable the system to switch over seamlessly from the active processor to the standby processor for transmission of incoming TCP data streams and outgoing TCP data streams if the active processor fails. | 02-16-2012 |
20120060055 | SYSTEM AND METHOD FOR RESPONDING TO FAILURE OF A HARDWARE LOCUS AT A COMMUNICATION INSTALLATION - A method for responding to a failure of a hardware locus at a communication installation having a plurality of control apparatuses for controlling a plurality of processes distributed among a plurality of hardware loci, the hardware loci including at least one spare hardware locus, includes the steps of: (a) shifting control of a failed process from an initial control apparatus to an alternate control apparatus located at an alternate hardware locus other than the failed hardware locus, the failed process being a respective process controlled by the initial control apparatus located at the failed hardware locus; (b) relocating the respective control apparatuses located at the failed hardware locus to a spare hardware locus; and (c) shifting control of the failed process from the alternate control apparatus to the initial control apparatus relocated at the spare hardware locus. | 03-08-2012 |
20120144233 | Obviation of Recovery of Data Store Consistency for Application I/O Errors - Embodiments comprise a plurality of computing devices that dynamically intercept process application I/O errors. Various embodiments comprise two or more computing devices, such as two or more servers, each having access to a shared data storage system. An application may be executing on the first computing device and performing an I/O operation when an I/O error occurs. The first computing device may intercept the I/O error, rather than passing it back to the application, and prevent the error from affecting the application. The first computing device may complete the I/O operation, and any other pending I/O operations not written to disk, via an alternate path, perform a checkpoint operation to capture the state of the set of processes associated with the application, and transfer the checkpoint image to the second computing device. The second computing device may resume operation of the application from the checkpoint image. | 06-07-2012 |
20120173922 | APPARATUS AND METHOD FOR HANDLING FAILED PROCESSOR OF MULTIPROCESSOR INFORMATION HANDLING SYSTEM - An apparatus for handling a failed processor of a multiprocessor system including at least two processors interconnected by processor interconnects for facilitating transactions of the processors. The at least two processors include a first processor set as a default boot processor in response to a boot up operation of the multiprocessor computer, and a second processor. The apparatus includes: a baseboard management module for detecting and receiving health information of the processors; a multiplexer coupled to the baseboard management module and respectively to the processors, the multiplexer being operative to switch between the processors; and a processor ID controller coupled to the baseboard management module and respectively to the processors. In response to the health information indicating the first processor has failed, the processor ID controller sets the second processor as the default boot processor and the baseboard management module enables the multiplexer to switch to the second processor. | 07-05-2012 |
20120278653 | HANDLING A FAILED PROCESSOR OF MULTIPROCESSOR INFORMATION HANDLING SYSTEM - A method for handling a failed processor of a multiprocessor system, the multiprocessor system comprising at least two processors interconnected by processor interconnects for transactions between processors, the processors comprising a first processor and a second processor, the first processor being set as a default boot processor in response to a boot-up operation of the multiprocessor system. The method comprises: detecting and receiving, via a baseboard management module, health information of the at least two processors; providing a multiplexer operative to switch between the at least two processors, the multiplexer being coupled to the baseboard management module and respectively to the at least two processors; and, in response to the health information indicating the first processor has failed, setting, via a processor ID controller, the second processor as the default boot processor and enabling, via the baseboard management module, the multiplexer to switch to the second processor. | 11-01-2012 |
20120290874 | JOB MIGRATION IN RESPONSE TO LOSS OR DEGRADATION OF A SEMI-REDUNDANT COMPONENT - A method of managing the workload in a computer system having one or more semi-redundant hardware components is provided. The method comprises detecting loss or degradation of the level of performance of one or more of the semi-redundant hardware components, identifying hardware components affected by the loss or degradation, migrating a critical job from an affected hardware component to an unaffected hardware component, and performing less-critical jobs on an affected hardware component. Loss or degradation of the semi-redundant component reduces the capacity of affected hardware components in the computer system without entirely disabling the computer system. Jobs identified as critical run on hardware components having the most capacity and reliability, while less-critical jobs use the remaining capacity of affected hardware components. Examples of semi-redundant hardware components include a memory module, CPU core, Ethernet port, power supply, fan, disk drive, and an input output port. | 11-15-2012 |
20130124918 | SELF-REPARABLE SEMICONDUCTOR AND METHOD THEREOF - A semiconductor device includes a plurality of processors and a spare processor configured to perform respective processing functions. A plurality of first switches is located at respective inputs of the plurality of processors. Each of the plurality of first switches is configured to selectively provide an input signal to a respective one of the plurality of processors and the spare processor. A first multiplexer is located at an input of the spare processor. The first multiplexer is configured to receive the input signals from each of the plurality of first switches and route, to the spare processor, a selected one of the input signals corresponding to a failed one of the plurality of processors. The spare processor is further configured to perform a processing function associated with the failed one of the plurality of processors in response to receiving the selected one of the input signals. | 05-16-2013 |
20130145211 | FLEXIBLE REPLICATION WITH SKEWED MAPPING IN MULTI-CORE CHIPS - For a flexible replication with skewed mapping in a multi-core chip, a request for a cache line is received, at a receiver core in the multi-core chip from a requester core in the multi-core chip. The receiver and requester cores comprise electronic circuits. The multi-core chip comprises a set of cores including the receiver and the requester cores. A target core is identified from the request to which the request is targeted. A determination is made whether the target core includes the requester core in a neighborhood of the target core, the neighborhood including a first subset of cores mapped to the target core according to a skewed mapping. The cache line is replicated, responsive to the determining being negative, from the target core to a replication core. The cache line is provided from the replication core to the requester core. | 06-06-2013 |
20140215264 | INFORMATION PROCESSING APPARATUS AND CONTROL METHOD FOR INFORMATION PROCESSING APPARATUS - An information processing apparatus includes a switch unit configured to connect some of the arithmetic processing devices and some of the storage devices in accordance with connection information; a first control unit configured to output physical information converted from the logical information of the arithmetic processing device at the transmission destination and the physical information of the corresponding arithmetic processing device via a transfer path in accordance with the correlation information; and a second control unit configured to change the connection information in response to occurrence of a failure of one of the arithmetic processing devices in the system, and to control the switch unit such that the failed arithmetic processing device is replaced with another of the plural arithmetic processing devices. | 07-31-2014 |
20140223225 | MULTI-CORE RE-INITIALIZATION FAILURE CONTROL SYSTEM - A method of a computer system recovering from a core re-initialization failure is described. The method may include automatically detecting, by a hypervisor, a core re-initialization failure during a core re-initialization process. The hypervisor automatically determines whether the core re-initialization failure is a permanent failure. If the core re-initialization failure is a permanent failure, the hypervisor automatically determines which cores are re-initialized and which cores are indeterminate. The hypervisor then automatically allocates the re-initialized cores between one or more virtual machines. | 08-07-2014 |
20150113320 | PROCESSING APPARATUS, PROCESS SYSTEM, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM - A processing apparatus includes a precursor detection unit that detects a precursor event, which is an early sign that a target process cannot be executed by a process unit, and a control unit that, when the precursor detection unit detects the precursor event, sends a preparation request to a substitution processing apparatus, the preparation request asking the substitution processing apparatus to enter a ready state for starting substitution processing. The control unit sends a termination request to the substitution processing apparatus when a predetermined condition is satisfied after the control unit sends the preparation request, the termination request asking the substitution processing apparatus to terminate the ready state. | 04-23-2015 |
20150301911 | INFORMATION PROCESSING APPARATUS, CONTROL METHOD FOR INFORMATION PROCESSING APPARATUS, AND COMPUTER-READABLE RECORDING MEDIUM - A configuring unit performs configuration of a partition that is a combination of a system board equipped with a central processing unit and a memory, and an input/output unit equipped with an input/output device that are installed in one package, and performs allocation of a reserved system board to the partition. A switching unit switches, when a failed system board in which a failure has occurred is present, the failed system board to a reserved system board that is allocated to a partition including the failed system board. A re-setting unit performs resetting, when the failed system board is recovered after switching of system boards has been performed by the switching unit, based on information of the configuration of the partition and the allocation of the reserved system board made by the configuring unit. | 10-22-2015 |
20160077878 | Logical Data Shuffling - Embodiments relate to data shuffling by logically rotating processing nodes. The nodes are logically arranged in a two- or three-dimensional matrix. Every time two of the nodes in adjacent rows of the matrix are positionally aligned, these adjacent nodes exchange data. The positional alignment is a logical alignment of the nodes. The nodes are logically arranged and rotated, and data is exchanged in response to the logical rotation. | 03-17-2016 |
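
Several entries in this class rely on a timer that a healthy processor must keep restarting (the Dead man timer in 20080201605, the monitoring time in 20090100292). The sketch below illustrates only the general pattern described in the abstract of 20090100292: the first message starts the monitoring time, each later message restarts it, and a fault is identified if it elapses first. It is not taken from either application. The names `MonitoringTimer`, `on_message`, `check`, and `timeout`, the explicit `now` argument, and the choice to latch the fault once raised are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MonitoringTimer:
    """Co-processor side of a heartbeat watchdog (hypothetical, illustrative only)."""
    timeout: float                      # length of the monitoring time (assumed to be seconds)
    deadline: Optional[float] = None    # None until the first message arrives
    fault: bool = False                 # latched once raised (an assumption of this sketch)

    def on_message(self, now: float) -> None:
        """Start the monitoring time on the first message, restart it on later ones."""
        self.check(now)                 # a late message must not hide an expired timer
        if not self.fault:
            self.deadline = now + self.timeout

    def check(self, now: float) -> bool:
        """Return True once the monitoring time has elapsed without a new message."""
        if self.deadline is not None and now >= self.deadline:
            self.fault = True
        return self.fault


timer = MonitoringTimer(timeout=1.0)
timer.on_message(now=0.0)       # first message starts the monitoring time
timer.on_message(now=0.6)       # arrives in time, so the monitoring time restarts
in_time = timer.check(now=1.5)  # the new deadline, 0.6 + 1.0 = 1.6, has not been reached
late = timer.check(now=1.7)     # no message since 0.6, so the fault is identified
print(in_time, late)
```

Passing `now` explicitly instead of reading a system clock keeps the example deterministic. A real co-processor would presumably use a hardware timer or a monotonic clock.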