Entries |
Document | Title | Date |
20110060940 | N+1 FAILOVER AND RESYNCHRONIZATION OF DATA STORAGE APPLIANCES - Reciprocal data storage protection is combined with “N+1” hardware provisioning and on-demand resynchronization to guarantee full data availability without impairing performance. Pairs of nodes are identified that act as backups for each other, where each node stores a secondary copy of data allocated to its reciprocal, paired node. A single extra node is brought online to take over the role of a failed node and assumes the role of the primary node it replaced. | 03-10-2011 |
20110060941 | Method of Achieving High Reliability of Network Boot Computer System - In a network computer system, recovery may be impossible from a fault when the fault occurs in a network switch in a network or a device such as an external disk device. Provided is a computer system that includes a plurality of servers, a plurality of networks, a plurality of external disk devices, and a management computer, in which the management computer detects a fault which has occurred, retrieves an application stop server unable to access its disk due to the fault, retrieves a disk storing the same contents as the disk used by the retrieved application stop server and the external disk device including that disk, retrieves an application resuming server capable of accessing the retrieved external disk device, and transmits to the retrieved application resuming server an instruction to boot using the retrieved disk. | 03-10-2011 |
20110066875 | APPARATUS AND METHOD FOR ESTABLISHING COMMUNICATIONS - A system that incorporates teachings of the present disclosure may include, for example, a gateway comprising a controller to transport media data between a service provider network and one or more end user devices associated with the gateway, and transmit connection signals to a second gateway for establishing communications between the service provider network and the gateway and for establishing a plurality of queues at the second gateway, where the queues are dedicated to each of femtocell, internet and VoIP services associated with the gateway. Other embodiments are disclosed. | 03-17-2011 |
20110066876 | TRAP-BASED CONFIGURATION AUDIT - A method includes generating a layer three trap packet that includes an indicator that indicates an audit request of a resident configuration file, transmitting the layer three trap packet to another device, receiving a reference configuration file in response to transmitting the layer three trap packet, comparing the resident configuration file with the reference configuration file, and replacing the resident configuration file with the reference configuration file when a difference between the reference configuration file and the resident configuration file exists. | 03-17-2011 |
20110078490 | SVC CLUSTER CONFIGURATION NODE FAILOVER SYSTEM AND METHOD - Methods, systems, and computer programs are provided for failover responses to configuration node failures in SVC clusters. An SVC cluster manages a plurality of storage devices and includes a plurality of SVCs interconnected via a network, each SVC acting as a separate node. A new configuration node is activated in response to configuration node failures. The new configuration node retrieves client subscription information about events occurring in storage devices managed by the SVC cluster from the storage devices. In response to events occurring in the storage device managed by the SVC cluster, the new configuration node obtains storage device event information from a storage device event monitoring unit. The new configuration node sends storage device events to clients who have subscribed to this information according to subscription information obtained. The storage device is not installed in the original configuration node. This method allows complete transparency of the configuration node failover process to clients. | 03-31-2011 |
20110083037 | RELIABLE MEDIA STREAMING - A reliable streaming system increases reliability of live and on-demand streaming media events through a robust server architecture that allows fast failover and recovery in the event of network, hardware, or other failures. The system provides for failover of encoders, ingest servers, which receive encoded media data from encoders, and origin servers, which serve as the retrieval point of last resort for connecting clients. The system also provides a push proxy mechanism that allows one copy of data to feed redundant servers and pre-warm caches, saving on provisioned bandwidth. In addition, the system provides a distribution server role that allows content to be automatically syndicated to a region when needed. Thus, the reliable streaming system provides a streaming solution with no single point of failure and redundancy and fast failover built into the content network architecture. | 04-07-2011 |
20110099414 | Method and Device for Operating a Network and Communication System Comprising Such Device - A method and a device are provided for operating a network, wherein the network comprises several network elements that are connected via a ring, wherein a first segment is of a first type of connection and wherein a second segment is of a second type of connection, wherein the ring comprises at least one first segment and at least one second segment and wherein one network element of the network elements is a ring master comprising a primary port and a secondary port, comprising the steps (i) a failure of at least one of the at least one first segment is detected by the ring master; (ii) the ring master unblocks its secondary port; and (iii) the ring master sends a first message via its primary port and via its secondary port. | 04-28-2011 |
20110107138 | SERVER SWITCHING METHOD AND SERVER SYSTEM EQUIPPED THEREWITH - There is disclosed a high speed switching method for a disk image delivery system fail-over. A management server sends a disk image of an active server in advance to a standby server. When receiving a report that the active server has failed, the management server judges whether or not it is possible for the standby server to perform the service of the failed active server based on service provision management server information held by the management server and if possible, instructs the standby server to perform the service of the active server. Even if the disk image delivered in advance is different from the disk image of the failed active server, switching of the service to the standby server can be performed more quickly through resetting the setting values of unique information and installing the additional pieces of software on the standby server by the management server than redelivering an appropriate disk image. | 05-05-2011 |
20110126041 | TRANSPORT CONTROL SERVER, NETWORK SYSTEM AND TRANSPORT CONTROL METHOD - The time required for a path recalculation and a path switching upon occurrence of a failure is shortened. A path generation unit of a transport control server (TCS) S- | 05-26-2011 |
20110145630 | REDUNDANT, FAULT-TOLERANT MANAGEMENT FABRIC FOR MULTIPARTITION SERVERS - Redundant, fault-tolerant management fabric for multipartition servers are disclosed. In an exemplary embodiment, a method comprises connecting a plurality of rack system components to a first network segment, the connection including at least two physical links sharing a single network address. The method also comprises monitoring communications paths in the first network segment. The method also comprises switching communications from the first network segment to a failover network segment if there is a failure in any of the communications paths in the first network segment. | 06-16-2011 |
20110145631 | ENHANCED CLUSTER MANAGEMENT - An embodiment of the present invention is directed to a method and system for making intelligent failover decisions within a server cluster. The method includes receiving temperature information and location information using RFID technology and detecting an error condition. The method further includes responsive to the error condition, selecting a failover target based on said temperature information and location information and transferring operations from a portion of a storage cluster to the failover target based on the selecting. | 06-16-2011 |
20110161723 | DISASTER RECOVERY USING LOCAL AND CLOUD SPANNING DEDUPLICATED STORAGE SYSTEM - A spanning storage interface facilitates the use of cloud storage services by storage clients and may perform data deduplication. The spanning storage interface may include local storage for caching data from storage clients. A disaster recovery application includes at least first and second spanning storage interfaces at first and second network locations. The second spanning storage interface is provided for at least disaster recovery operations. The second spanning storage interface includes second local storage for improving data access performance. A copy of the local cache of the first spanning storage interface is transferred to the second local storage while the first network location is operating. In the event of a disaster affecting the first network location, the second spanning storage interface can provide data access to the first network location's data with improved performance from using the copy of local cache in the second local storage. | 06-30-2011 |
20110161724 | DATA MANAGEMENT APPARATUS, MONITORING APPARATUS, REPLICA APPARATUS, CLUSTER SYSTEM, CONTROL METHOD AND COMPUTER-READABLE MEDIUM - A data management apparatus, which is connected to a monitoring apparatus that monitors an operating state of a service, and which provides a service for managing data, comprises: a status management unit which manages a status of a service provided by itself; a notification unit which periodically notifies the monitoring apparatus of a status of the service; a receiving unit which receives a request from an application to which the service is provided; and a rejecting unit which rejects, when the request received by the receiving unit is an update request of data, the update request if a status associated with updating of the service managed by the status management unit is a limited status. | 06-30-2011 |
20110173490 | HIGH AVAILABILITY FOR NETWORK SECURITY DEVICES - In one example, a backup intrusion detection and prevention (IDP) device includes one or more network interfaces to receive a state update message from a primary IDP device, wherein the state update message indicates a network session being inspected by the primary IDP device and an identified application-layer protocol for the device, to receive an indication that the primary device has switched over or failed over to the backup device, and to receive a plurality of packets of the network session after receiving the indication, each of the plurality of packets comprising a respective payload including application-layer data, a protocol decoder to detect a beginning of a new transaction from the application-layer data of one of the plurality of packets, and a control unit to statefully process only the application-layer data of the network session that include and follow the beginning of the new transaction. | 07-14-2011 |
20110173491 | FAILURE RECOVERY METHOD - The reliability is improved at a low cost even in a virtualized server environment. The number of spare servers is reduced for improving the reliability and for saving a licensing fee for software on the spare servers. A server system comprises a plurality of physical servers on which a plurality of virtual servers run, a single standby server, a module for detecting an active virtual server, and a module for switching the correspondence of boot disks of virtualization modules for controlling virtual servers to the physical servers. When a physical server fails, the boot disk of the associated virtualization module is connected to a spare server to automatically activate on the spare server those virtual servers which have been active upon occurrence of the failure. | 07-14-2011 |
20110173492 | TECHNIQUE FOR PROTECTING LEAF NODES OF A POINT-TO-MULTIPOINT TREE IN A COMMUNICATIONS NETWORK IN CONNECTED MODE - A technique protects, in a communications network, a point-to-multipoint primary tree between a root node and primary leaf nodes of the network if a failure occurs affecting one of the primary leaf nodes. One network node receives from another upstream network node a request to protect the primary tree via a backup branch leading to a backup leaf node situated downstream from said node, if a failure affecting one of the primary leaf nodes to be protected occurs, the backup leaf node being associated with the primary leaf node to be protected. If the node is connected directly to the downstream primary leaf node to be protected, it configures a backup routing rule for routing packets from an upstream primary branch to the downstream backup branch and, possibly, to the downstream primary branch(es) and is activated in the event of a failure affecting the primary leaf node to be protected. | 07-14-2011 |
20110179303 | PERSISTENT APPLICATION ACTIVATION AND TIMER NOTIFICATIONS - The present invention extends to methods, systems, and computer program products for persistent application activation and timer notifications. A durable instance manager, instance execution hosts, and an instance store interoperate to transition instances between executing and persisted states. System properties are associated with an instance. System properties can define re-activation conditions, that when satisfied, indicate that an instance is to be re-activated for execution. System properties can define timers as well as indications that instances are in a persisted but ready to run state. | 07-21-2011 |
20110179304 | SYSTEMS AND METHODS FOR MULTI-TENANCY IN CONTACT HANDLING SYSTEMS - One example embodiment includes a method for providing multi-tenancy in a computing environment. The method includes receiving a script in a computing environment, where the script includes one or more actions to be completed by the computing environment. The method further includes providing one or more computing resources in the computing environment and building an action list for the one or more computing resources, where the action list is a data structure that contains a list of one or more actions to be executed by the one or more computing resources. The method further includes transmitting a first action to one of the one or more computing resources, where the first action is one of the one or more actions. The method further includes executing the first action in the one of the one or more computing resources and indicating to the action list the completion of the first action. | 07-21-2011 |
20110185221 | FAULT TOLERANT ROUTING IN A NON-HOT-STANDBY CONFIGURATION OF A NETWORK ROUTING SYSTEM - Methods and systems for facilitating fault tolerance in a non-hot-standby configuration of a network routing system are provided. According to one embodiment, a failover method is provided. One or more processing engines of a network routing system are configured to function as active processing engines, each of which having one or more software contexts. A control blade is configured to monitor the active processing engines. One or more of the processing engines are identified to function as non-hot-standby processing engines, each of which having no pre-created software contexts corresponding to the software contexts of the active processing engines. The control blade monitors the active processing engines. Responsive to detecting a fault associated with an active processing engine, the active processing engine is dynamically replaced with a non-hot-standby processing engine by creating one or more replacement software contexts within the non-hot-standby processing engine corresponding to those of the active processing engine. | 07-28-2011 |
20110191624 | SYSTEMS, METHODS, AND COMPUTER READABLE MEDIA FOR PROVIDING INSTANTANEOUS FAILOVER OF PACKET PROCESSING ELEMENTS IN A NETWORK - Systems, methods, and computer program products for providing instantaneous failover of packet processing elements in a network are disclosed. According to one aspect, the subject matter described herein includes a system for providing instantaneous failover of packet processing elements in a network. The system includes a plurality of packet processing elements for processing packets in a network and a packet distribution module for maintaining information about the operating status of each element of the plurality of packet processing elements, for maintaining information about packet flows being processed by the packet processing elements, and, for each packet flow, assigning one of the plurality of packet processing elements as the primary element for the packet flow and assigning another of the plurality of packet processing elements as the secondary element for the packet flow. The packet distribution module routes packets to the packet processing elements according to the operating status of the packet processing elements that have been assigned to the packet flow with which the packets are associated. | 08-04-2011 |
20110191625 | CONTROLLING THE STATE OF DUPLEXING OF COUPLING FACILITY STRUCTURES - A coupling facility is coupled to one or more other coupling facilities via one or more peer links. The coupling of the facilities enables various functions to be supported, including the duplexing of structures of the coupling facilities. Duplexing is performed on a structure basis, and thus, a coupling facility may include duplexed structures, as well as non-duplexed or simplexed structures. | 08-04-2011 |
20110214007 | FLEXIBLE FAILOVER POLICIES IN HIGH AVAILABILITY COMPUTING SYSTEMS - A system for implementing a failover policy includes a cluster infrastructure for managing a plurality of nodes, a high availability infrastructure for providing group and cluster membership services, and a high availability script execution component operative to receive a failover script and at least one failover attribute and operative to produce a failover domain. In addition, a method for determining a target node for a failover comprises executing a failover script that produces a failover domain, the failover domain having an ordered list of nodes, receiving a failover attribute and based on the failover attribute and failover domain, selecting a node upon which to locate a resource. | 09-01-2011 |
20110214008 | NETWORK SYSTEM - A network system having duplicate lines of a primary system and a backup system between a transmitter apparatus and a receiver apparatus is provided. Each of the transmitter apparatus and the receiver apparatus includes an arithmetic operator for conducting a BIP-8 arithmetic operation and a CRC arithmetic operation on an input signal and thereby detecting a bit error. The transmitter apparatus transmits data to both lines. The receiver apparatus includes a switcher. When a bit error is detected in received data of the primary system, the switcher switches control of the primary system and the backup system. Hitless protection switching of a VC path is executed. | 09-01-2011 |
20110214009 | Creation of Highly Available Pseudo-Clone Standby Servers for Rapid Failover Provisioning - Near clones for a set of targeted computing systems are provided by determining a highest common denominator set of components among the computing systems, producing a pseudo-clone configuration definition, and realizing one or more pseudo-clone computing systems as partially configured backups for the targeted computing systems. Upon a planned failover, actual failure, or quarantine action on a targeted computing system, a difference configuration is determined to complete the provisioning of the pseudo-clone system to serve as a replacement system for the failed or quarantined system. Failure predictions can be used to implement the pseudo-clone just prior to an expected first failure of any of the targeted systems. The system can also interface to an on-demand provisioning management system to effect automated workflows to realize pseudo-clones and replacement systems automatically, as needed. | 09-01-2011 |
20110214010 | Creation of Highly Available Pseudo-Clone Standby Servers for Rapid Failover Provisioning - Near clones for a set of targeted computing systems are provided by determining a highest common denominator set of components among the computing systems, producing a pseudo-clone configuration definition, and realizing one or more pseudo-clone computing systems as partially configured backups for the targeted computing systems. Upon a planned failover, actual failure, or quarantine action on a targeted computing system, a difference configuration is determined to complete the provisioning of the pseudo-clone system to serve as a replacement system for the failed or quarantined system. Failure predictions can be used to implement the pseudo-clone just prior to an expected first failure of any of the targeted systems. The system can also interface to an on-demand provisioning management system to effect automated workflows to realize pseudo-clones and replacement systems automatically, as needed. | 09-01-2011 |
20110225447 | PREFERRED RESOURCE SELECTOR - A computer implemented method, and computer program product for requesting resources. The computer receives an assignment of an Internet protocol address. The computer compares a computer context of a client computer with an intranet access criterion to form a comparison result. The computer selects at least one preferred uniform resource identifier based on the comparison result, indicating the intranet is accessible. The computer transmits a request to a server using at least one preferred uniform resource identifier using a packet network. | 09-15-2011 |
20110225448 | FAILOVER SYSTEM AND METHOD - One aspect of the present invention provides a system for failover comprising at least one client selectively connectable to one of at least two interconnected servers via a network connection. In a normal state, one of the servers is designated a primary server when connected to the client and a remainder of the servers are designated as backup servers when not connected to the client. The at least one client is configured to send messages to the primary server. The servers are configured to process the messages using at least one service that is identical in each of the servers. The services are unaware of whether a server respective to the service is operating as the primary server or the backup server. The servers are further configured to maintain a library, or the like, that indicates whether a server is the primary server or a server is the backup server. The services within each server make external calls via their respective libraries. The library in the primary server is configured to complete the external calls and return results of the external calls to the service in the primary server and to forward results of the external calls to the service in the backup server. The library in the secondary server does not make external calls but simply forwards the results of the external calls, as received from the primary server, to the service in the secondary server when requested to do so by the service in the secondary server. | 09-15-2011 |
20110225449 | Method of Achieving High Reliability of Network Boot Computer System - In a network computer system, recovery may be impossible from a fault when the fault occurs in a network switch in a network or a device such as an external disk device. Provided is a computer system that includes a plurality of servers, a plurality of networks, a plurality of external disk devices, and a management computer, in which the management computer detects a fault which has occurred, retrieves an application stop server unable to access its disk due to the fault, retrieves a disk storing the same contents as the disk used by the retrieved application stop server and the external disk device including that disk, retrieves an application resuming server capable of accessing the retrieved external disk device, and transmits to the retrieved application resuming server an instruction to boot using the retrieved disk. | 09-15-2011 |
20110246815 | RECOVERING FROM LOST RESOURCES IN A DISTRIBUTED SERVER ENVIRONMENT - An apparatus, method, and computer readable storage medium are disclosed to recover from lost resources in a distributed server environment. A status monitor module receives, at a first computer, periodic status messages from a peer computer. Each periodic status message indicates that the peer computer is providing a service for which the first computer serves as a backup service provider. A failure detection module determines, based on the periodic status messages, that the peer computer has stopped providing the service. An advancement module provides the service, at the first computer, in response to determining that the peer computer has stopped providing the service. | 10-06-2011 |
20110252273 | MATCH SERVER FOR A FINANCIAL EXCHANGE HAVING FAULT TOLERANT OPERATION - Fault tolerant operation is disclosed for a primary match server of a financial exchange using an active copy-cat instance, a.k.a. backup match server, that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled, i.e. they need not be aware that they are operating in a fault tolerant environment. As such, the primary match server need not be specifically designed or programmed to interact with the fault tolerant mechanisms. Instead, the primary match server need only be designed to adhere to specific basic operating guidelines and shut itself down when it cannot do so. By externally controlling the ability of the primary match server to successfully adhere to its operating guidelines, the fault tolerant mechanisms of the disclosed embodiments can recognize error conditions and easily failover from the primary match server to the backup match server. | 10-13-2011 |
20110258483 | Data Layout for Recovery and Durability - A Metadata server described herein is configured to generate a metadata table optimized for data durability and recovery. In generating the metadata table, the metadata server associates each possible combination of servers with one of the indices of the table, thereby ensuring that each server participates in recovery in the event of a server failure. In addition, the metadata server may also associate one or more additional servers with each index to provide added data durability. Upon generating the metadata table, the metadata server provides the metadata table to clients or servers. Alternatively, the metadata server may provide rules and parameters to clients to enable those clients to identify servers storing data items. The clients may use these parameters and an index as inputs to the rules to determine the identities of servers storing or designated to store data items corresponding to the index. | 10-20-2011 |
20110258484 | FAILOVER AND LOAD BALANCING - Provided are techniques for failover when a network adapter fails, wherein the network adapter is connected to a miniport driver that is connected to a filter driver. With the miniport driver, it is determined that at least one of the network adapter and a data path through the network adapter has failed. With the miniport driver, the filter driver is notified that at least one of the network adapter and the data path through the network adapter has failed. | 10-20-2011 |
20110271139 | CLUSTER-FREE TECHNIQUES FOR ENABLING A DIRECTORY PROTOCOL-BASED DOMAIN NAME SYSTEM (DNS) SERVICE FOR HIGH AVAILABILITY - Cluster-free techniques for enabling a directory protocol-based Domain Name System (DNS) service for high availability are presented. A DNS service monitors a node for wild-carded IP address that migrate to the node when a primary node fails to service DNS requests for a directory of the network. The DNS service forwards the wild-carded IP address to a distributed directory service for resolution and uses the distributed directory service to dynamically configure the DNS service for directly handling subsequent DNS requests made to the directory over the network while the primary node remains inoperable over the network. | 11-03-2011 |
20110271140 | METHOD AND COMPUTER SYSTEM FOR FAILOVER - In a computer system in which plural servers are connected with an external disk device via a network, each server incorporates a logic partition module for configuring at least one logic partition in the server, and the operating system stored in each logic partition is booted from a boot disk of the external disk device. When the task being executed by a working server is taken over by another server at the time of a failure occurring in the working server, the failover operation is performed only for the logic partition affected by the failure. | 11-03-2011 |
20110276822 | NODE CONTROLLER FIRST FAILURE ERROR MANAGEMENT FOR A DISTRIBUTED SYSTEM - A distributed system provides error handling wherein the system includes multiple nodes, each node being coupled to multiple node controllers for control redundancy. Multiple system controllers couple to the node controllers via a network bus. A particular node controller may detect an error of that particular node controller. The particular node controller may store error information relating to the detected error in respective nonvolatile memory stores in the system controllers and node controllers according to a particular priority order. In accordance with the particular priority order, for example, the particular node controller may first attempt to store the error information to a primary system controller memory store, then to a secondary system controller memory store, and then to sibling and non-sibling node controller memory stores. The primary system controller organizes available error information for use by system administrators and other resources of the distributed system. | 11-10-2011 |
20110276823 | INFORMATION PROCESSING APPARATUS, BACKUP SERVER AND BACKUP SYSTEM - An information processing apparatus includes a backup data storage unit, a monitoring information storage unit and a backup data transfer unit. The backup data storage unit stores backup data. The monitoring information storage unit stores monitoring information that includes at least identification information and priority information of the backup data. The backup data transfer unit transfers the backup data to a backup server via a network in response to a transfer request for the backup data. The transfer request is received from the backup server on the basis of the priority information of the monitoring information which is notified to the backup server from the information processing apparatus. | 11-10-2011 |
20110289345 | METHOD AND SYSTEM FOR ENABLING CHECKPOINTING FAULT TOLERANCE ACROSS REMOTE VIRTUAL MACHINES - A checkpointing fault tolerance network architecture enables a backup computer system to be remotely located from a primary computer system. An intermediary computer system is situated between the primary computer system and the backup computer system to manage the transmission of checkpoint information to the backup VM in an efficient manner. The intermediary computer system is networked to the primary VM through a high bandwidth connection but is networked to the backup VM through a lower bandwidth connection. The intermediary computer system identifies updated data corresponding to memory pages that have been least recently modified by the primary VM and transmits such updated data to the backup VM through the low bandwidth connection. In such manner, the intermediary computer system economizes the bandwidth capacity of the low bandwidth connection, holding back updated data corresponding to more recently modified memory pages, since such memory pages may be more likely to be updated again in the future. | 11-24-2011 |
20110289346 | QPROCESSOR ARCHITECTURE IN A CLUSTER CONFIGURATION - In general, an appliance that simplifies the creation of a cluster in a computing environment has a fairly straightforward user interface that abstracts out many of the complexities of the typical configuration processes, thereby significantly simplifying the deployment process. By using such an appliance, system administrators can deploy an almost turn-key cluster and have the confidence of knowing that the cluster is well tuned for the application/environment that it supports. In addition, the present disclosure allows for configurations and integrations of specialty engines, such as Q processors or J processors, into the cluster. The disclosure provides systems and methods for configuring a cluster, managing a cluster, managing an MQ in a cluster, a user interface for configuring and managing the cluster, an architecture for using specialty engines in a cluster configuration, an interconnect between cluster components, and a file system for use in a cluster. | 11-24-2011 |
20110296233 | TASK RELAY SYSTEM, APPARATUS, AND RECORDING MEDIUM - A computer which takes over a task managed by a server apparatus from another computer which occupies the task. The computer includes a processor configured to detect an error of the other computer, to transmit a task relaying request for taking over the task to the server apparatus when the error is detected, and, when permission for the takeover of the task is received from the server apparatus, to allow processes of application programs in standby states in the computer to occupy the task. | 12-01-2011 |
20110314326 | MONITORING SERVICE ENDPOINTS - Today, data networks are ever increasing in size and complexity. For example, a datacenter may comprise hundreds of thousands of service endpoints configured to perform work. To reduce network wide degradation, a load balancer may send work requests to healthy service endpoints, as opposed to unhealthy and/or inoperative service endpoints. Accordingly, among other things, one or more systems and/or techniques for monitoring service endpoints, which may be scalable for large scale networks, are provided. In particular, a consistent hash function may be performed to generate a monitoring scheme comprising assignments of service endpoints to monitoring groups. In this way, multiple monitoring components may monitor a subset of endpoints to ascertain health status. Additionally, the monitoring components may communicate between one another so that a monitoring component may know health statuses of service endpoints both assigned and not assigned to the monitoring component. | 12-22-2011 |
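The consistent-hash assignment of endpoints to monitoring groups can be sketched with a standard hash ring. The class name, the use of MD5, and the virtual-node count are assumptions for illustration; the patent only specifies that a consistent hash function produces the assignment.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Map service endpoints to monitoring groups on a hash ring.

    Each group is placed on the ring at several virtual points, and an
    endpoint is assigned to the first group point clockwise of its own
    hash. Adding or removing a group then remaps only a small fraction
    of endpoints, which is what makes the scheme scale to large networks.
    """

    def __init__(self, groups, replicas=64):
        self._ring = []  # sorted list of (hash, group) virtual points
        for group in groups:
            for i in range(replicas):
                self._ring.append((self._hash(f"{group}:{i}"), group))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def group_for(self, endpoint):
        # First virtual point clockwise of the endpoint's hash (wrapping).
        idx = bisect.bisect(self._keys, self._hash(endpoint)) % len(self._ring)
        return self._ring[idx][1]
```

Each monitoring component can build the same ring independently and agree on every assignment without any coordination traffic.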
20120005520 | SIMPLIFYING AUTOMATED SOFTWARE MAINTENANCE OF DATA CENTERS - An aspect of the present invention simplifies software maintenance of nodes in a data center. In one embodiment, a management system receives data specifying a set of commands to be executed on a node in the data center, and then forms a maintenance script by programmatically incorporating instructions for executing the set of commands on the node and to perform a set of management actions. The management system then executes the maintenance script to cause execution of the set of commands on the nodes, thereby performing maintenance of the node. A user/administrator of the data center needs to specify only the commands, thereby simplifying the software maintenance of data centers. According to another aspect, the maintenance scripts (formed by incorporating the commands provided by a user) are executed as part of a disaster recovery process in the data center. | 01-05-2012 |
20120005521 | METHOD AND SYSTEM FOR MAINTAINING DIRECT HARDWARE ACCESS IN THE EVENT OF NETWORK INTERFACE CARD FAILURE - A system for maintaining direct hardware access in the event of PNIC failure. A host for the system includes: a processor; a first and a second PNIC, where the first PNIC is activated and all other PNICs are deactivated; a host operating system; a virtual machine; and a hypervisor for transferring packets between the host operating system and the virtual machine. The host operating system includes a link aggregator, multiple host VNICs, and a virtual switch associated with the VNICs. The virtual machine includes a virtual network protocol stack and a guest VNIC. The link aggregator is configured to determine whether the first PNIC has failed. Based on a determination that the first PNIC has failed, the link aggregator is further configured to: remove a virtual function mapping between the first PNIC and the virtual machine; determine the second PNIC; deactivate the first PNIC; and activate the second PNIC. | 01-05-2012 |
20120005522 | FAULT TOLERANCE FOR MAP/REDUCE COMPUTING - Embodiments of the invention include a method for fault tolerance management of worker nodes during map/reduce computing in a computing cluster. The method includes subdividing a computational problem into a set of sub-problems, mapping a selection of the sub-problems in the set to respective nodes in the cluster, directing processing of the sub-problems in the respective nodes, and collecting results from completion of processing of the sub-problems. During a first early temporal portion of processing the computational problem, failed nodes are detected and the sub-problems currently being processed by the failed nodes are re-processed. Conversely, during a second later temporal portion of processing the computational problem, sub-problems in nodes not yet completely processed are replicated into other nodes, processing of the replicated sub-problems directed, and the results from completion of processing of sub-problems collected. Finally, duplicate results are removed and remaining results reduced into a result set for the problem. | 01-05-2012 |
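The final step above, removing duplicate results produced by replicated sub-problems, can be sketched as a simple first-writer-wins merge. The function name and the (sub_problem_id, value) pair representation are illustrative assumptions.

```python
def collect_results(results):
    """Remove duplicate results produced by replicated sub-problems.

    `results` is a list of (sub_problem_id, value) pairs. Replicas of the
    same sub-problem are deterministic and yield the same value, so the
    first result to arrive for each sub-problem id wins and later
    duplicates are dropped before the reduce step.
    """
    seen = {}
    for sub_id, value in results:
        seen.setdefault(sub_id, value)
    return seen
```

The reduce step would then fold over `seen.values()` exactly once per sub-problem, regardless of how many replicas finished.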
20120005523 | INTRA-REALM AAA FALLBACK MECHANISM - There is provided an intra-realm AAA (authentication, authorization and accounting) fallback mechanism, wherein the single global realm may be divided into one or more sub-realms. The thus presented mechanism exemplarily comprises detecting a failure of an authentication server serving at least one authentication client within a first sub-realm of a single-realm authentication system, and routing authentication messages of the at least one authentication client to a fallback authentication server within a second sub-realm of the single-realm authentication system, wherein routing may exemplarily comprise sub-realm based source routing. | 01-05-2012 |
20120011391 | MATCH SERVER FOR A FINANCIAL EXCHANGE HAVING FAULT TOLERANT OPERATION - Fault tolerant operation is disclosed for a primary match server of a financial exchange using an active copy-cat instance that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled. As such, the primary match server need not be specifically designed or programmed to interact with the fault tolerant mechanisms. | 01-12-2012 |
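The gating-and-comparison logic described above can be sketched as a small wrapper: inputs reach the backup only after the primary has processed them, and the two outputs are compared. The class name and callable-based interface are assumptions for the sketch; the real servers are independent processes.

```python
class FaultTolerantGate:
    """Gate inputs to a backup only after the primary has processed them,
    then compare outputs to verify the backup mirrors the primary."""

    def __init__(self, primary, backup):
        self.primary, self.backup = primary, backup
        self.mismatches = []  # inputs on which the backup diverged

    def process(self, msg):
        primary_out = self.primary(msg)   # primary completes first
        backup_out = self.backup(msg)     # gated input replayed to backup
        if backup_out != primary_out:
            self.mismatches.append(msg)   # flag divergent behaviour
        return primary_out                # only the primary's output is used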
20120011392 | TAKE OVER METHOD FOR COMPUTER SYSTEM - A proposed fail over method for taking over task that is preformed on an active server to a backup server, even when the active server and the backup server have different hardware configuration. The method for making a backup server take over task when a fault occurs on a active server, comprises steps of acquiring configuration information on the hardware in the active server and the backup server, acquiring information relating the hardware in the backup server with the hardware in the active server, selecting a backup server to take over the task that is executed on the active server where the fault occurred, creating logical partitions on the selected backup server, and taking over the task executed on the active server logical partitions, in the logical partitions created on the selected backup server. | 01-12-2012 |
20120023360 | MOBILITY MANAGEMENT ENTITY FAILOVER - A method, performed by a first mobility management entity (MME) device in a network, includes receiving, from a second MME device, standby database information associated with user equipment (UE) registered with the second MME device; detecting that the second MME device has failed or lost connectivity; designating that the UEs registered with the second MME device will be registered with the first MME device, in response to detecting that the second MME device has failed or lost connectivity; detecting a request to activate a particular UE registered with the second MME device; and paging the particular UE to register with the first MME device, using the standby database information and in response to detecting the request to activate the particular UE. | 01-26-2012 |
20120023361 | SYSTEMS AND METHODS FOR RECOVERING FROM THE FAILURE OF A GATEWAY SERVER - Disclosed is a method for recovering from the failure of a gateway server. In some embodiments, the method includes: receiving, at a backup gateway server, a message transmitted from a client, the message comprising a network resource previously allocated to the client by the gateway server that failed; determining whether the network resource is free; and transmitting, from the backup gateway server to the client, an acknowledgment indicating that the client may continue using the network resource in response to a determination that the network resource is free. | 01-26-2012 |
20120030503 | System and Method for Providing High Availability for Distributed Application - A system and method is provided for ensuring high availability for a distributed application. A management object manages multiple scenarios defined for protection units associated with a distributed application. The management object may coordinate various operations performed at the protection units based on management object configuration information. | 02-02-2012 |
20120030504 | HIGH RELIABILITY COMPUTER SYSTEM AND ITS CONFIGURATION METHOD | 02-02-2012 |
20120036391 | METHOD, SYSTEM, AND APPARATUS FOR NETWORK DEVICE TO ACCESS PACKET SWITCHED NETWORK - A method for a network device to access a packet switched network is applied to a system in which the network device accesses the packet switched network by connecting to PEs in an active-standby mode. The method includes: an active PE and a standby PE each sends a fault detection message to the network device through an interface connected to the network device; the active PE sets the state of the interface to “up” and advertises a route to a remote PE if a fault detection response returned by the network device is received through the interface within a preset period; otherwise, the active PE sets the state of the interface to “down” and withdraws the advertised route; and the standby PE sets the state of the interface to “up” and advertises another route to the remote PE after receiving a fault detection response through the interface connected to the network device. | 02-09-2012 |
20120036392 | FAULT DETECTION AND CORRECTION FOR SINGLE AND MULTIPLE MEDIA PLAYERS CONNECTED TO ELECTRONIC DISPLAYS, AND RELATED DEVICES, METHODS AND SYSTEMS - Systems, devices, software, hardware and networks adapted and arranged for monitoring and correcting faults in networked media player systems that include electronic displays are provided. After detection or notification of a fault in at least one networked media player in a network of at least two, or N, media players operationally connected to electronic displays, the invention provides an alternate source of signal to the affected display. In some preferred embodiments, the invention utilizes at least one additional, or N+1, media player as a backup to substitute for the failed media player. Reconfiguration of the faulted media player by means of the N+1 backup networked media player advantageously increases the reliability and efficiency of ongoing maintenance of digital visual systems operating in commercial and other environments. | 02-09-2012 |
20120036393 | FAILURE RECOVERY METHOD, FAILURE RECOVERY PROGRAM AND MANAGEMENT SERVER - In a computer system including server apparatuses such as an active server and a standby server connected to a storage apparatus, when the active server fails, a management server changes over connection to the storage apparatus from the active server to the standby server to thereby hand over operation to the standby server. The management server refers to a fail-over strategy table in which apparatus information of the server apparatuses is associated with fail-over methods to select a fail-over strategy in consideration of apparatus information of the active and standby servers. | 02-09-2012 |
20120042196 | MANAGEMENT OF A DISTRIBUTED COMPUTING SYSTEM THROUGH REPLICATION OF WRITE AHEAD LOGS - Several methods and a system of a replicated service for write ahead logs are disclosed. In one embodiment, a method includes persisting a state of a distributed system through a write ahead log (WAL) interface. The method also includes maintaining a set of replicas of a WAL through a consensus protocol. In addition, the method includes providing a set of mechanisms for at least one of detection and a recovery from a hardware failure. The method further includes recovering a persistent state of a set of applications. In addition, the method includes maintaining the persistent state across a set of nodes through the hardware failover. In one embodiment, the system may include a WAL interface to persist a state of a distributed system. The system may also include a WAL replication servlet to maintain and/or recover a set of replicas of a WAL. | 02-16-2012 |
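The WAL interface and replica maintenance above can be sketched with an in-memory stand-in. The class name, the key/value entry format, and the synchronous append-to-all-replicas policy are illustrative assumptions; in particular, the consensus protocol the patent relies on is elided here.

```python
class ReplicatedWAL:
    """Minimal write-ahead-log sketch: every state change is appended to a
    local log and to each replica's log before it is applied, so any node's
    persistent state can be rebuilt by replaying the entries. A real system
    would run a consensus protocol over the replicas; that is elided here.
    """

    def __init__(self, replicas=2):
        self.logs = [[] for _ in range(replicas + 1)]  # local log + replicas

    def append(self, entry):
        """Persist a (key, value) entry to every log before it is applied."""
        for log in self.logs:
            log.append(entry)

    def recover(self, replica_index=0):
        """Rebuild application state by replaying one log from the start."""
        state = {}
        for key, value in self.logs[replica_index]:
            state[key] = value  # later entries overwrite earlier ones
        return state
```

Because every replica holds the full entry sequence, recovery after a hardware failure can replay from whichever replica survives and reach the same state.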
20120047394 | HIGH-AVAILABILITY COMPUTER CLUSTER WITH FAILOVER SUPPORT BASED ON A RESOURCE MAP - Embodiments of the invention relate to handling failures in a cluster of computer resources. The resources are represented as nodes in a dependency graph in which some nodes are articulation points and the removal of any articulation point due to a resource failure results in a disconnected graph. The embodiments perform a failover when a resource corresponding to an articulation point fails. The failover is to a local resource if the failed resource does not affect all local resources. The failover is to a remote resource if no local resource can meet all resource requirements of the failed resource, and to a remote resource running in a degraded mode if the remote resource cannot meet all of the requirements. | 02-23-2012 |
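Finding the articulation points the abstract refers to is a classic depth-first-search problem; the standard low-link algorithm is sketched below. The adjacency-dict representation is an assumption made for the example, not the patent's resource-map format.

```python
def articulation_points(graph):
    """Find articulation points (cut vertices) in an undirected dependency
    graph given as {node: [neighbours]}. Removing any of these resources
    disconnects the graph, so they are the failover-critical ones."""
    disc, low, points = {}, {}, set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in graph[u]:
            if v == parent:
                continue
            if v in disc:
                low[u] = min(low[u], disc[v])  # back edge to an ancestor
            else:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # v's subtree cannot reach above u: u is a cut vertex
                if parent is not None and low[v] >= disc[u]:
                    points.add(u)
        if parent is None and children > 1:
            points.add(u)  # a root with 2+ DFS subtrees is a cut vertex

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return points
```

A cluster manager could run this over the resource map after every topology change and pre-plan failover targets for each reported cut vertex.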
20120047395 | CONTROL METHOD FOR INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING SYSTEM, AND PROGRAM - An information processing system including a plurality of server apparatuses coupled to one another, wherein failover is executed. A management server is coupled to the server apparatuses, and is configured to, when detecting occurrence of a failure in an active server apparatus, execute failover from the active server apparatus to a standby server apparatus after turning on a power supply of the standby server apparatus. The management server is enabled to acquire information on the standby server apparatus after turning on the power supply of the standby server apparatus, turn off the power supply of the standby server apparatus after acquiring the information, and, based on the acquired information, judge whether failover to the standby server apparatus can be executed. | 02-23-2012 |
20120060049 | SYSTEM AND METHOD FOR REMOVING A STORAGE SERVER IN A DISTRIBUTED COLUMN CHUNK DATA STORE - Assuring recovery from failure of a storage server in a distributed column chunk data store of operably coupled storage servers, includes: partitioning a data table into chunks; implementing a distribution scheme with a specified level of redundancy for recovery of one or more failed servers among multiple storage servers; distributing the column chunks according to the distribution scheme; calculating column chunk parity; storing the calculated column chunk parity; managing metadata for the column chunk data store; and updating the metadata for distributing the column chunks among remaining storage servers upon receiving an indication to remove a storage server. | 03-08-2012 |
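Column chunk parity can be illustrated with byte-wise XOR, the simplest single-failure-tolerant code; the patent does not specify the parity scheme, so XOR and the function names here are assumptions made purely for illustration.

```python
def chunk_parity(chunks):
    """Compute a byte-wise XOR parity block over equal-length column chunks,
    so that any single lost chunk can be rebuilt from the others."""
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def recover_chunk(parity, surviving):
    """Rebuild the one missing chunk: XORing the parity with every surviving
    chunk cancels their contributions and leaves the lost chunk's bytes."""
    return chunk_parity([parity] + surviving)
```

If the parity block and each chunk live on different storage servers, removing or losing any one server leaves enough information to rebuild its chunks.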
20120066543 | AUTONOMOUS PROPAGATION OF VIRTUAL INPUT/OUTPUT (VIO) OPERATION(S) TO SECOND VIO SERVER (VIOS) DUE TO A DETECTED ERROR CONDITION AT A FIRST VIOS - A method provides input/output (I/O) redundancy within a data processing system having (a) a client logical partition (LPAR) that generates and consumes I/O requests, (b) a plurality of virtual input/output servers (VIOS) that are communicatively inter-connected with each other to form a VIOS cluster and which include virtual I/O adapters for connecting to a fabric that provides access to a block storage. In one embodiment, a first VIOS receives an I/O request from the client LPAR. The first VIOS detects that a problem exists with a fabric connection to the block storage, and the first VIOS responds to the detected connection problem by autonomously propagating the I/O request to a second VIOS to which the first VIOS is connected. Forwarding of the I/O request to the block storage is subsequently completed by the second VIOS. | 03-15-2012 |
20120079311 | STORAGE PROCESSING DEVICE AND FAILOVER CONTROL METHOD - A storage processing device to which a storage medium is connectable configures a failover system together with a different storage processing device. The storage processing device sets information obtained from predetermined random information as identification information of the storage medium, at the start to configure the failover system. | 03-29-2012 |
20120084597 | COMPUTER SYSTEM AND DATA PROCESSING METHOD FOR COMPUTER SYSTEM - A plurality of computers to execute jobs, a management computer to manage the execution of jobs and the disposition of data in the computers and a storage device storing data are interconnected via a network. The management program for the management computer divides the data into distributed data according to hint information and distributively disposes the distributed data and their replicas in memory storages allocated in memories of the computers. The computers execute the job using the distributed data allocated to their own memory. In the event of a fault in any of the computers, the management computer requests computers having the replicas of the distributed data disposed in the failed computer to re-execute the job. | 04-05-2012 |
20120084598 | System and Method for Providing Total Real-Time Redundancy for a Plurality of Client-Server Systems - An automated and scalable system for total real-time redundancy of a plurality of client-server systems, wherein data is replicated through a network connection and operationally located on a virtual machine that substitutes for a failed client-server system, wherein the virtual machine is activated and installed on the cloud computing environment. Monitoring applications are installed on both the client-server systems and the cloud computing environment. System components are identified, a network connection is initiated, a heartbeat is established, data replication is automated, system failure is detected, failover is initiated, and subsequent client-server restoration is automated. | 04-05-2012 |
20120089863 | FAILOVER SYSTEM, STORAGE PROCESSING DEVICE AND FAILOVER CONTROL METHOD - A NAS as a main machine transmits a search packet for searching for a NAS as a new backup machine when a response from a NAS as a backup machine is not received. Each NAS receiving the search packet transmits a device information packet to the NAS as the main machine. The device information packet includes a version number of the failover function and storage capacity information. The NAS as the main machine selects a NAS as the new backup machine based on the version number and the storage capacity information included in the device information packet. | 04-12-2012 |
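The backup-selection step can be sketched as a filter-then-rank over the device-information replies. The dict fields, the exact-version-match rule, and the most-free-capacity tie-breaker are assumptions for the example; the patent only says selection is based on version number and capacity information.

```python
def select_backup(candidates, required_capacity, my_version):
    """Pick a new backup NAS from device-information packet replies.

    `candidates` is a list of dicts with 'name', 'version' and 'capacity'
    keys. A candidate must run a compatible failover-function version and
    have enough capacity; among those, the roomiest one is chosen.
    Returns the selected NAS name, or None if no candidate qualifies.
    """
    eligible = [c for c in candidates
                if c["version"] == my_version
                and c["capacity"] >= required_capacity]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c["capacity"])["name"]
```

The main machine would run this once all replies to the search packet have arrived (or a timeout expires).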
20120096304 | Providing Unsolicited Global Disconnect Requests to Users of Storage - A mechanism is provided in a storage control unit in a data processing system for providing unsolicited global disconnect requests to users. The mechanism stores lock control data in the storage control unit. The storage control unit allocates its resources into a plurality of clusters. Responsive to a given user connecting to a given partition that is for a logical subsystem resident on a first cluster within the plurality of clusters, the mechanism sends reflected partition information from the first cluster to a second cluster within the plurality of clusters. Responsive to the first cluster experiencing a failure condition, the mechanism moves control data from one or more logical subsystems from the first cluster to the second cluster and, for each logical subsystem that moved from the first cluster to the second cluster and that has reflected partition information, presents unsolicited status to one or more users. | 04-19-2012 |
20120102355 | CONSISTENT MESSAGING WITH REPLICATION - A messaging entity configured in a memory of a first node of a plurality of communicatively coupled nodes is disclosed. The nodes are included in a distributed computing system. The messaging entity is configured to operate as a secondary messaging entity in a messaging server for the plurality of communicatively coupled nodes. The messaging entity is communicatively coupled to a primary messaging entity configured in a memory of a second node of the plurality of nodes. The primary messaging entity is configured to store a message and store a copy of the message. Also, the messaging entity is configured to be promoted to a new primary messaging entity in the event of failure of the primary messaging entity. | 04-26-2012 |
20120110372 | RELIABLE MESSAGING USING REDUNDANT MESSAGE STREAMS IN A HIGH SPEED, LOW LATENCY DATA COMMUNICATIONS ENVIRONMENT - A method includes receiving active application messages that are part of an active message stream in a subscribing client device from an active feed adapter. Each active application message is characterized by an active source stream identifier, an active source stream sequence number, and an active message sequence number. The method includes receiving, in response to a failover from the active feed adapter to a backup feed adapter, backup application messages in the subscribing client device from the backup feed adapter. Each backup application message is characterized by a backup source stream identifier, a backup source stream sequence number, and a backup message sequence number. The method includes administering, by the subscribing client device, the backup application messages in dependence upon the active source stream identifier, the active source stream sequence number, the backup source stream identifier, and the backup source stream sequence number. | 05-03-2012 |
20120110373 | COMMUNICATION NETWORK AND METHOD FOR SAFETY-RELATED COMMUNICATION IN TUNNEL AND MINING STRUCTURES - A communication network in an underground system comprises a ring network of computers which is connected to an aboveground central system computer unit, each computer having an overview of the overall structure of the ring network and an allocated network status. Plural network computers are configured to, in the event of a connection interruption between networked nodes, seek an alternative communication path in order to maintain communications. Plural network computers are coupled to at least one sensor in order to pick up information relating to the environment and are configured to pass it on to other network computers of the ring network and/or to the aboveground central system. In normal operation, the network computers pass on current information relating to the environment to the aboveground central system and to other network computers. In a network island arising as a result of one or more connection interruptions, the network status of a multiplicity of the network computers changes from normal operation to emergency operation. | 05-03-2012 |
20120117416 | METHOD AND SYSTEM FOR PROCESS CONTROL NETWORK MIGRATION - A method includes disconnecting a first component from a first network. The first component is redundant to a second component and operates in a secondary or passive redundancy mode. The second component operates in a primary or active redundancy mode and is coupled to the first network. The method also includes updating at least one of hardware and software on the first component to allow the first component to communicate on a second network. The method further includes connecting the updated first component to the second network and synchronizing data between the updated first component on the second network and the second component on the first network. In addition, the method includes switching the updated first component from the secondary redundancy mode to the primary redundancy mode. | 05-10-2012 |
20120124413 | METHOD AND SYSTEM FOR NETWORK ELEMENT SERVICE RECOVERY - A method and system for network element recovery are provided. In one form, frontend servers intelligently proxy error or unavailability messages returned by backend servers and simulate frontend server failure. In at least one form, the frontend server also includes intelligence or logic to determine that directing the client to recover service to an alternate system or site would assure better service availability, reliability, and/or quality-of-experience for the client. | 05-17-2012 |
20120131376 | METHOD AND SYSTEM FOR CELL RECOVERY IN TELECOMMUNICATION NETWORKS - A method and system that helps to ensure that any cell crash (i.e., an involuntary action occurring as a result of a software bug or malfunction) is localized to a single cell on a single modem board that supports multi-cell configuration. In this regard, the control plane and the remaining cells that are configured on the modem board should remain operational. Further, the operator should be able to choose to take corrective action (i.e., reboot, reconfigure, delete, or create) with regard to a cell on the modem board without impacting the operations of the other configured cells. | 05-24-2012 |
20120131377 | Support for Virtualized Unified Communications Clients When Host Server Connectivity is Lost - Techniques are provided for establishing a Virtual Desktop Interface (VDI) connection at a virtual desktop thin client (VDTC) device, between a VDI client in the VDTC device and a VDI server in a hosted virtual desktop server (HVDS). A unified communications (UC) control connection is established between a UC protocol stack on the VDTC device and a primary call agent, where the UC control connection is configured to allow the UC protocol stack to register with the primary call agent, and to send or receive commands from the primary call agent that are based on signals from a UC control application running on the HVDS. A UC control backup application is started on the virtual desktop thin client device in a standby mode that is configured to switch to an active mode in response to a failure to establish or maintain the UC control connection, or a failure to establish or maintain the VDI connection. A user interface is launched on the virtual desktop thin client device that is configured to perform UC backup functions. | 05-24-2012 |
20120131378 | MESSAGE SYNCHRONIZATION METHOD, APPARATUS AND SYSTEM - Embodiments of the present invention relate to a message synchronization method, apparatus and system. The method includes: obtaining a first sending time stamp transmitted on a main link and a second sending time stamp transmitted on a backup link respectively; calculating to obtain a time difference according to the first sending time stamp and the second sending time stamp; adding bytes to the first message transmitted on the main link and the first message transmitted on the backup link to form second messages to be transmitted on the main link and the backup link respectively; and sending the second messages to a receiving end on the main link and the backup link respectively. | 05-24-2012 |
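The time-difference calculation and the added synchronization bytes can be sketched as follows. The 8-byte signed header layout and both function names are invented for illustration; the patent does not specify the format of the added bytes.

```python
def link_time_offset(main_ts, backup_ts):
    """Time difference between the same message's sending time stamps on the
    main and backup links; positive means the backup link lags the main link."""
    return backup_ts - main_ts

def pad_message(payload, offset, header_size=8):
    """Prepend a fixed-size header carrying the offset, forming the 'second
    message', so the receiving end can realign the two redundant streams.
    The header layout is a simplified stand-in for the added bytes."""
    return offset.to_bytes(header_size, "big", signed=True) + payload
```

The receiving end would strip the header, read the offset, and delay or advance its consumption of one stream accordingly.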
20120131379 | COMPUTER SYSTEM AND AVAILABILITY METHOD THEREOF - A high-availability computer system and fault correction method. If a fault occurs in the current-system physical device allocated to the current-system virtual device of a virtual server, the virtualization mechanism of the physical server configures the standby-system physical device, for the standby-system virtual device of the virtual server, as a physical device to be used at high priority. The virtualization mechanism distributes a request issued from the standby-system virtual device of another virtual server to a standby-system physical device; when such a standby-system physical device does not exist, the virtualization mechanism distributes the request to a standby-system physical device configured for high-priority usage. | 05-24-2012 |
20120137164 | METHODS AND SYSTEMS FOR FAULT-TOLERANT DISTRIBUTED STREAM PROCESSING - A method of achieving fault tolerance in a distributed stream processing system organized as a directed acyclic graph includes the initial step of managing a stream process within the distributed stream processing system including one or more operators. The one or more operators of the stream process are communicatively associated with one or more downstream operators. The method includes the steps of maintaining one or more data copies of a processing state of the one or more operators until the one or more data copies can be safely discarded, notifying the one or more operators when it is safe to discard at least one of the one or more data copies of the processing state, and using an identifier to denote the data copy of the processing state to be safely discarded. | 05-31-2012 |
20120137165 | DISTRIBUTED BLADE SERVER SYSTEM, MANAGEMENT SERVER AND SWITCHING METHOD - A distributed blade server system, a management server and a switching method are provided. The method includes: determining a standby blade of a first blade when it is determined that the first blade is in abnormal operation; delivering, based on an access relationship between a startup card of the first blade and a first storage partition, a first configuration command to a storage system, the first configuration command including information of an access relationship between a startup card of the standby blade and the first storage partition, so that the storage system configures the access relationship between the startup card of the standby blade and the first storage partition; and delivering a startup command to the standby blade. | 05-31-2012 |
20120151248 | REDUCED POWER FAILOVER SYSTEM AND METHOD - Embodiments include a power-efficient failover system and method. In one embodiment, a primary server operating in a normal operating state is configured to dynamically backup device states or transaction logs. A redundant server coupled to the primary server in a failover cluster is operated at a reduced power state. The redundant server dynamically receives the backup from the primary server and is elevated to a normal operating state in response to a failure of the primary server. By enforcing a reduced power state of the redundant server, a failover system provides a desired combination of high power efficiency with low latency. | 06-14-2012 |
20120151249 | PROVIDING TRANSPARENT FAILOVER IN A FILE SYSTEM - A connection state system is described herein that allows a client to resume a connection with a server or a different replacement server by remotely storing client state information in association with a resume key. The system provides a resume key filter operating at the server that facilitates the storing of volatile server state information. The state information can include information such as oplocks, leases granted to a client, and in-flight operations on a file handle. The resume key filter driver sits above the file system, which allows multiple file access protocols to use the filter. Upon a failover event, such as a server going down or losing connectivity to a client, the system can bring up another server or the same server and reestablish state for file handles held by various clients using the resume key filter. | 06-14-2012 |
20120151250 | FAILURE RECOVERY METHOD IN INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING SYSTEM - Services are promptly resumed at the time of a failure recovery in an information processing system. Before a first server system | 06-14-2012 |
20120159234 | PROVIDING RESILIENT SERVICES - Described are embodiments directed at providing resilient services using architectures that have a number of failover features including the ability to handle failover of an entire data center. Embodiments include a first server pool at a first data center that provides client communication services. The first server pool is backed up by a second server pool that is located in a different data center. Additionally, the first server pool serves as a backup for the second server pool. The two server pools thus engage in replication of user information that allows each of them to serve as a backup for the other. In the event that one of the data centers fails, requests are rerouted to the backup server pool. | 06-21-2012 |
20120159235 | Systems and Methods for Implementing Connection Mirroring in a Multi-Core System - The present application is directed to systems and methods for providing failover connection mirroring between two or more multi-core devices intermediary between a client and a server. A first multi-core device may receive a hash key of a second multi-core device for mapping packets to cores of the second multi-core device. The first device may identify a core of the second device using (i) the hash key of the second device and (ii) tuple information corresponding to a connection between the client and the server via the first device. The first device may determine that the identified core is not a desired core for providing a failover connection. The first device may modify the tuple information so as to identify the desired core when used with the hash key of the second device. The first device may use the modified tuple information to establish the failover connection. | 06-21-2012 |
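The tuple-steering idea in the entry above can be sketched in a few lines. This is a hypothetical illustration, not the patented implementation: a receiving multi-core device is assumed to map each packet to a core by hashing the connection tuple with a device-specific hash key, so the sending device can steer a mirrored connection to a desired core by adjusting a free tuple field (here, the ephemeral source port) until the peer's hash selects that core. All function names and the SHA-256 hash are assumptions made for the sketch.

```python
import hashlib

def core_for_tuple(hash_key: bytes, src_ip: str, src_port: int,
                   dst_ip: str, dst_port: int, num_cores: int) -> int:
    """Map a connection tuple to a core index using the peer's hash key."""
    data = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(hash_key + data).digest()
    return digest[0] % num_cores

def steer_to_core(hash_key: bytes, src_ip: str, dst_ip: str, dst_port: int,
                  desired_core: int, num_cores: int) -> int:
    """Search the ephemeral source-port range for a port whose tuple
    hashes to the desired core on the peer device."""
    for src_port in range(49152, 65536):
        if core_for_tuple(hash_key, src_ip, src_port,
                          dst_ip, dst_port, num_cores) == desired_core:
            return src_port
    raise RuntimeError("no suitable source port found")
```

With a uniform hash, roughly one port in `num_cores` qualifies, so the search terminates almost immediately in practice.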
20120166865 | Method, Device for Running Internet Protocol Television Service System, and Internet Protocol Television Service System - The present invention discloses a method, apparatus and system for operating an internet protocol television service system. The present invention relates to the communications field, and solves the problem of poor quality of service caused by using a cold-backup electronic program guide (EPG) server or transferring a user to another EPG server. The method includes: a backup EPG server receiving an obtaining instruction message sent by a service control manager, wherein the obtaining instruction message instructs the backup EPG server to obtain service information of a failed present-network EPG server; and the backup EPG server obtaining the service information of the failed present-network EPG server according to the obtaining instruction message, and sending an obtaining response message to the service control manager after finishing obtaining the service information. | 06-28-2012 |
20120173919 | System and method for creating and maintaining secondary server sites - Disaster Recovery (DR) and High-Availability (HA) are critical features required by many information technology systems. DR and HA may be accomplished with a remote secondary site that is kept synchronized with a primary site. To reduce the cost of maintaining a secondary site, the data may be split into two subsets wherein only a first subset of data is kept synchronized at the secondary site using a small-bandwidth communication link. The second set of data, which is generally much larger, is periodically backed up at a network-accessible back-up location. When a disaster occurs, the secondary site may access the most recent back-up of the second set of data. In a maintenance or limited-failure situation, the secondary site can directly access the second data set at the primary site. | 07-05-2012 |
20120179932 | TRANSPARENT UPDATE OF ADAPTER FIRMWARE FOR SELF-VIRTUALIZING INPUT/OUTPUT DEVICE - A firmware update process for a self-virtualizing IO resource such as an SRIOV adapter is incorporated into a platform firmware update process to systematically update the resource firmware in a manner that is for the most part transparent to the logical partitions sharing the adapter. In particular, resource firmware associated with a self-virtualizing IO resource is bundled with firmware for at least one adjunct partition associated with that self-virtualizing IO resource within a common firmware image so that, upon restart of the adjunct partition to use the updated firmware image, the resource firmware is also updated, with a logical partition that uses the self-virtualizing IO resource maintained in an active state during the restart, and without requiring the self-virtualizing IO resource to be deconfigured from the logical partition. | 07-12-2012 |
20120198269 | METHOD AND APPARATUS FOR APPLICATION RECOVERY IN A FILE SYSTEM - Embodiments of the invention relate to block layout and block allocation in a file system to support transparency of application processing. At least one copy of an application is replicated in a write affinity region of a secondary server, and at least one copy of the application is replicated in a wide striping region across a cluster file system. When the application is subject to failure, application processing is transferred from the failure location to the write affinity copy. At the same time, the failed application is rebuilt using the wide striping replication of the application. Once the application is rebuilt, processing may return to the failed location employing the rebuilt application. | 08-02-2012 |
20120198270 | FAILBACK TO A PRIMARY COMMUNICATIONS ADAPTER - In some example embodiments, there is a method for failback to a primary communications adapter. The method includes receiving, in a driver for the primary communications adapter and a backup communications adapter, a link up event for the primary communications adapter, wherein the link up event is sent from the primary communications adapter to the driver, and wherein the link up event is triggered by establishing electrical connectivity to the primary communications adapter. The method includes inferring that the primary communications adapter is configured for receiving packets. The method includes setting the backup communications adapter to idle, wherein the backup communications adapter receives packets and drops the packets while idle. The method includes activating the primary communications adapter, wherein the primary communications adapter receives packets and passes the packets up a protocol stack while activated. | 08-02-2012 |
20120204057 | MOBILE ROUTER NETWORK METHOD - A method comprises the steps of providing a plurality of mobile routers; providing a main server for tracking and monitoring the plurality of mobile routers; initially configuring each mobile router of the plurality of mobile routers to communicate with the main server; providing a first linked communication between each mobile router and the main server; registering each mobile router with the main server and uploading configuration information from each mobile router to the main server; assigning each mobile router to a predetermined group; subsequent to the registering and assigning steps, providing a second linked communication between each mobile router and the main server; and operating the main server such that when the second linked communication occurs, the main server reassigns each mobile router to communicate with at least one group server assigned to communicate with the predetermined group. | 08-09-2012 |
20120204058 | Method and Device for Backing up User Information - A method and apparatus for backing up user information are disclosed. The method includes: establishing a plurality of selection switch protocol groups between a same port of a standby service node and ports of a plurality of main service nodes respectively; the standby service node regularly receiving user information of access users from the ports of the plurality of main service nodes, and storing the user information to a main control unit of the standby service node; keeping a detection relation between the same port of the standby service node and the ports of the plurality of main service nodes; and the standby service node sending to an interface unit of the standby service node the user information of the access user of that port stored in the main control unit, and, according to the selection switch protocol, switching the same port of the standby service node to be the main port. | 08-09-2012 |
20120216071 | Avoiding Failover Identifier Conflicts - In certain embodiments, a service provided by a production server is facilitated. The production server is associated with a backup server configured to take over if the production server fails. The production server assigned a first identifier. A failover with a potential identity conflict is determined to have occurred. In the failover, the backup server has taken over for the production server and has been assigned the first identifier. A second identifier is assigned to the production server to replace the first identifier that was assigned to the production server in order to avoid the identity conflict. | 08-23-2012 |
20120221887 | Migrating Virtual Machines Among Networked Servers Upon Detection Of Degrading Network Link Operation - Migrating virtual machines among networked servers, the servers coupled for data communications with a data communications network that includes a networking device, where migrating includes: establishing, by a virtual machine management module (‘VMMM’), one or more virtual machines on a particular server; querying, by the VMMM, the networking device for link statistics of a link coupling the network device to the particular server for data communications; determining, by the VMMM in dependence upon the link statistics, whether the link coupling the network device to the particular server is degrading; and if the link coupling the network device to the particular server is degrading, migrating a virtual machine executing on the particular server to a destination server. In some embodiments, migration is carried out only if a non-degrading link is available. If no non-degrading links are available, the network device, rather than the link, may be failing. | 08-30-2012 |
20120226932 | Reducing Data Stream Interruption During Failure of a Firewall Device - A method includes a first firewall device performing a firewall function on a first redundant input data packet and outputting the first redundant input data packet as a first redundant output data packet according to the firewall function. A second firewall device performs the same firewall function on a second redundant input data packet and outputs the second redundant input data packet as a second redundant output data packet according to the firewall function. The redundant output data packets are at least substantially similar when the firewall devices are functioning properly. A controller receives the redundant output data packets and transmits at a given time one of the redundant output data packets to a target. The controller transmits the first redundant output data packet to the target while the second device is failed. | 09-06-2012 |
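The controller's selection step in the entry above can be sketched as follows. This is an illustrative reading, not the patented design: both devices process the same input in parallel, the controller forwards exactly one of the two redundant outputs per time slot and prefers one device while it is healthy. The divergence flag is an assumption added here as a natural health signal and is not taken from the abstract.

```python
def select_output(first_pkt: bytes, second_pkt: bytes, first_failed: bool):
    """Return (packet to forward, outputs_diverged flag).

    Forwards the first device's output unless that device has failed;
    flags a mismatch between the redundant outputs while both devices
    are believed healthy, since properly functioning devices should
    produce at least substantially similar outputs.
    """
    diverged = (not first_failed) and (first_pkt != second_pkt)
    forwarded = second_pkt if first_failed else first_pkt
    return forwarded, diverged
```

Because the standby output is always being produced, failover is just a change in which of the two already-computed packets gets forwarded, which is what keeps the data stream interruption small.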
20120233496 | FAULT TOLERANCE IN A PARALLEL DATABASE SYSTEM - Embodiments are directed to establishing a fault tolerant parallel database system and to detecting the health of parallel database services. In an embodiment, a computer system establishes a control node cluster that includes at least one active control node and at least one spare control node. Each node of the control node cluster includes specific functions assumable only by other control nodes. The computer system also establishes a compute node cluster that includes at least one active computing node, at least one spare computing node, at least one active storage node and at least one spare storage node. Each of the computing and storage nodes includes specific functions assumable only by other computing and storage nodes. The computer system detects a failure of an active node and instantiates a corresponding spare node that is configured to perform the functions of the failed active node. | 09-13-2012 |
20120239966 | SYSTEM AND METHOD FOR SESSION RESTORATION AT GEO-REDUNDANT GATEWAYS - A method and system for managing a backup service gateway (SGW) associated with a primary SGW, comprising periodically receiving from the primary SGW at least a portion of corresponding UE session state information, the received portion of session state information being sufficient to enable the backup SGW to indicate to an inquiring management entity that all user sessions associated with a group of mobile devices supported by the primary SGW are in a live state; and in response to a failure of the primary SGW, assuming management of IP addresses and paths associated with the primary SGW and causing each UE supported by the failed primary SGW to reauthorize itself to the network. | 09-20-2012 |
20120254654 | METHOD AND SYSTEM FOR USING A STANDBY SERVER TO IMPROVE REDUNDANCY IN A DUAL-NODE DATA STORAGE SYSTEM - Methods are provided in which a standby server, a first main server, and a second main server control shared input/output (I/O) adapters in a storage system. The standby server is in communication with the first main server and the second main server, and the storage system is configured to operate as a dual-node active system. The methods include activating the standby server in response to receiving a communication from the first main server of a fail mode of the second main server. Systems and physical computer storage media are also provided. | 10-04-2012 |
20120254655 | BYZANTINE FAULT TOLERANT DYNAMIC QUORUM USING A TRUSTED PLATFORM MODULE - A method implemented in a computer infrastructure having computer executable code tangibly embodied on a computer readable medium. The computer executable code is operable to dynamically adjust quorum requirements for a voting set V of a server cluster, including a plurality of servers, to ensure that a response of the server cluster to a client request remains Byzantine fault tolerant when at least one of: a failed server of the server cluster is replaced with at least one new server, such that a total set S of servers that have ever been members of the server cluster is increased, and an existing server is removed from the voting set V. | 10-04-2012 |
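The quorum arithmetic behind entries like the one above can be made concrete. The sketch below uses the classical Byzantine fault tolerance bounds (n ≥ 3f + 1 replicas, quorum of ⌈(n + f + 1)/2⌉ matching replies), which is an assumption for illustration; the patented approach relies on a trusted platform module and may therefore operate under different, tighter bounds. Recomputing the quorum whenever the voting set changes is what makes the quorum "dynamic."

```python
import math

def bft_quorum(n: int, f: int) -> int:
    """Minimum number of matching replies a client must collect so that
    the response remains correct despite up to f Byzantine servers,
    under the classical bound n >= 3f + 1."""
    if n < 3 * f + 1:
        raise ValueError("need n >= 3f + 1 replicas to tolerate f Byzantine faults")
    # Any two quorums of this size intersect in at least f + 1 servers,
    # so every pair of quorums shares at least one correct server.
    return math.ceil((n + f + 1) / 2)

def adjust_voting_set(voting_set: set, removed: set, added: set, f: int):
    """Recompute the quorum after servers leave or join the voting set."""
    new_set = (voting_set - removed) | added
    return new_set, bft_quorum(len(new_set), f)
```

For example, a four-server cluster tolerating one fault needs three matching replies; replacing a failed server leaves the quorum unchanged, while shrinking the voting set forces a recomputation.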
20120260124 | RECOVERY OF A DOCUMENT SERVING ENVIRONMENT - Methods and systems for quickly serving documents are provided. Documents may be served to users, for example, in response to search query inputs. Documents may be communicated to a document server individually prior to batching the documents. In such a real-time serving system, serving components may fail. To ensure real-time serving despite the failure, spares are utilized to replace the failing serving components such that the spare can immediately begin receiving documents. The spare can also be synchronized with other serving components to obtain the memory of the failing serving component prior to the failure. | 10-11-2012 |
20120266014 | E-Commerce Failover System and Method - Disclosed is a computerized method, non-transitory machine-readable medium and computer system for ensuring that critical information, such as that allowing an ecommerce customer to activate purchased downloadable software, can be retrieved even if issues occur in the primary distribution center. The method includes receiving a call for data (e.g., a key) at a first endpoint associated with a first distribution center, determining that the call for the data cannot be fulfilled at the first endpoint, and shifting the call for the data to a second endpoint associated with a second distribution center. The computer system comprises a first and a second distribution center system, including modules for: fulfilling a call for data, or a key; determining whether a call cannot be fulfilled; determining a plurality of error indications; and redirecting calls originally directed to one of a first computer or a second computer to the other of the first computer and the second computer. | 10-18-2012 |
20120266015 | METHODS AND SYSTEMS FOR AUTOMATICALLY REROUTING LOGICAL CIRCUIT DATA IN A DATA NETWORK - Determining a failure of a logical connection in a logical circuit identified by a logical circuit identifier, the logical circuit comprising variable communication paths in at least one of a first or a second logical telecommunications network and a fixed communication path between the first and second logical telecommunications networks, and the failed logical connection being between the first and second logical telecommunications networks; identifying a logical failover circuit comprising an alternate communication path in a failover network that is separate from the logical circuit, the failover network reserved to provide failover circuits to communicate data rerouted from failed logical circuits, and the logical failover circuit identified by a second logical circuit identifier; renaming the logical circuit identifier of the logical circuit to the second logical circuit identifier of the logical failover circuit; and rerouting the data from the logical circuit to the logical failover circuit without manual intervention. | 10-18-2012 |
20120272094 | REDUCED POWER FAILOVER - Embodiments include a power-efficient failover method. The method includes operating a primary server at a normal operating state in which program code is executed, and dynamically generating a backup of the results of the executed program code while in the normal operating state. The method further includes operating a redundant server at a reduced power state in which less power is consumed than in the normal operating state of the primary server. The workload of the primary server may be assumed according to the backup in response to a failure of the primary server. The power state of the redundant server is managed, including maintaining the redundant server in the reduced power state prior to detecting a failure of the primary server and increasing the power state of the redundant server and assuming the workload of the primary server in response to the failure of the primary server. | 10-25-2012 |
20120278649 | METHOD AND APPARATUS FOR MANAGING COMMUNICATION SERVICES FOR USER ENDPOINT DEVICES - A system that incorporates teachings of the present disclosure may include, for example, an edge device having a controller to receive a Session Initiation Protocol (SIP) message from a user endpoint device (UE) requesting communication services, forward the SIP message to a network element of a Server Office, receive from the network element a first error message indicating communication services at the Server Office are unavailable, replace the first error message with a second error message, the second error message indicating a temporary unavailability of communication services, and transmit the second error message to the UE. Additional embodiments are disclosed. | 11-01-2012 |
20120284556 | COORDINATED DISASTER RECOVERY PRODUCTION TAKEOVER OPERATIONS - For coordinated disaster recovery, a reconciliation process is performed for resolving intersecting and non-intersecting data amongst disaster recovery systems for takeover operations. An ownership synchronization process is coordinated for replica cartridges via the reconciliation process at the disaster recovery systems. The disaster recovery systems continue as a replication target for source systems and as a backup target for local backup applications. | 11-08-2012 |
20120284557 | MECHANISM TO ENABLE AND ENSURE FAILOVER INTEGRITY AND HIGH AVAILABILITY OF BATCH PROCESSING - A method, system and computer program product manages a batch processing job by: partitioning the batch processing job for execution in a plurality of batch execution servers from a cluster of computers; designating one computer from the cluster as a primary command server that oversees and coordinates execution of the batch processing job; selecting a second computer from the cluster to serve as a failover command server; storing an object data grid structure in the primary command server; replicating the object grid structure to create and store a replicated object grid structure in the failover command server; in response to the primary command server failing, restarting, by the failover command server, execution of batch processes from the batch processing job in the plurality of batch execution servers utilizing objects within the replicated object grid structure, and executing the batch processes with processing states at the time of the failover. | 11-08-2012 |
20120284558 | APPLICATION RECOVERY IN A FILE SYSTEM - Embodiments of the invention relate to block layout and block allocation in a file system to support transparency of application processing. At least one copy of an application is replicated in a write affinity region of a secondary server, and at least one copy of the application is replicated in a wide striping region across a cluster file system. When the application is subject to failure, application processing is transferred from the failure location to the write affinity copy. At the same time, the failed application is rebuilt using the wide striping replication of the application. Once the application is rebuilt, processing may return to the failed location employing the rebuilt application. | 11-08-2012 |
20120290869 | HITLESS SWITCHOVER FROM ACTIVE TCP APPLICATION TO STANDBY TCP APPLICATION - Embodiments of the invention include a method for maintaining an active-standby relationship between an active control card and a standby control card in a network element. The network element receives data from a remote peer at the active control card. The network element communicates data from the active TCP module to an active application module in the active control card. The network element communicates synchronization data from the active application module to a standby application module on the standby control card. The network element communicates an application synchronization acknowledgement from the standby application module to the active application module. The network element communicates an application acknowledgment packet from the active application module to the active TCP module responsive to receiving the application synchronization acknowledgment. The network element then communicates an acknowledgement to the remote peer responsive to the application acknowledgement. | 11-15-2012 |
20120290870 | DEVICE VALIDATION, DISTRESS INDICATION, AND REMEDIATION - A wireless communications device may be configured to perform integrity checking and interrogation with a network entity to isolate a portion of a failed component on the wireless network device for remediation. Once an integrity failure is determined on a component of the device, the device may identify a functionality associated with the component and indicate the failed functionality to the network entity. Both the wireless network device and the network entity may identify the failed functionality and/or failed component using a component-to-functionality map. After receiving an indication of an integrity failure at the device, the network entity may determine that one or more additional iterations of integrity checking may be performed at the device to narrow the scope of the integrity failure on the failed component. Once the integrity failure is isolated, the network entity may remediate a portion of the failed component on the wireless communications device. | 11-15-2012 |
20120290871 | METHOD OF CHANGING OVER FROM A PRIMARY HSS TO A BACKUP HSS IN AN IP NETWORK - A method is provided for changing over from a primary home subscriber server (HSS) to a backup HSS in an IP network, said network having a plurality of call session control function (CSCF) (or application) servers, in which, after detection of a loss of connection between one of said CSCF (or application) servers and a primary HSS to which it is normally connected, the CSCF (or application) server connects itself to a backup HSS. The method also comprises the following steps: a) a predetermined broadcast device is informed of said loss of connection with said HSS; b) said broadcast device sends a predetermined fault message at least to the other CSCF (or application) servers that are normally connected to said primary HSS, said message containing the reference of said primary HSS; and c) said other CSCF (or application) servers connect themselves to said backup HSS. | 11-15-2012 |
20120290872 | MACHINE-TO-MACHINE PLATFORM SERVICE PROCESSING METHOD AND MACHINE-TO-MACHINE PLATFORM - A method for processing Machine-to-Machine (M2M) platform services and a M2M platform are disclosed. The method comprises: receiving a service request sent by a terminal; selecting a corresponding application according to the capacity required by the service request; and forwarding the service request to the corresponding application, and feeding back a response result of the application to the terminal. The method for processing M2M platform services and the M2M platform in accordance with the present invention implement a platform for providing a variety of applications to users. | 11-15-2012 |
20120297238 | CROSS-CLOUD COMPUTING FOR CAPACITY MANAGEMENT AND DISASTER RECOVERY - A cloud migration system is described herein that provides capacity management and disaster recovery by detecting peak load conditions and automatically moving computing to another computing resource (and back) and by providing computing across two or more clouds and moving completely to one in the case of a disaster at one site. The system monitors loads within a datacenter and detects a threshold that indicates that the current load is nearing the datacenter's capacity. Upon detecting that the threshold will be reached, the system facilitates an orderly move of at least some datacenter load to another datacenter or cloud-based resources. The system can also be used as a disaster recovery architecture at a datacenter/network level to manage fast workload transition in case of disaster. Thus, the system allows enterprises to build smaller and more efficient datacenters that leverage other resources for rare extra loads. | 11-22-2012 |
20120297239 | Local Protection Method of Ethernet Tunnel and Sharing Node of Work Sections of Protection Domain - The invention discloses a segment protection method for an Ethernet tunnel in which there are two segment protection domains with a shared link in a Provider Backbone Bridge-Traffic Engineering (PBB-TE) network, and at least one of the two segment protection domains works in a non-revertive mode. When simultaneous failures of the working segments of the two segment protection domains recover, or a failure of the shared node of the working segments recovers, the shared node of the working segments switches all the FDB entries of bidirectional ESPs of all the TESIs protected by the protection domains to standby entries, and after switching, the out ports of the FDB entries of the ESPs are the ports connecting the shared segment. The invention also discloses a corresponding shared node of working segments of protection domains. The invention ensures that the bidirectional ESPs are co-routed after failure recovery. | 11-22-2012 |
20120297240 | Virtual Application Delivery Chassis System - A method for electing a master blade in a virtual application distribution chassis (VADC), includes: sending by each blade a VADC message to each of the other blades; determining by each blade that the VADC message was not received from the master blade within a predetermined period of time; in response, sending a master claim message including a blade priority by each blade to the other blades; determining by each blade whether any of the blade priorities obtained from the received master claim messages is higher than the blade priority of the receiving blade; in response to determining that none of the blade priorities obtained is higher, setting a status of a given receiving blade to a new master blade; and sending by the given receiving blade a second VADC message to the other blades indicating the status of the new master blade of the given receiving blade. | 11-22-2012 |
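The election step in the entry above is concrete enough to sketch. This is a simplified illustration under stated assumptions: each blade, after the master's periodic VADC message times out, broadcasts a master-claim message carrying its priority, and a blade declares itself the new master only if no received claim outranks its own. The tie-break on lowest blade id is an assumption added for determinism, not a detail from the abstract.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MasterClaim:
    blade_id: int
    priority: int

def elect_master(own: MasterClaim, received: list) -> bool:
    """Return True if this blade should set its status to new master.

    A blade wins only if no received claim has a strictly higher
    priority; equal priorities fall back to the lowest blade id
    (an illustrative tie-break)."""
    for claim in received:
        if claim.priority > own.priority:
            return False
        if claim.priority == own.priority and claim.blade_id < own.blade_id:
            return False
    return True
```

Exactly one blade satisfies this predicate for any set of exchanged claims, after which it announces its new status in a second VADC message, matching the final step the abstract describes.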
20120303998 | METHOD, DISTRIBUTED SYSTEM AND COMPUTER PROGRAM FOR FAILURE RECOVERY - A distributed system ( | 11-29-2012 |
20120311375 | REDIRECTING REQUESTS TO SECONDARY LOCATION DURING TEMPORARY OUTAGE - During an outage at a primary location for an online service that is temporary in duration (e.g. a “temporary outage”), requests are temporarily switched from the primary location to a secondary location for the online service. The temporary outage may be caused by many different reasons (e.g. power outage, planned maintenance, and the like). The secondary location may be configured as read only during the temporary outage such that users are still able to access their data during the temporary outage without causing changes to be made to the data. The requests to the primary location of the online service are automatically redirected to be handled by the secondary location. When the temporary outage ends, the requests are automatically switched back to the primary location. | 12-06-2012 |
20120311376 | RECOVERY SERVICE LOCATION FOR A SERVICE - A secondary location of a network acts as a recovery network for a primary location of the service. The secondary location is maintained in a warm state that is configured to replace the primary location in a case of a failover. During normal operation, the primary location actively services user load and performs backups that include full backups, incremental backups and transaction logs that are automatically replicated to the secondary location. Information is stored (e.g. time, retry count) that may be used to assist in determining when the backups are restored correctly at the secondary location. The backups are restored and the transaction logs are replayed at the secondary location to reflect changes (content and administrative) that are made to the primary location. After failover to the secondary location, the secondary location becomes the primary location and begins to actively service the user load. | 12-06-2012 |
20120331335 | HIGH AVAILABILITY DATABASE SYSTEMS AND METHODS - Described are systems and methods for communicating between a primary database and a standby database of a high availability data recovery (HADR) system. A plurality of primary partitions in a primary database and a plurality of standby partitions in a standby database are configured to communicate with each other. A transition of the plurality of primary partitions from a first HADR state to a second HADR state is synchronized. | 12-27-2012 |
20120331336 | ADDING INDIVIDUAL DATABASE FAILOVER/SWITCHOVER TO AN EXISTING STORAGE COMPONENT WITH LIMITED IMPACT - High availability architecture that employs a mid-tier proxy server to route client communications to active data store instances in response to failover and switchover. The proxy server includes an active manager client that interfaces to an active manager in each of the backend servers. State information and configuration information are maintained separately and according to semantics consistent with the needs of the corresponding data, the configuration information changing less frequently and being more available, the state information changing more frequently and being less available. The active manager indicates to the proxy server which of the data storage instances is currently the active instance. In the event that the currently active instance becomes inactive, the proxy server selects a different backend server that currently hosts the active data store instance. Client communications are then routed to the different backend server with minimal or no interruption to the client. | 12-27-2012 |
20130007505 | Automatically Performing Failover Operations With A Load Balancer - A load balancer includes a failover logic unit to identify servers to execute services, generate and store in the load balancer a failover rule and a service rule, and to determine a failure in a first server that executes a first service responsive to a lack of response by the first server to a keepalive message sent by the load balancer to the first server. The load balancer can then perform an operation to cause an automatic failover of the first service to another server based on the failover and service rules. | 01-03-2013 |
20130024719 | SYSTEM AND METHOD FOR PROCESSING NETWORK DATA OF A SERVER - In a system and method for processing network data of a server, the server includes a timer, a switch and a storage system. The server determines whether the storage system includes overtime information of the timer when the server is powered on. If the storage system includes the overtime information, the overtime information is deleted. If an operating system is started, a predetermined initial value is written into the timer to start timing, and a first network port and a second network port are disconnected through the switch. If the server works normally, a predetermined reset command is sent to the timer to reset the timer at regular intervals. If the server does not work normally, the first network port and the second network port are connected through the switch. If the timer times out, the overtime information is written into the storage system. | 01-24-2013 |
20130024720 | Creation of Highly Available Pseudo-Clone Standby Servers for Rapid Failover Provisioning - Near clones for a set of targeted computing systems are provided by determining a highest common denominator set of components among the computing systems, producing a pseudo-clone configuration definition, and realizing one or more pseudo-clone computing systems as partially configured backups for the targeted computing systems. Upon a planned failover, actual failure, or quarantine action on a targeted computing system, a difference configuration is determined to complete the provisioning of the pseudo-clone system to serve as a replacement system for the failed or quarantined system. Failure predictions can be used to implement the pseudo-clone just prior to an expected first failure of any of the targeted systems. The system can also interface to an on-demand provisioning management system to effect automated workflows to realize pseudo-clones and replacement systems automatically, as needed. | 01-24-2013 |
20130031403 | Failover Data Replication to a Preferred List of Instances - A method, system, and medium are disclosed for performing transparent failover in a cluster server system. The cluster includes a plurality of servers. In servicing a client request, a primary server replicates session data for the client into memory space of one or more backup servers. The primary server sends a response to the client, wherein the response includes an indication of the one or more backup servers. When the client sends a subsequent request, it includes an indication of the backup servers. If the primary server is unavailable, the cluster determines a recovery server from among the backup servers indicated by the request. The chosen recovery server would then service the request. | 01-31-2013 |
20130031404 | SYSTEM AND METHOD FOR IMPLEMENTING PNRP LOCALITY - A method is provided for a host node in a computer network to determine its coordinates in a d-dimensional network space, comprising discovering an address of a peer node in the network, measuring network latency between the host node and the peer node, determining whether network latency has been measured for at least d+1 peer nodes, where, if network latency has not been measured for at least d+1 peer nodes, estimating the network coordinates of the host node, and where, if network latency has been measured for at least d+1 peer nodes, calculating the network coordinates of the host node using d+1 measured latencies. | 01-31-2013 |
20130031405 | DISASTER RECOVERY APPLIANCE - A disaster recovery appliance is described herein. The disaster recovery appliance is coupled to one or more servers. The disaster recovery appliance continuously receives backup data for each of the one or more servers. When a server fails, the disaster recovery appliance replaces the failed server. While the failed server is inaccessible, the disaster recovery appliance is able to mimic the functionality of the failed server. In some embodiments, the disaster recovery appliance is able to act as a server in addition to a backup device for the other servers. | 01-31-2013 |
20130036323 | FAULT-TOLERANT REPLICATION ARCHITECTURE - A fault-tolerant replication system includes a first machine running a first hypervisor. A second machine is failure-independent of the first machine. The second machine runs a second hypervisor. A first plurality of virtual machines runs on the first hypervisor. A second plurality of virtual machines runs on the second hypervisor. Each of the virtual machines of the first and second plurality of virtual machines constitutes either a virtual machine replica server of a fault-tolerant replicated state machine or a backup corresponding to a virtual machine replica server of the fault-tolerant replicated state machine. Every backup is embodied on a different machine, of the first and second machines, from its corresponding virtual machine replica server. | 02-07-2013 |
20130036324 | SERVER, SERVER SYSTEM, AND METHOD FOR CONTROLLING RECOVERY FROM A FAILURE - A server includes a monitoring unit that monitors operation states of one or more physical servers in use on which same applications operate, a synchronization unit that synchronizes the data between one of the physical servers in use and one of virtual servers in a standby physical server, and a switching unit that, when the monitoring unit has detected that a failure has occurred in the operation of one of the physical servers in use, operates a second application of the same applications while referring to a piece of the data on the one of the virtual servers synchronized by the synchronization unit with the data on the one of the physical servers in use, and switches operation from the one of the virtual servers to the standby physical server. | 02-07-2013 |
20130047024 | VIRTUAL I/O SERVER BANDWIDTH VIA SHARED ETHERNET ADAPTER (SEA) LOAD SHARING IN SEA FAIL-OVER CONFIGURATION - Provided are techniques for configuring a primary shared Ethernet adapter (SEA) and a backup SEA into a failover (F/O) protocol; providing a user interface (UI) for enabling a user to request a SEA load sharing protocol; in response to a user request for a SEA load sharing protocol, verifying that criteria for load sharing are satisfied; setting, by the UI, a load sharing mode, comprising: requesting, by the backup SEA to the primary SEA, implementation of the SEA load sharing protocol; responsive to the requesting by the backup SEA, the primary SEA transmits an acknowledgment to the backup SEA and transitions into a sharing state; and responsive to the acknowledgment from the primary SEA, the backup SEA transitions to the sharing state. | 02-21-2013 |
20130047025 | SYSTEM AND METHOD FOR STREAM PROCESSING UTILIZING MULTIPOTENT MORPHOGENIC STEM CELLS - A method, computer program product, and system for de-centralized stream processing is provided. The method may include providing a plurality of processing nodes in a hierarchical genome having a plurality of levels, wherein each of said processing nodes is configured to transmit and receive a stream of data. The method may further include restricting a subset of the plurality of processing nodes from differentiating into a role within each level of the hierarchical genome. The method may also include identifying a failure at one of the processing nodes and replacing the failed node with one of the processing nodes from the restricted subset. | 02-21-2013 |
20130047026 | UPGRADING NETWORK TRAFFIC MANAGEMENT DEVICES WHILE MAINTAINING AVAILABILITY - A method, system, machine-readable storage medium, and apparatus are directed towards upgrading a cluster by bifurcating the cluster into two virtual clusters, an “old” virtual cluster (old active cluster) and a “new” virtual cluster (new standby cluster), and iteratively upgrading members of the old cluster while moving them into the new cluster. While members are added to the new cluster, existing connections and new connections are seamlessly processed by the old cluster. Optionally, state mirroring occurs between the old cluster and the new cluster once the number of members of the old and new clusters are approximately equal. Once a threshold number of members have been transferred to the new cluster, control and processing may be taken over by the new cluster. Transfer of control from the old cluster to the new cluster may be performed by failing over connectivity from the old cluster to the new cluster. | 02-21-2013 |
20130047027 | FAILOVER METHOD THROUGH DISK TAKE OVER AND COMPUTER SYSTEM HAVING FAILOVER FUNCTION - When a primary server executing a task fails in a computer system where a plurality of servers are connected to an external disk device via a network and the servers boot an operating system from the external disk device, task processing is taken over from the primary server to a server that is not executing a task in accordance with the following method. The method for taking over a task includes the steps of detecting that the primary server fails; searching the computer system for a server that has the same hardware configuration as that of the primary server and that is not running a task; enabling the server, searched for as a result of the search, to access the external disk device; and booting the server from the external disk device. | 02-21-2013 |
20130055010 | System and Method for an Integrated Open Network Switch - A device includes a first processing unit and a second processing unit. The first processing unit is configured to execute a performance test on the device. The second processing unit is in communication with the first processing unit, and is configured to migrate an application from the second processing unit to the first processing unit. The second processing unit is further configured to detect a failure of the first processing unit, to migrate the application to a third processing unit in response to the failure of the first processing unit, and to assign a first plurality of ports to the third processing unit in response to the failure of the first processing unit. | 02-28-2013 |
20130067267 | RESOURCE AWARE PLACEMENT OF APPLICATIONS IN CLUSTERS - Placing an application on a node in a cluster. A method includes detecting an unexpected event indicating that an application should be placed on a node in the cluster. Real time information about resource utilization on one or more nodes in the cluster is received. Based on the real time information, a determination of a node to place the application is made. The application is placed on the determined node. | 03-14-2013 |
20130067268 | INTELLIGENT INTEGRATED NETWORK SECURITY DEVICE FOR HIGH-AVAILABILITY APPLICATIONS - Methods and apparatuses for inspecting packets are provided. A primary security system may be configured for processing packets. The primary security system may be operable to maintain flow information for a group of devices to facilitate processing of the packets. A secondary security system may be designated for processing packets upon a failover event. Flow records may be shared from the primary security system with the secondary security system. | 03-14-2013 |
20130086414 | SYSTEMS AND METHODS RECOVERING FROM THE FAILURE OF A SERVER LOAD BALANCER - The invention provides, in one aspect, a server load balancer (SLB) recovery method that replicates a primary SLB's connection data after the primary SLB experiences a failure, as opposed to before it experiences a failure as is currently done in the known hot stand-by recovery method. In some embodiments, this is made possible by (1) employing a replication agent on each target processing unit (e.g., each processing unit on which a server application runs) and (2) transmitting, from the primary SLB, connection data information (i.e., information comprising a session identifier) to the replication agent running on the target processing unit to which the session is mapped, which replication agent will store the data until it is required to transmit the data to a cold stand-by SLB. | 04-04-2013 |
20130091377 | METHODS AND SYSTEMS FOR AUTOMATICALLY REROUTING LOGICAL CIRCUIT DATA - An example involves selecting a logical failover circuit comprising an alternate communication path for communicating data upon a failure of a dedicated logical circuit connecting a host device to a remote device. When a first logical circuit identifier of the dedicated logical circuit does not match a second logical circuit identifier of the logical failover circuit: the second logical circuit identifier of the logical failover circuit is renamed to identify the logical failover circuit using the first logical circuit identifier when the logical failover circuit is a dedicated logical failover circuit used to communicate only when the dedicated logical circuit fails, and the dedicated logical circuit is renamed to identify the dedicated logical circuit using the second logical circuit identifier when the logical failover circuit is used to communicate regardless of the failure of the dedicated logical circuit. The data is rerouted to the logical failover circuit without manual intervention. | 04-11-2013 |
20130091378 | METHODS AND SYSTEMS FOR AUTOMATICALLY REROUTING LOGICAL CIRCUIT DATA FROM A LOGICAL CIRCUIT FAILURE TO A DEDICATED BACKUP CIRCUIT IN A DATA NETWORK - An example method involves rerouting a logical circuit from a first set of switches to a second set of switches to communicate data between network devices without breaking the logical circuit. The logical circuit includes variable communication paths, and the second set of switches are to form a route associated with the variable communication paths that is not predefined and that is dynamically defined at a time of automatic rerouting. The example method also involves detecting a failure of the logical circuit based on at least one of a committed information rate or a committed burst size having been exceeded. In addition, the data is rerouted from the logical circuit to a logical failover circuit in the data network in response to detecting the failure of the logical circuit. The logical failover circuit includes an alternative communication path to communicate the data. | 04-11-2013 |
20130097456 | Managing Failover Operations On A Cluster Of Computers - Managing failover operations on a cluster of computers, including: identifying, by a failover hold module, a failure to access data storage in the cluster of computers; preventing the execution of all read operations directed to the data storage that were received after the failure to access data storage was identified; executing all write operations directed to the data storage that were received after the failure to access data storage was identified, including writing data to a cache; identifying that a failover to alternative data storage is complete; executing the held read operations, including reading data from the alternative data storage; and copying, from cache to the alternative data storage, the data written to the cache as part of the write operations. | 04-18-2013 |
20130097457 | MANAGING FAILOVER OPERATIONS ON A CLUSTER OF COMPUTERS - Managing failover operations on a cluster of computers, including: identifying, by a failover hold module, a failure to access data storage in the cluster of computers; preventing the execution of all read operations directed to the data storage that were received after the failure to access data storage was identified; executing all write operations directed to the data storage that were received after the failure to access data storage was identified, including writing data to a cache; identifying that a failover to alternative data storage is complete; executing the held read operations, including reading data from the alternative data storage; and copying, from cache to the alternative data storage, the data written to the cache as part of the write operations. | 04-18-2013 |
20130103977 | FAULT TOLERANCE FOR TASKS USING STAGES TO MANAGE DEPENDENCIES - A high availability system has an application server communicatively coupled to one or more client machines through a network utilizing stateless communication sessions. The application server manages concurrent execution of tasks on multiple client machines. A task may be dependent on the execution of another task and the dependencies are managed through stages. The application server utilizes a fault tolerance methodology to determine a failure to any one of the components within the system and to perform remedial measures to preserve the integrity of the system. | 04-25-2013 |
20130111260 | DYNAMIC RESOURCE ALLOCATION IN RECOVER TO CLOUD SANDBOX | 05-02-2013 |
20130111261 | Split brain resistant failover in high availability clusters | 05-02-2013 |
20130111262 | PROVIDING DISASTER RECOVERY FOR A DISTRIBUTED FILESYSTEM | 05-02-2013 |
20130124913 | MANAGEMENT DEVICE AND MANAGEMENT METHOD - A management device includes a memory and a processor coupled to the memory. The processor executes a process including monitoring an operating state of a target device to be managed as a node of a network to be managed, moving a process executed by the target device to another node on the network when a sign of failure is detected as a result of the monitoring, and determining, at activation of the target device, whether there is a process having been moved from the target device to another node, and recalling the moved process from the destination node when such a process exists. | 05-16-2013 |
20130132764 | SYSTEM REDUNDANCY AND SITE RECOVERY - A method may include receiving an order associated with processing a media file and forwarding the order to a resource management system. The method may also include identifying, by the resource management system, a plurality of tasks associated with fulfilling the order, storing the plurality of tasks and identifying an execution system to execute the tasks. The method may further include forwarding, by the resource management system, the tasks to the execution system. | 05-23-2013 |
20130132765 | Mechanism to Provide Assured Recovery for Distributed Application - A system and method is provided for providing assured recovery for a distributed application. Replica servers associated with the distributed application may be coordinated to perform integrity testing together for the whole distributed application. The replica servers connect to each other in a manner similar to the connection between master servers associated with the distributed application, thereby preventing the replica servers from accessing and/or changing application data on the master servers during integrity testing. | 05-23-2013 |
20130138997 | RACK SYSTEM - A rack system is provided. The rack system includes a first rack apparatus and a second rack apparatus. The first rack apparatus includes multiple first rack internal devices and a first Integrated Management Module (IMM). The first IMM manages the first rack internal devices via a network. The second rack apparatus includes multiple second rack internal devices and a second IMM. The second IMM manages the second rack internal devices via the network. The first IMM and the second IMM are connected via the network and implement a synchronous configuration process. When the second IMM becomes abnormal, the first IMM manages both the first rack internal devices and the second rack internal devices via the network at the same time. | 05-30-2013 |
20130138998 | METHOD FOR SWITCHING APPLICATION SERVER, MANAGEMENT COMPUTER, AND STORAGE MEDIUM STORING PROGRAM - Provided is a management computer which refers to switching level information including switching patterns to be used at a time of switching the first task to the second application server; sets a level of a degree of safety for each of the switching patterns; refers to a stop time for each first task which is allowed upon switching the first task to the second application server; selects one of the switching patterns having a switching time that is shorter than the stop time of the task requirement information which is set to the first task and having the level of the degree of safety that is highest among the switching patterns of the switching level information; stops the second task of the second application server by the selected one of the switching patterns; and then controls the second application server to provide the first task. | 05-30-2013 |
20130145206 | METHOD AND SYSTEM FOR USING A STANDBY SERVER TO IMPROVE REDUNDANCY IN A DUAL-NODE DATA STORAGE SYSTEM - A standby server, a first main server, and a second main server to control shared input/output (I/O) adapters in a storage system are provided. The standby server is in communication with the first main server and the second main server, and the storage system is configured to operate as a dual node active system. The standby server is activated in response to receiving a communication from the first main server of a fail mode of the second main server. Systems and physical computer storage media are also provided. | 06-06-2013 |
20130151884 | CLOUD DATA STORAGE SYSTEM - A cloud data storage system is provided for multiple clients to access data of files, comprising: at least one node connecting to a first storage means; at least one namenode module for processing file operations issued from the clients, the namenode module issuing data access instructions to access and maintain the metadata on the first storage means; at least one datanode module respectively executing on the at least one node, each datanode module functioning to scan and access a second storage means connected thereto; and at least one data import module selectively executing on nodes on which datanode modules are executing, the data import module scanning a second storage means newly connected to the cloud data storage system, obtaining the corresponding metadata, and executing a data migration operation for the data in the second storage means without an actual physical uploading operation. | 06-13-2013 |
20130151885 | COMPUTER MANAGEMENT APPARATUS, COMPUTER MANAGEMENT SYSTEM AND COMPUTER SYSTEM - A service processor is separated into a first management unit which performs primitive processing, such as access processing of hardware, and a computer management device which performs complex processing, such as monitoring of the hardware. The computer management device is implemented as a virtual machine which performs hardware control of the plurality of hardware devices. Thereby, the plurality of service processors is realized by a small number of hardware devices. | 06-13-2013 |
20130159762 | CONTAINER SYSTEM AND MONITORING METHOD FOR CONTAINER SYSTEM - A container system and a monitoring method for the container system are provided. The container system includes a plurality of servers and a master server node. The servers are arranged in N areas. The master server node is coupled to the servers. The master server node selects one of a plurality of servers in an i | 06-20-2013 |
20130166943 | Method And Apparatus For Energy Efficient Distributed And Elastic Load Balancing - Various embodiments provide a method and apparatus for providing a load balancing configuration that adapts to the overall load and scales the power consumption with the load to improve energy efficiency and scalability. The energy efficient distributed and elastic load balancing architecture includes a collection of multi-tiered servers organized as a tree structure. The handling of incoming service requests is distributed amongst a number of the servers. Each server in the virtual load distribution tree handles incoming service requests based on its own load. Once a predetermined loading on the receiving server has been reached, the receiving server passes the incoming requests to one or more of its children servers. | 06-27-2013 |
20130185588 | QUERY EXECUTION AND OPTIMIZATION WITH AUTONOMIC ERROR RECOVERY FROM NETWORK FAILURES IN A PARALLEL COMPUTER SYSTEM WITH MULTIPLE NETWORKS - A database query execution monitor determines if a network error or low performance condition exists and then where possible modifies the query. The query execution monitor then determines an alternate query execution plan to continue execution of the query. The query optimizer can re-optimize the query to use a different network or node. Thus, the query execution monitor allows autonomic error recovery for network failures using an alternate query execution. The alternate query execution could also be determined at the initial optimization time and then this alternate plan used to execute a query in the case of a particular network failure. | 07-18-2013 |
20130198559 | VIRTUAL RECOVERY SERVER - A virtual recovery server is described herein. The virtual recovery server is a software implementation on a storage server which generates a virtual server to replace a physical server when the physical server becomes inaccessible. While the physical server is inaccessible, the virtual recovery server is able to mimic the actions and data contained on the physical server. Thus, when users attempt to access an application or data that is on the physical server, they will not experience an interruption and will continue to access the information as if the physical server were up and running. The virtual recovery server is able to run for up to a number of days. When a new or repaired physical server is available the virtual recovery server is deleted after the data acquired by the virtual server is transmitted to the new physical server. | 08-01-2013 |
20130205161 | SYSTEMS AND METHODS OF PROVIDING HIGH AVAILABILITY OF TELECOMMUNICATIONS SYSTEMS AND DEVICES - Systems and methods of providing high availability of telecommunications systems and devices in a telecommunications network. A telecommunications device is deployed in a high availability configuration that includes two or more peer device platforms, in which each peer device platform can operate in either an active mode or a standby mode. Each peer device platform includes a device health monitoring component and a rules engine. By detecting one or more failures and/or faults associated with the peer device platforms using the respective device health monitoring components, and generating, using the rules engine, a health count for each peer device platform based on the detected failures/faults and one or more predetermined rules, failover decisions can be made based on a comparison of the health counts for the respective peer device platforms, while reducing the impact on the telecommunications network and providing an increased level of user control over the failover decisions. | 08-08-2013 |
20130205162 | REDUNDANT COMPUTER CONTROL METHOD AND DEVICE - Disclosed is a non-transitory computer-readable medium storing a program, which causes a computer to execute a sequence of processing. The sequence of processing includes receiving status information by a second server device from a client device, the status information being collected by the client device, and including a status of a first server device and statuses of one or more standby servers configured to operate when the first server device fails, and causing the second server device to operate, when the status information indicates a predetermined first status, as at least one of the first server device and the one or more standby servers in a failure status. | 08-08-2013 |
20130205163 | COMMUNICATING IN A COMPUTER ENVIRONMENT - Communicating in a peer-to-peer computer environment. A portion of a communication is received from a first user device at a relay peer, wherein the relay peer is one of a list of potential peers and wherein the first user device and a second user device have disparate CPU power and bandwidth capabilities. The portion of the communication is transcoded to comprise a base layer and an enhanced layer. In one embodiment, transcoding encompasses changing the resolution of the communication. The base layer of the portion of the communication is sent to the second user device from the relay peer. The enhanced layer of the portion of the communication is selectively sent to the second user device depending upon a set of capabilities of the second user device. | 08-08-2013 |
20130212423 | Match Server for a Financial Exchange Having Fault Tolerant Operation - Fault tolerant operation is disclosed for a primary match server of a financial exchange using an active copy-cat instance, a.k.a. backup match server, that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled, i.e. they need not be aware that they are operating in a fault tolerant environment. | 08-15-2013 |
20130212424 | COMPUTER SYSTEM AND BOOT CONTROL METHOD - When a primary computer is taken over to a secondary computer in a redundancy configuration computer system where booting is performed via a storage area network (SAN), a management server delivers an information collecting/setting program to the secondary computer before the user's operating system of the secondary computer is started. This program assigns a unique ID (World Wide Name) to the fibre channel port of the secondary computer to allow a software image to be taken over from the primary computer to the secondary computer. | 08-15-2013 |
20130227339 | Failover Processing - A method of providing failover processing between a first element and a second element in a data communications network, the method comprising configuring a first channel and a second channel between the first and second elements, the first and second channels comprising different physical data paths, receiving at the first element, via the first channel, first data signals representative of functioning statuses of the second element, the first channel being configured to allow a non-optimal, partly functioning status of the second element to be communicated to the first element; and receiving at the first element, via the second channel, second data signals representative of functioning statuses of the second element, the second channel being configured to allow a failed functioning status of the second element to be communicated to the first element; and conducting failover processing based on both the first and second data signals. | 08-29-2013 |
20130238926 | Fault Detection And Correction For Single And Multiple Media Players Connected To Electronic Displays, And Related Devices, Methods And Systems - Systems, devices, software, hardware and networks adapted and arranged for monitoring and correcting faults in networked media player systems that include electronic displays are provided. After detection or notification of a fault in at least one networked media player in a network of at least two, or N, media players operationally connected to electronic displays, the invention provides an alternate source of signal to the affected display. In some preferred embodiments, the invention utilizes at least one additional, or N+1, media player as a backup to substitute for the failed media player. Reconfiguration of the faulted media player by means of the N+1 backup networked media player advantageously increases the reliability and efficiency of ongoing maintenance of digital visual systems operating in commercial and other environments. | 09-12-2013 |
20130254588 | STANDBY SYSTEM DEVICE, A CONTROL METHOD, AND A PROGRAM THEREOF - A standby system device | 09-26-2013 |
20130262915 | SYSTEMS AND METHODS FOR OPEN AND EXTENSIBLE INTEGRATION OF MANAGEMENT DOMAINS IN COMPUTATION AND ORCHESTRATION OF RESOURCE PLACEMENT - An aspect of this invention is a method that includes evaluating a computing environment by performing auditing of a fault tolerance ability of the computing environment to tolerate each of a plurality of failure scenarios; constructing a failover plan for each of the plurality of scenarios; identifying one or more physical resource limitations which constrain the fault tolerance ability; and identifying one or more physical resources to be added to the computing environment to tolerate each of the plurality of failure scenarios. | 10-03-2013 |
20130262916 | CLUSTER MONITOR, METHOD FOR MONITORING A CLUSTER, AND COMPUTER-READABLE RECORDING MEDIUM - A cluster monitor | 10-03-2013 |
20130262917 | REDUNDANT SYSTEM CONTROL METHOD - The redundant system includes a redundant server of a first system and a redundant server of a second system. The redundant servers of the first system and the second system operate in lockstep. When a failure occurs in the redundant server of the second system, the redundant server of the first system separates the redundant server of the second system in which the failure has occurred and continues the operation, and then prepares for restoration to a duplexed operation with a configuration in which the failed part is fallen back. When the preparation is completed, both redundant servers of the first system and the second system start a lockstep operation from initialization processing by synchronous reset, and resume the duplexed operation with the configuration in which the failed part is fallen back. | 10-03-2013 |
20130268800 | METHOD AND SYSTEM FOR CO-EXISTENCE OF LIVE MIGRATION PROTOCOLS AND CLUSTER SERVER FAILOVER PROTOCOLS - A method and system for LPAR migration including creating a profile for a logical partition on a host system comprising one or more LPARs, wherein the profile is associated with a first name. Also, within the profile, a port of a client virtual small computer system interface (SCSI) adapter of the LPAR is mapped to a port of a server virtual SCSI adapter of a virtual input/output server (VIOS) of the host system. The server port of the VIOS is set to accept any port of virtual client SCSI adapters of the one or more LPARS of the host system. Within the VIOS, the server port of the VIOS is mapped to a device name (i.e., LPAR) and to a target device (i.e., a disk of shared storage), for purposes of proper failover implementation of the LPAR, wherein the target device comprises an operating system for the LPAR. | 10-10-2013 |
20130268801 | SERVER MANAGEMENT APPARATUS, SERVER MANAGEMENT METHOD, AND PROGRAM - A server management apparatus monitors activity state of an active server that provides a service to a client(s) via a plurality of switches, instructs a route control apparatus, managing routing for the plurality of switches, to change a packet forwarding route if there is no reply from the active server; and recognizes that the active server is stopped if there is no reply from the active server after a forwarding route is changed and instructs a standby server to provide the service instead of the active server. | 10-10-2013 |
20130305084 | SERVER CONTROL AUTOMATION - Control over servers and partitions within a computer network may be automated to improve response to disaster events within the computer network. For example, a monitoring server may be configured to automatically monitor servers through remote communications sessions. A disaster event may be detected based on information received from the partitions and servers within the network. After a disaster event occurs, the monitoring server may automatically execute a script or take other action to make a backup server or partition available. For example, the monitoring server may stop and deactivate a first partition that has failed, activate a second partition that is a mirror image of the first partition, and start the second partition. | 11-14-2013 |
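The stop/deactivate/activate/start sequence described in this abstract can be sketched as follows. The `Partition` class and its verbs are illustrative assumptions standing in for whatever remote-session commands the monitoring server would actually issue.

```python
# Minimal sketch of the recovery script's steps, assuming a partition API
# with stop/deactivate/activate/start verbs (names are illustrative).

class Partition:
    def __init__(self, name):
        self.name = name
        self.state = 'stopped'
        self.log = []          # records the verbs applied, in order

    def apply(self, verb, new_state):
        self.log.append(verb)
        self.state = new_state

def fail_over(failed, mirror):
    """Disaster script: quiesce the failed partition, bring up its mirror."""
    failed.apply('stop', 'stopped')
    failed.apply('deactivate', 'inactive')
    mirror.apply('activate', 'active')
    mirror.apply('start', 'running')
    return mirror.state
```

A monitoring server would run this after detecting a disaster event, rather than requiring an operator to issue the four commands by hand.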
20130305085 | NETWORK TRAFFIC ROUTING - A service appliance is installed between production servers running service applications and service users. The production servers and their service applications provide services to the service users. In the event that a production server is unable to provide its service to users, the service appliance can transparently intervene to maintain service availability. To maintain transparency to service users and service applications, service users are located on a first network and production servers are located on a second network. The service appliance assumes the addresses of the service users on the second network and the addresses of the production servers on the first network. Thus, the service appliance obtains all network traffic sent between the production server and service users. While the service application is operating correctly, the service appliance forwards network traffic between the two networks using various network layers. | 11-14-2013 |
20130326261 | FAILOVER OF INTERRELATED SERVICES ON MULTIPLE DEVICES - A device may include a network interface for communicating with a failover device, a memory for instructions, and a processor for executing the instructions. The processor may execute the instructions to communicate with the failover device, via the network interface, to fail over the device to the failover device in a cluster by pushing a process on the device to the failover device when a first failover event occurs. The failover device is configured to fail over the device to the failover device by pulling the process from the device onto the failover device when a second failover event occurs. The device is in the cluster. | 12-05-2013 |
20130326262 | METHODS AND SYSTEMS FOR AUTOMATICALLY REROUTING LOGICAL CIRCUIT DATA FROM A LOGICAL CIRCUIT FAILURE TO A DEDICATED BACKUP CIRCUIT IN A DATA NETWORK - An example method of rerouting data involves rerouting a logical circuit from a first set of switches to a second set of switches to communicate data between network devices without breaking the logical circuit. The logical circuit comprises variable communication paths. The second set of switches is to form a route associated with the variable communication paths that is not predefined and that is dynamically defined at a time of automatic rerouting. The example method also involves rerouting the data from the logical circuit to a logical failover circuit in the data network when the logical circuit fails based on a committed information rate having been exceeded. The logical failover circuit comprises an alternative communication path to communicate the data. | 12-05-2013 |
20130339783 | Recovery of a System for Policy Control and Charging, Said System Having a Redundancy of Policy and Charging Rules Function - A first Policy and Charging Rules Function “PCRF” server for recovery of a Policy and Charging Control “PCC” system. The PCC system also has a second PCRF server previously in charge of controlling an Internet Protocol Connectivity Access Network “IP-CAN” session previously established with a UE, and a PCRF-client. The first PCRF server includes a network interface unit of the first PCRF server arranged for receiving a modification request of the IP-CAN session from the PCRF-client after failure of the second PCRF server which was in active mode. The first PCRF server has a PCRF identifier which is shared with the second PCRF server that has failed. The first PCRF server is now in active mode. The modification request requests new rules for the IP-CAN session, including modification data and excluding access data and supported features for the IP-CAN session. The first PCRF server includes a processing unit of the first PCRF server arranged for determining that the IP-CAN session is unknown, and arranged for submitting a request from the network interface unit of the first PCRF server to the PCRF-client to provide all information that the PCRF-client has regarding the IP-CAN session. The information includes all data required to be sent for the IP-CAN session establishment and synchronization data. A Policy and Charging Rules Function “PCRF”-client for recovery of a Policy and Charging Control “PCC” system. Methods for recovery of a Policy and Charging Control “PCC” system with a first Policy and Charging Rules Function “PCRF” server in standby mode, a second PCRF server in active mode, and a PCRF-client, wherein an IP-CAN session is already established with a UE and controlled by the second PCRF server. A computer program embodied on a computer readable medium for recovery of a Policy and Charging Control “PCC” system. | 12-19-2013 |
20130346789 | BACKUP SIP SERVER FOR THE SURVIVABILITY OF AN ENTERPRISE NETWORK USING SIP - This backup SIP server is configured to detect whether an Internet protocol link is not working, and enable the use of a backup SIP signaling link to the main site via a SIP gateway and a public telephone network when the Internet protocol link is not working. The backup SIP server is configured to transfer SIP signaling information on this backup link; and when receiving a registration request from a terminal of the remote site while the Internet protocol link is not working, register the terminal locally and forward the registration request to the main site via the backup link. The backup SIP server is configured to store policies defining what services, supplied by the main SIP server, are compatible with said backup SIP signaling link, and to alter the content of at least one field in each SIP signaling message addressed to the main SIP server before transferring this SIP signaling message on the backup link; this content being altered according to these policies. | 12-26-2013 |
20140006846 | Two-Tier Failover Service for Data Disaster Recovery | 01-02-2014 |
20140013153 | MANAGING USE OF LEASE RESOURCES ALLOCATED ON FALLOVER IN A HIGH AVAILABILITY COMPUTING ENVIRONMENT - Responsive to a cluster manager for a particular node from among multiple nodes allocating at least one leased resource for a resource group for an application workload on the particular node, on fallover of the resource group from another node to the particular node, setting a timer thread, by the cluster manager for the particular node, to track an amount of time remaining for an initial lease period of the at least one leased resource. Responsive to the timer thread expiring while the resource group is holding the at least one leased resource, maintaining, by the cluster manager for the particular node, the resource group comprising the at least one leased resource for an additional lease period and automatically incurring an additional fee, only if the particular node has the capacity to handle the resource group at a lowest cost from among the nodes. | 01-09-2014 |
20140013154 | Method and system for processing email during an unplanned outage - The method and system of the present invention provide an improved technique for processing email during an unplanned outage. Email messages are redirected from the primary server to a secondary server during an unplanned outage such as, for example, a natural disaster. A notification message is sent to users alerting them that their email messages are available on the secondary server by, for example, Internet access. After the termination of the unplanned outage, email messages received during the unplanned outage are synchronized into the users' standard email application. | 01-09-2014 |
20140019799 | METHOD AND SYSTEM OF PROTECTION SWITCHING IN A NETWORK ELEMENT - A method and system for protection switching in a network element. At least one data signal is received at a client entity, wherein the signal flows through a server entity via the client entity. A protection group is configured on at least one client entity served by at least two server entities, wherein the protection group includes at least one work entity and at least one protect entity. A plurality of supplement client entities of the client entity is created such that at least one of the supplement client entities flows over one server entity, and the entities are checked for a fault to raise an alarm to their respective controllers. The controllers include at least one server layer protection controller and at least one client layer protection controller. | 01-16-2014 |
20140025986 | Providing Replication and Fail-Over as a Network Service in Data Centers - Techniques for providing session level replication and fail-over as a network service include generating a replication rule that replicates network traffic destined for a primary server from an originating server to a network controller and installing said rule in a switch component, identifying flows from the originating server to the primary server, replicating each incoming data packet intended for the primary server to the network controller for replication and forwarding to replica servers, determining said primary server to be in a failed state based on a number of retransmissions of a packet, selecting one of the replica servers as a fail-over target, and performing a connection level fail-over by installing a redirection flow in the switch component that redirects all packets destined to the primary server to the network controller, which forwards the packets to the replica server and forwards each response from the replica server to said originating server. | 01-23-2014 |
20140025987 | Systems, Methods and Media for Distributing Peer-to-Peer Communications - Systems and methods for distributing peer-to-peer communications are provided herein. Exemplary methods may include masking identification of two or more client nodes on a communications channel of a peer-to-peer communications network by directing peer-to-peer communications of the two or more client nodes through a proxy node, the proxy node including a disinterested client node relative to the two or more client nodes, the disinterested client node providing network resources to the peer-to-peer communications network. | 01-23-2014 |
20140025988 | METHODS AND SYSTEMS FOR AUTOMATICALLY REROUTING LOGICAL CIRCUIT DATA - An example involves identifying a failure of a dedicated logical circuit connecting a host device to a remote device to communicate data that originates and terminates only at the host and remote devices. When a first logical circuit identifier of the dedicated logical circuit does not match a second logical circuit identifier of a logical failover circuit comprising an alternate communication path for communicating the data: the second logical circuit identifier is renamed to identify the logical failover circuit using the first logical circuit identifier when the logical failover circuit is a dedicated logical failover circuit to communicate only when the dedicated logical circuit fails, and the dedicated logical circuit is renamed to identify the dedicated logical circuit using the second logical circuit identifier when the logical failover circuit is to communicate regardless of failure of the dedicated logical circuit. | 01-23-2014 |
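The identifier-renaming rule in the preceding abstract has two branches that can be sketched directly. This is an illustrative reduction, assuming a circuit is just a mutable record with an `id` field (the function and field names are not from the patent).

```python
# Hedged sketch of the renaming rule: when the two identifiers differ, a
# dedicated failover circuit takes over the dedicated circuit's id, while a
# shared failover circuit instead gives its id to the dedicated circuit.

def apply_failover_naming(dedicated, failover, failover_is_dedicated):
    """dedicated/failover: dicts with an 'id' key representing the circuits."""
    if dedicated['id'] == failover['id']:
        return  # identifiers already match; nothing to rename
    if failover_is_dedicated:
        # failover circuit communicates only on failure: it assumes the
        # dedicated circuit's identifier
        failover['id'] = dedicated['id']
    else:
        # failover circuit carries traffic regardless of failure: the
        # dedicated circuit is renamed with the failover identifier
        dedicated['id'] = failover['id']
```

Either branch leaves the two records sharing one identifier, so traffic keyed by circuit id reaches the surviving path.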
20140040658 | Disaster Recovery Framework - A system and method of orchestrating failover operations of servers providing services to an internal computer network includes a DR server configured to execute a control script that performs a failover operation. Information needed to perform the failover operation is stored on the DR server, thereby eliminating the need to store agents on each of the application's primary and backup servers. The DR server may provide a centralized location for the maintenance and update of the failover procedures for the internal network's redundant services. A failover operation may be initiated by an authorized user in communication with the internal computer network. | 02-06-2014 |
20140047263 | SYNCHRONOUS LOCAL AND CROSS-SITE FAILOVER IN CLUSTERED STORAGE SYSTEMS - Synchronous local and cross-site switchover and switchback operations of a node in a disaster recovery (DR) group are described. In one embodiment, during switchover, a takeover node receives a failover request and responsively identifies a first partner node in a first cluster and a second partner node in a second cluster. The first partner node and the takeover node form a first high-availability (HA) group and the second partner node and a third partner node in the second cluster form a second HA group. The first and second HA groups form the DR group and share a storage fabric. The takeover node synchronously restores client access requests associated with a failed partner node at the takeover node. | 02-13-2014 |
20140047264 | COMPUTER INFORMATION SYSTEM AND DYNAMIC DISASTER RECOVERY METHOD THEREFOR - The method is performed at a client device and includes receiving a first message that includes a first data usage value. The first message is formatted according to a respective format. After receiving the first message, the method further includes acquiring a data usage template corresponding to the respective format. The method further includes receiving a second message that includes a second data usage value. The second message is formatted according to the respective format. The method further includes parsing the second message according to the data usage template so as to obtain the second data usage value. | 02-13-2014 |
20140089724 | SECURING CRASH DUMP FILES - In a computer storage system, crash dump files are secured without power fencing in a cluster of a plurality of nodes connected to a storage system. Upon an occurrence of a panic of a crashing node, and prior to a surviving node receiving the panic message of the crashing node and prior to a totem token being declared lost by the surviving node, a capturing node is loaded in the cluster to become active for capturing the crash dump files of the crashing node, while the surviving node is manipulated to continue to operate under the assumption that power fencing was performed on the crashing node. | 03-27-2014 |
20140095924 | End to End Multicast - IP multicast enabled devices, systems and methods for use on an end-to-end IP multicast-enabled network are disclosed. An IP multicast system, device and method operable on the network includes an IP multicast-engine, and storage for storing instruction sets to instruct the engine to send messages according to a select multicast application. A plurality of devices become members of an IP multicast group such that sending a message to a single multicast address can provide for the concurrent control of, and the delivery of the multicast message to, the devices of the group. Error conditions in a multicast source may be handled by preserving the multicast session resources, and reassigning a multicast source address from a faulty source encoding device to an alternate device. | 04-03-2014 |
20140115379 | INTELLIGENT INTEGRATED NETWORK SECURITY DEVICE FOR HIGH-AVAILABILITY APPLICATIONS - Methods and apparatuses for inspecting packets are provided. A primary security system may be configured for processing packets. The primary security system may be operable to maintain flow information for a group of devices to facilitate processing of the packets. A secondary security system may be designated for processing packets upon a failover event. Flow records may be shared from the primary security system with the secondary security system. | 04-24-2014 |
20140115380 | FAILOVER SYSTEM AND METHOD - One aspect of the present invention provides a system for failover comprising at least one client selectively connectable to one of at least two interconnected servers via a network connection. In a normal state, one of the servers is designated a primary server when connected to the client and a remainder of the servers are designated as backup servers when not connected to the client. The at least one client is configured to send messages to the primary server. The servers are configured to process the messages using at least one service that is identical in each of the servers. The services are unaware of whether a server respective to the service is operating as the primary server or the backup server. The servers are further configured to maintain a library, or the like, that indicates whether a server is the primary server or a backup server. The services within each server are configured to make external calls via their respective libraries. The library in the primary server is configured to complete the external calls and return results of the external calls to the service in the primary server and to forward results of the external calls to the service in the backup server. The library in the backup server does not make external calls but simply forwards the results of the external calls, as received from the primary server, to the service in the backup server when requested to do so by the service in the backup server. | 04-24-2014 |
20140122918 | Method and Apparatus For Web Based Storage On Demand - Rapidly growing demand for storage capacity in the internet era requires a more flexible and powerful storage infrastructure. The present invention discloses a type of storage system based on a model of a centrally controlled, distributed, scalable virtual machine. In this model, one or more service pools, including a virtual storage service pool and application service pools, can be automatically created to meet the demands for more storage capacity from various applications. In particular, this model provides a solid foundation for distributing storage volumes to support storage on demand and sharing, with exceptional management capabilities. In addition, this model provides a flexible fault-recovery topology beyond the traditional recovery plan. | 05-01-2014 |
20140122919 | STORAGE SYSTEM AND DATA PROCESSING METHOD - The present invention comprises a management computer which exchanges information with storage apparatuses and host computers of primary sites and a secondary site via a management network and which manages a data processing performance and a data processing status of each of the sites, wherein, when any of the primary sites is subject to disaster, the management computer selects a normal primary site for which a data transfer time of a restoration target application is within a recovery time objective as a restoration site which possesses processing performance for restoring the application, and remote-copies restoration copy data, which exists at the secondary site, to the storage apparatus of the primary site selected as the restoration site, via a data network. | 05-01-2014 |
20140122920 | HIGH AVAILABILITY SYSTEM ALLOWING CONDITIONALLY RESERVED COMPUTING RESOURCE USE AND RECLAMATION UPON A FAILOVER - In one embodiment, a method determines a first set of virtual machines and a second set of virtual machines. The first set of virtual machines is associated with a first priority level and the second set of virtual machines is associated with a second priority level. A first set of computing resources and a second set of computing resources are associated with hosts. Upon determining a failure of a host, the method performs: generating a power off request for one or more of the second set of virtual machines powered on the second set of computing resources and generating a power on request for one or more virtual machines from the first set of virtual machines that were powered on the failed host, the power on request powering on the one or more virtual machines from the first set of virtual machines on the second set of computing resources. | 05-01-2014 |
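The power-off/power-on planning in this abstract can be reduced to a small sketch. The unit-cost model below is an illustrative simplification (each VM occupies one slot of reserved capacity); names and the planning function are assumptions, not the patent's method.

```python
# Hedged sketch: on host failure, reclaim conditionally reserved capacity by
# powering off lower-priority VMs, then power on the higher-priority VMs that
# were running on the failed host. One VM = one slot, for illustration.

def plan_failover(high_vms, low_vms, reserve_slots):
    """high_vms: first-priority VMs from the failed host, to be powered on.
    low_vms: second-priority VMs currently occupying the reserved slots.
    Returns (power_off_requests, power_on_requests)."""
    free = reserve_slots - len(low_vms)          # slots not yet occupied
    to_free = max(0, len(high_vms) - free)       # slots we must reclaim
    power_off = low_vms[:to_free]                # power-off requests
    power_on = high_vms[:reserve_slots]          # power-on requests
    return power_off, power_on
```

With two high-priority VMs, two low-priority VMs, and three reserved slots, one low-priority VM is powered off so both high-priority VMs can be powered on.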
20140129873 | METHODS AND DEVICES FOR DETECTING SERVICE FAILURES AND MAINTAINING COMPUTING SERVICES USING A RESILIENT INTELLIGENT CLIENT COMPUTER - Intelligent client computing devices track and record the changes they make to data, applications, and services. Systems, devices, and computer readable media for detecting service tier failures and maintaining application services provide a resilient client architecture that allows a client application on an intelligent client to automatically detect the unavailability of server tiers or sites and re-route requests and updates to secondary sites to maintain application services at the client tier in a manner that is transparent to a user. The resilient client architecture understands the level of currentness of secondary sites in order to select the best secondary site and to automatically and transparently bring this secondary site up to date to ensure no data updates are missing from the secondary site. | 05-08-2014 |
20140136878 | Scaling Up and Scaling Out of a Server Architecture for Large Scale Real-Time Applications - Scaling up and scaling out of a server architecture for large scale real-time applications is provided. A group of users may be provisioned by assigning them to a server pool and allotting them to a group. Grouped users help to reduce inter-server communication when they are serviced by the same server in the pool. High availability may be provided by choosing a primary server and one or more secondary servers from the pool to ensure that grouped users are serviced by the same server. Operations taken on the primary server are synchronously replicated to secondary servers so that when a primary server fails, a secondary server may be chosen as the primary for the group. Servers for multiple user groups may be load balanced to account for changes in either the number of users or the number of servers in a pool. Multiple pools may be paired for disaster recovery. | 05-15-2014 |
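The group-placement idea in the preceding abstract, routing a whole user group to one primary server with secondaries drawn from the same pool, can be sketched with a stable hash. The placement function and use of CRC32 are illustrative assumptions, not the patent's algorithm.

```python
import zlib

# Hedged sketch: deterministically place a user group on a primary server in
# a pool, with the following pool members as secondaries, so grouped users
# are serviced by the same server and inter-server chatter is reduced.

def place_group(group_id, pool, n_secondaries=1):
    """Pick a primary and secondaries for a user group from a server pool.
    CRC32 gives a stable hash so every node computes the same placement."""
    i = zlib.crc32(group_id.encode()) % len(pool)
    primary = pool[i]
    secondaries = [pool[(i + k) % len(pool)]
                   for k in range(1, n_secondaries + 1)]
    return primary, secondaries
```

If the primary fails, one of the secondaries (which receive synchronously replicated operations) can be promoted without re-grouping the users.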
20140136879 | TRANSMISSION APPARATUS AND TRANSMISSION APPARATUS CONTROL METHOD - A first interface board includes a first signal processing unit that performs a predetermined process on a signal. A second interface board includes a second signal processing unit that performs the predetermined process on a signal. When no failure occurs in both interface boards, a switching control unit selects the first interface board. When a failure occurs in the first interface board, the switching control unit selects the second interface board. When there is no failure in both the interface boards and the first interface board does not satisfy a predetermined degradation condition, the electrical power supply control unit supplies electrical power to the first interface board and prohibits the supply of electrical power to the second interface board. When there is no failure in both the interface boards but the predetermined degradation condition is satisfied, the electrical power supply control unit supplies electrical power to both the interface boards. | 05-15-2014 |
20140136880 | NON-DISRUPTIVE FAILOVER OF RDMA CONNECTION - A novel RDMA connection failover technique that minimizes disruption to upper subsystem modules (executed on a computer node), which create requests for data transfer. A new failover virtual layer performs failover of an RDMA connection in error so that the upper subsystem that created a request does not have knowledge of an error (which is recoverable in software and hardware), or of a failure on the RDMA connection due to the error. Since the upper subsystem does not have knowledge of a failure on the RDMA connection or of a performed failover of the RDMA connection, the upper subsystem continues providing requests to the failover virtual layer without interruption, thereby minimizing downtime of the data transfer activity. | 05-15-2014 |
20140136881 | MANAGING FATE-SHARING IN SHARED-MEDIA COMMUNICATION NETWORKS - In one embodiment, a management device receives one or more fate-sharing reports locally generated by one or more corresponding reporting nodes in a shared-media communication network, the fate-sharing reports indicating a degree of localized fate-sharing between one or more pairs of nodes local to the corresponding reporting nodes. The management device may then determine, globally from aggregating the fate-sharing reports, one or more fate-sharing groups indicating sets of nodes having a global degree of fate-sharing within the communication network. As such, the management device may then advertise the fate-sharing groups within the communication network, wherein nodes of the communication network are configured to select a plurality of next-hops that minimizes fate-sharing between the plurality of next-hops. | 05-15-2014 |
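The aggregation step in this abstract, turning localized fate-sharing reports into global groups, can be sketched as below. The report format `(node_a, node_b, degree)` and the averaging threshold are assumptions for illustration, not the patent's definition.

```python
from collections import defaultdict

# Hedged sketch: aggregate local fate-sharing reports into global groups.
# A pair whose average reported degree meets the threshold is declared a
# fate-sharing group that next-hop selection should avoid pairing.

def aggregate(reports, threshold=0.5):
    """reports: iterable of (node_a, node_b, degree) with degree in [0, 1].
    Returns the node pairs whose average degree reaches the threshold."""
    scores = defaultdict(list)
    for a, b, degree in reports:
        scores[frozenset((a, b))].append(degree)
    return [set(pair) for pair, ds in scores.items()
            if sum(ds) / len(ds) >= threshold]
```

A node choosing backup next-hops would then prefer candidates that do not fall in the same advertised group as its current next-hop.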
20140143590 | METHOD AND APPARATUS FOR SUPPORTING FAILOVER FOR LIVE STREAMING VIDEO - A computer implemented method and apparatus for receiving, at a first media content packager, a request for at least one of an index file of media segments or a media segment; creating, in response to the request for an index file, an index file comprising a plurality of universal resource locators (URLs); and sending, from the media content packager, an error message when at least one of: (i) in response to a request for an index file, the created index file is determined to comprise media segments that have previously been sent from a media packager, or (ii) in response to a request for a media segment, at least one of the URLs in the index file references media content that is not available on the first packager, or a media segment is incomplete, wherein sending the error message results in receiving the request at a second media content packager. | 05-22-2014 |
20140143591 | Performing Failover in a Redundancy Group - A method, system, and computer program product for performing failover in a redundancy group, where the redundancy group comprises a plurality of routers including an active router and a standby router, the failover being characterized by zero black hole or significantly reduced black hole conditions versus a conventional failover system. The method comprises the steps of: receiving an incoming message at a switch; sending a request of identification to the plurality of routers to identify a current active router, where the current active router represents a virtual router of the redundancy group; and in response to receiving a reply containing an identification from the current active router within a predetermined time, forwarding the incoming message to the current active router. | 05-22-2014 |
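The switch-side rule in the preceding abstract can be sketched simply. The message shapes are assumptions: each router is modeled as a callable that returns its identification if it is currently the active (virtual) router, and the switch forwards only when an identification arrives within the deadline.

```python
import time

# Hedged sketch: before forwarding, the switch asks the redundancy group's
# routers to identify the current active router, and forwards the incoming
# message only if one identifies itself within the predetermined time.

def forward(message, routers, timeout_s=0.5):
    """routers: iterable of callables returning their id if they are the
    current active router, else None. Returns the id forwarded to, or None."""
    deadline = time.monotonic() + timeout_s
    for identify in routers:
        reply = identify()
        if reply is not None and time.monotonic() <= deadline:
            return reply          # forward to the identified active router
    return None                   # no active router replied in time
```

Holding the message until an active router is identified is what avoids the black-hole window of a conventional failover, at the cost of the identification round trip.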
20140143592 | CLIENT BASED HIGH AVAILABILITY METHOD FOR MESSAGE DELIVERY - A message queue (MQ) failover handler receives a message and a configuration file from a client application. The configuration file provides an indication of which of a number of queue managers (QMs) is the first choice for receipt and delivery of the message to a server application. The configuration file also provides an indication of which of the QMs is the second choice for receipt and delivery of the message to the server application, should the first choice of the QMs be unavailable. | 05-22-2014 |
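The ordered-choice behavior in this abstract reduces to a few lines. The configuration shape below (an ordered list of queue manager names plus an availability set) is an illustrative assumption, not the patent's file format.

```python
# Hedged sketch of the MQ failover handler's selection: try the first-choice
# queue manager, fall back to the second choice if it is unavailable.

def pick_queue_manager(config, available):
    """config: {'queue_managers': [first_choice, second_choice, ...]}.
    available: set of queue manager names currently reachable."""
    for qm in config['queue_managers']:
        if qm in available:
            return qm             # first reachable choice, in config order
    raise RuntimeError('no configured queue manager is available')
```

Keeping the ordering in a client-supplied configuration file, as the abstract describes, lets each client application express its own preference without server-side coordination.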
20140149784 | Instance Level Server Application Monitoring, Load Balancing, and Resource Allocation - A system and methodology to monitor system resources for a cluster computer environment and/or an application instance allows users to define failover policies that take appropriate corrective actions when a predefined threshold is met. An engine comprises failover policies and mechanisms to define resource monitoring, consumption, allocation, and one or more thresholds for a computer server environment, to identify capable servers and thereafter automatically transition an application between multiple servers so as to ensure the application is continually operating within the defined metrics. | 05-29-2014 |
20140157042 | LOAD BALANCING AND FAILOVER OF GATEWAY DEVICES - Methods and systems for load balancing and failover among gateway devices are disclosed. One method provides for assigning communication transaction handling to a gateway. The method includes receiving a request for a license from a computing device at a control gateway within a group of gateway devices including a plurality of gateway devices configured to support communication of cryptographically split data. The method also includes assigning communications from the computing device to one of the plurality of gateway devices based on a load balancing algorithm, and routing the communication request to the assigned gateway device. | 06-05-2014 |
20140164817 | DISASTER RECOVERY INTERNET PROTOCOL ADDRESS FAILOVER - An approach is provided for internet protocol (IP) address failover. An application on a primary site is assigned a private IP address. This private IP address is accessible within a local network. This private IP address is mapped to a public IP address, which is accessible to users outside the local network. The application is then replicated to a backup site with the same private IP address used to access it on the primary site. In case of a disaster recovery event on the primary site, the replicated application can be accessed on the backup site by way of the public IP address. | 06-12-2014 |
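The address mapping this abstract describes can be sketched as follows: the application keeps one private IP on both sites, and only the site behind the public-to-private mapping changes on a disaster-recovery event. The class and IP values are illustrative assumptions, not from the filing.

```python
# Sketch of private/public IP failover: external users always reach the
# public IP; the private IP is identical on primary and backup sites,
# so failover only repoints which site the public IP maps to.

class IpFailover:
    def __init__(self, public_ip, private_ip):
        self.public_ip = public_ip
        self.private_ip = private_ip      # same on both sites by design
        self.active_site = "primary"

    def resolve(self):
        # Public IP maps to the private IP on whichever site is active.
        return (self.public_ip, self.active_site, self.private_ip)

    def disaster_recover(self):
        # Replica on the backup site already uses the same private IP.
        self.active_site = "backup"

app = IpFailover("203.0.113.10", "10.0.0.5")
print(app.resolve())   # ('203.0.113.10', 'primary', '10.0.0.5')
app.disaster_recover()
print(app.resolve())   # ('203.0.113.10', 'backup', '10.0.0.5')
```

Keeping the private address constant is what makes the switch invisible to external users: nothing they resolve ever changes.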
20140164818 | Automatic Failover of Nodes of a Middle-Tier Layer - Method and system are provided for automatic failover between multiple nodes in a middle-tier layer between client applications and a back-end service. The method at a first node includes: receiving a request from a client application; determining that the first node cannot service the request successfully; forwarding the request from the first node to a second peer node; receiving a response at the first node on completion of the request by the second peer node; and forwarding the response from the first node to the client application. An original path of communication between the client application and the first node is maintained. The failover of the request from the first node to the second peer node is transparent to the client application. Nodes may be registered in a group, wherein a node in a group has awareness of other nodes in the group. | 06-12-2014 |
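The transparent middle-tier failover described above (a node that cannot service a request forwards it to a peer and relays the peer's response over the original client path) can be sketched as below; the class and node names are illustrative assumptions, not from the filing.

```python
# Sketch of middle-tier peer failover: the client's connection to the
# first node is maintained, and the forwarding to a peer is invisible
# to the client application.

class Node:
    def __init__(self, name, healthy=True, peers=None):
        self.name = name
        self.healthy = healthy
        self.peers = peers or []       # group registration: peer awareness

    def handle(self, request):
        if self.healthy:
            return f"{self.name}:{request}"
        for peer in self.peers:        # forward to a peer node in the group
            if peer.healthy:
                return peer.handle(request)  # relay peer's response back
        raise RuntimeError("no peer node available")

node_b = Node("nodeB")
node_a = Node("nodeA", healthy=False, peers=[node_b])
# Client still talks only to nodeA; nodeB actually serviced the request.
print(node_a.handle("req1"))   # nodeB:req1
```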
20140173329 | CASCADING FAILOVER OF BLADE SERVERS IN A DATA CENTER - Cascading failover of blade servers in a data center implemented by transferring by a system management server a data processing workload from a failing blade server to an initial replacement blade server, with the data processing workload characterized by data processing resource requirements and the initial replacement blade server having data processing resources that do not match the data processing resource requirements; and transferring by the system management server the data processing workload from the initial replacement blade server to a subsequent replacement blade server, where the subsequent replacement blade server has data processing resources that better match the data processing resource requirements than do the data processing resources of the initial replacement blade server. | 06-19-2014 |
20140173330 | Split Brain Detection and Recovery System - The invention provides for split brain detection and recovery in a DAS cluster data storage system through a secondary network interconnection, such as a SAS link, directly between the DAS controllers. In the event of a communication failure detected on the secondary network, the DAS controllers initiate communications over the primary network, such as an Ethernet used for clustering and failover operations, to diagnose the nature of the failure, which may include a crash of a data storage node or loss of a secondary network link. Once the nature of the failure has been determined, the DAS controllers continue to serve all I/O from the surviving nodes to honor high availability. When the failure has been remedied, the DAS controllers restore any local cache memory that has become stale and return to regular I/O operations. | 06-19-2014 |
20140173331 | Adaptive Private Network with Geographically Redundant Network Control Nodes - Systems and techniques are described which improve performance, reliability, and predictability of networks. Geographically diverse network control nodes (NCNs) are provided in an adaptive private network (APN) to provide backup NCN operations in the event of a failure. A primary NCN node in a first geographic location is operated according to a primary state machine at an NCN active state. A client node is operated according to a client state machine. A secondary NCN node in a second geographic location that is geographically remote from the first geographic location is operated according to a secondary state machine at a standby state. The three state machines operate in parallel, and upon detecting a change in APN state information, the secondary state machine transitions from the standby state to a secondary active NCN state and the secondary NCN node provides APN timing calibration and control to the client node. | 06-19-2014 |
20140173332 | Cascading Failover Of Blade Servers In A Data Center - Cascading failover of blade servers in a data center implemented by transferring by a system management server a data processing workload from a failing blade server to an initial replacement blade server, with the data processing workload characterized by data processing resource requirements and the initial replacement blade server having data processing resources that do not match the data processing resource requirements; and transferring by the system management server the data processing workload from the initial replacement blade server to a subsequent replacement blade server, where the subsequent replacement blade server has data processing resources that better match the data processing resource requirements than do the data processing resources of the initial replacement blade server. | 06-19-2014 |
20140173333 | SERVER AND METHOD FOR HANDLING ERRORS OF PROGRAMS - Programs in a host server are the same as programs in a backup server. When the host server is executing handling items and its system time exceeds the earliest predefined handling time of the handling items, the handling item with the earliest predefined handling time is determined to be in error, and a corresponding program in the backup server is executed. When the system time of the backup server exceeds the earliest predefined time of the corresponding program in the backup server and the handling item with the earliest predefined handling time in the backup server is the same as the handling item that is in error in the host server, the backup server is switched to be the new host server and the host server is switched to be the new backup server. | 06-19-2014 |
20140173334 | ROUTING OF COMMUNICATIONS TO A PLATFORM SERVICE - Systems and methods for routing communications to a platform service are provided. A message including payload data is received. The information in the payload data of the message is examined in order to determine the type of message. The message is then relayed to an appropriate platform service based on the type of message. Some embodiments assign numbers to the packets that make up the message. | 06-19-2014 |
20140173335 | HIGH RELIABILITY REDUNDANT VOTING SYSTEM FOR A SIGNAL RECEIVED BY VOTING PROCESSORS IN A COMMUNICATION SYSTEM - Disclosed are methods and systems for providing improved reliability via redundant voting systems. The voting systems are operable to vote on a signal received by a plurality of base stations in a communication system. A first voting processor transmits a message to the plurality of base stations indicating a return base station to voting processor (BS-VP) multicast address. A plurality of multicast messages addressed to the return multicast address are then received from the plurality of base stations and associated with a particular signal received from a signal source by each of the plurality of base stations. The first voting processor determines a recovered signal, and responsive to the first voting processor determining that the second voting processor is no longer operational, transmits the recovered signal to a voting processor to infrastructure (VP-IN) multicast address associated with an infrastructure device and the first and second voting processors. | 06-19-2014 |
20140181572 | Provide an Appliance Like Test Vehicle for IT Disaster Recovery - A high availability/disaster recovery appliance test vehicle that contains a preconfigured high availability/disaster recovery solution for quick implementation at a test environment. The hardware and software components may be preconfigured with test applications and data and all the necessary networking, SAN and operating system requirements. This single shippable rack can be used to certify a multi site high availability/disaster recovery architecture that supports both local and site failover and site to site data replication. The unit is plug and play which reduces the effort required to begin the evaluation and reduces the number of IT teams that need to be involved. An apparatus and method for implementing the above high availability/disaster recovery vehicle are provided. | 06-26-2014 |
20140201564 | HEALING CLOUD SERVICES DURING UPGRADES - Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that virtual machines running on a first cloud node are in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources. | 07-17-2014 |
20140208153 | METHOD AND SYSTEM FOR PROVIDING HIGH AVAILABILITY TO COMPUTER APPLICATIONS - A system and method for assigning application specific IP addresses to individual applications. The system may be operable to assign a unique IP address to an application, and alias the application IP address to a NIC IP address on the host where the application is running. In an exemplary embodiment, the system may be further operable to migrate the application IP address to a new host as part of a migration, and alias the application IP address to a NIC in the new host as part of the migration. | 07-24-2014 |
20140215259 | LOGICAL DOMAIN RECOVERY - Recovering logical domain processes from a virtual production environment to a recovery environment by building recovery automation scripts. The process, partially automated and partially guided by an administrative user, captures logic necessary for orderly recovery, but also permits the user to specify certain configuration information for the recovered logical domains. A first step before building recovery script(s) is to execute a capture script on the production environment to retrieve configuration information for the production logical domains. Based on this captured output, a recovery script builder then starts to build one or more recovery script(s). To account for unavoidable inconsistencies between the production and recovery environments, the user is guided through a deterministic process of providing additional information, such as different resource mappings, so that the script builder may further address such differences. | 07-31-2014 |
20140215260 | MEMCACHED SERVER REPLICATION - According to an example, data for a memcached server is replicated to a memcached replication server. Data operations for the memcached server may be filtered for backing up data to the memcached replication server. | 07-31-2014 |
20140215261 | PROVISIONING AND REDUNDANCY FOR RFID MIDDLEWARE SERVERS - The present invention provides for the provisioning and redundancy of RFID middleware servers. Middleware servers can be automatically provisioned and RFID device/middleware server associations can be automatically updated. Some implementations of the invention provide for automatic detection of middleware server malfunctions. Some such implementations provide for automated provisioning and automated updating of RFID device/middleware server associations, whether a middleware server is automatically brought online or is manually replaced. Changes and reassignments of the RFID device populations may be accommodated. | 07-31-2014 |
20140245060 | SYNCHRONIZED FAILOVER FOR ACTIVE-PASSIVE APPLICATIONS - The present invention extends to methods, systems, and computer program products for synchronized active-passive application failover. A data connection to a single data source can be used as a synchronization point. Interoperating instance side and data source side algorithms coordinate to transition a passive instance to an active instance within a specified period of time when a prior active instance fails. An active-passive controller can operate as an active-active module within an active-active environment to provide active-passive failover to active-passive modules. Application virtual names can be mapped to application instance electronic addresses to assist external modules in establishing application connections to active-passive applications. | 08-28-2014 |
20140250318 | SYSTEM FOR AND METHODS OF PROVIDING COMPUTER TELEPHONY INTEGRATION SERVICE ORIENTED ARCHITECTURE - A system and method for providing a Computer Telephony Integration Service Oriented Architecture is presented. The system and method may include providing one or more computer telephony integration servers in one or more clusters to deliver telephony events between agents and peripheral devices. The events may be solicited or unsolicited. The clusters may be stateless and scalable. The clusters may include dynamic message routing. The systems and methods may implement one or more recovery algorithms if one or more of the telephony integration servers experiences a failure event. | 09-04-2014 |
20140250319 | SYSTEM AND METHOD FOR PROVIDING A COMPUTER STANDBY NODE - An apparatus for providing a computing environment in a computing system includes a first node, a second node, an operations server, and a communication link. The first node is capable of supporting a production computing environment and has a first disk storage. The second node is capable of supporting a second operational computing environment, independent of the production computing environment and has a second disk storage. | 09-04-2014 |
20140250320 | CLUSTER SYSTEM - A cluster system according to the present invention includes an active server and a standby server which have a failover function, and a shared disk. The active server includes a control device configured to operate free of influence from an OS, and a disk input/output device configured to input and output data into and from the shared disk. The control device of the active server includes a communication module configured to communicate with the standby server, and an initialization module configured to, when a failure occurs in the active server, initialize the disk input/output device and notify the standby server via the communication module. | 09-04-2014 |
20140250321 | DISTRIBUTED BLADE SERVER SYSTEM, MANAGEMENT SERVER AND SWITCHING METHOD - A distributed blade server system, a management server and a switching method are provided. The method includes: determining a standby blade of a first blade when it is determined that the first blade is in abnormal operation; delivering, based on an access relationship between a startup card of the first blade and a first storage partition, a first configuration command to a storage system, the first configuration command including information of an access relationship between a startup card of the standby blade and the first storage partition, so that the storage system configures the access relationship between the startup card of the standby blade and the first storage partition; and delivering a startup command to the standby blade. | 09-04-2014 |
20140258771 | HIGH-AVAILABILITY CLUSTER ARCHITECTURE AND PROTOCOL - Methods and systems are provided for an improved cluster-based network architecture. According to one embodiment, an active connection is established between a first interface of a network device and an enabled interface of a first cluster unit of a high availability (HA) cluster. The HA cluster is configured to provide connectivity between network devices of an internal and external network. A backup connection is established between a second interface of the network device and a disabled interface of a second cluster unit. While the first cluster unit is operational and has connectivity, it receives and processes all traffic originated by the network device that is destined for the external network. Upon determining the first cluster unit has failed or has lost connectivity, then all subsequent traffic originated by the network device that is destined for the external network is directed to the second cluster unit. | 09-11-2014 |
20140258772 | UTILIZING BACKWARD DEFECT INDICATIONS IN Y-CABLE PROTECTION SWITCHING - In accordance with teachings of the present disclosure, a method includes—at a network element communicatively coupled to a y-cable through two transmitters of the network element—transmitting data from a transmitter through the y-cable to a client and withholding transmission from the other transmitter, and determining whether receivers have received a backward defect indicator from the client. The method further includes determining that an interruption has occurred within a transmission media between the network element and the client, based on the determinations of whether the receivers have received the backward defect indicator from the client. The method also includes, based on the determination of the interruption, transmitting data from the other transmitter through the y-cable to the client and withholding transmission from the transmitter. | 09-11-2014 |
20140258773 | Match Server for a Financial Exchange Having Fault Tolerant Operation - Fault tolerant operation is disclosed for a primary match server of a financial exchange using an active copy-cat instance, a.k.a. backup match server, that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled, i.e. they need not be aware that they are operating in a fault tolerant environment. | 09-11-2014 |
20140258774 | METHODS AND SYSTEMS FOR AUTOMATICALLY TRACKING THE REROUTING OF LOGICAL CIRCUIT DATA IN A DATA NETWORK - An example method involves generating, without manual intervention, a table to store current reroute statistics based on rerouting of data from a logical circuit that has failed to a logical failover circuit in a network. The current reroute statistics include trap data corresponding to the logical circuit. The trap data includes a committed burst size. The logical circuit is identified by a first logical circuit identifier. The logical failover circuit is identified by a second logical circuit identifier. The first and second logical circuit identifiers are renamed until the logical circuit has been restored from failure. The table is updated, without manual intervention, to store updated reroute statistics. The updated reroute statistics include updated trap data corresponding to the logical circuit. The updated reroute statistics are based on a change in status of the logical circuit corresponding to a dropped frame when the committed burst size has been exceeded. | 09-11-2014 |
20140281669 | OpenFlow Controller Master-slave Initialization Protocol - A method for network controller initialization that includes identifying a controller connected to a network as a primary controller that manages switches in the network. One or more other controllers connected to the network are identified as secondary controllers. A failover priority table is created. The failover priority table indicates an order that the one or more other controllers will replace the controller as the primary controller in the event that the controller enters a failure mode. The failover priority table is broadcast to the switches in the network. | 09-18-2014 |
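The priority-table construction and promotion order described in this abstract can be sketched as follows; the controller names and data shapes are illustrative assumptions, not from the filing or the OpenFlow specification.

```python
# Sketch of a failover priority table: priority 0 is the current
# primary controller, and 1..n give the order in which secondary
# controllers would take over on primary failure.

def build_failover_table(primary, secondaries):
    # Map each controller to its takeover priority.
    return {ctrl: prio for prio, ctrl in enumerate([primary] + list(secondaries))}

def next_primary(table, failed):
    # On failure of the current primary, promote the controller with
    # the lowest remaining priority value.
    survivors = {c: p for c, p in table.items() if c != failed}
    return min(survivors, key=survivors.get)

table = build_failover_table("ctrl0", ["ctrl1", "ctrl2"])
print(table)                       # {'ctrl0': 0, 'ctrl1': 1, 'ctrl2': 2}
print(next_primary(table, "ctrl0"))  # ctrl1
```

Broadcasting this table to the switches ahead of time is what lets them fail over without re-running an election at the moment of failure.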
20140281670 | PROVIDING A BACKUP NETWORK TOPOLOGY WITHOUT SERVICE DISRUPTION - In one embodiment, a primary root node may detect one or more neighboring root nodes based on information received from a first-hop node and may select a backup root node among the neighboring root nodes. Once selected, the backup root node may send the primary root node a networking identification and a corresponding group mesh key which the primary root node may forward to the first-hop nodes to cause the first-hop nodes to migrate to the backup root node when connectivity to the primary root node fails. In addition, the first-hop root nodes may migrate back to the primary root node when connectivity to the primary root node is restored. | 09-18-2014 |
20140281671 | ENHANCED FAILOVER MECHANISM IN A NETWORK VIRTUALIZED ENVIRONMENT - An embodiment of the invention is associated with a virtualized environment that includes a hypervisor, client LPARs, and virtual servers that each has a SEA, wherein one SEA is selected to be the primary SEA for connecting an LPAR to specified physical resources. A first SEA of a virtual server sends a call to the hypervisor, and in response the hypervisor enters physical adapter capability information, contained in the call and pertaining to the first SEA, into a table. Further in response to receiving the call, the hypervisor decides whether or not the first SEA of the virtual server should then be the primary SEA. The hypervisor sends a return call indicating its decision to the first SEA. | 09-18-2014 |
20140281672 | PERFORMING NETWORK ACTIVITIES IN A NETWORK - Techniques and systems for performing a network activity within a network. The technique includes assigning one or a plurality of network device subnets containing network devices for performing network activities. Network devices within the assigned network device subnets can be assigned to act as a primary network device and a backup network device. The primary network device can perform the network activity. The backup network devices can monitor the primary network device and continue performing the network activities if the primary network device fails or is rogue. | 09-18-2014 |
20140281673 | HIGH AVAILABILITY SERVER CONFIGURATION - A switch may be configured with multiple zones to provide access to an external storage to certain processing systems. For example, the switch may be configured with two zones, in which a first zone configuration provides access to the external storage for a first processing system and a second zone configuration provides access to the external storage for a second processing system. Thus, the switch may provide high availability of the external storage and allow seamless transition from one computer system to another computer system. | 09-18-2014 |
20140281674 | FAULT TOLERANT SERVER - The virtual computer of the active system includes a memory made up of small regions grouped in a first group and small regions grouped in a second group. When a checkpoint is detected by the checkpoint detection unit, the transfer control unit suspends the virtual computer, copies, to a transfer buffer (not shown), data of the small regions in the first group among the small regions of the memory having been updated after a previous checkpoint, and after inhibiting writing to the small regions in the second group, restarts the virtual computer. Further, the transfer control unit copies data of the small regions, in which writing is inhibited, to the transfer buffer and releases write inhibit, and transfers the data of the small regions, having been copied to the transfer buffer, to the physical computer. | 09-18-2014 |
20140281675 | FLEXIBLE FAILOVER POLICIES IN HIGH AVAILABILITY COMPUTING SYSTEMS - A system for implementing a failover policy includes a cluster infrastructure for managing a plurality of nodes, a high availability infrastructure for providing group and cluster membership services, and a high availability script execution component operative to receive a failover script and at least one failover attribute and operative to produce a failover domain. In addition, a method for determining a target node for a failover comprises executing a failover script that produces a failover domain, the failover domain having an ordered list of nodes, receiving a failover attribute and based on the failover attribute and failover domain, selecting a node upon which to locate a resource. | 09-18-2014 |
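The two-step target selection in this abstract (a failover script produces an ordered failover domain, then a failover attribute picks a node from it) can be sketched as below. The attribute semantics shown ("ordered" takes the first node, "round_robin" rotates) are illustrative assumptions, not from the filing.

```python
# Sketch of flexible failover policies: a script yields an ordered
# list of candidate nodes (the failover domain), and an attribute
# controls how the target node is chosen from that domain.

def failover_script(cluster_nodes, resource):
    # Example script: prefer every node except the resource's current one.
    return [n for n in cluster_nodes if n != resource["node"]]

def select_target(domain, attribute, last_index=0):
    if not domain:
        raise RuntimeError("empty failover domain")
    if attribute == "ordered":
        return domain[0]                     # highest-priority node
    if attribute == "round_robin":
        return domain[(last_index + 1) % len(domain)]
    raise ValueError(f"unknown failover attribute: {attribute}")

domain = failover_script(["n1", "n2", "n3"], {"node": "n1"})
print(domain)                            # ['n2', 'n3']
print(select_target(domain, "ordered"))  # n2
```

Separating the script (which nodes are eligible, in what order) from the attribute (how to pick among them) is what makes the policy flexible: either half can change without the other.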
20140289552 | FAULT-SPOT LOCATING METHOD, SWITCHING APPARATUS, FAULT-SPOT LOCATING APPARATUS, AND INFORMATION PROCESSING APPARATUS - A fault-spot locating method comprising: at a switching apparatus that includes an interface for connection to a master, which is a controlling object, and a plurality of ports for connection to a slave, which is a controlled object, providing another interface used to output information indicating an operation of the switching apparatus and received data, and causing the switching apparatus, using a processor, to transmit the information via that interface; and causing an apparatus capable of obtaining the transmitted information to locate a spot where a fault has possibly occurred, by using the information to check the received data and the operation of the switching apparatus caused by a command from the master. | 09-25-2014 |
20140298080 | INTRA-REALM AAA FALLBACK MECHANISM - There is provided an intra-realm AAA (authentication, authorization and accounting) fallback mechanism, wherein the single global realm may be divided in one or more sub-realms. The thus presented mechanism exemplarily comprises detecting a failure of an authentication server serving at least one authentication client within a first sub-realm of a single-realm authentication system, and routing authentication messages of the at least one authentication client to a fallback authentication server within a second sub-realm of the single-realm authentication system, wherein routing may exemplarily comprise sub-realm based source routing. | 10-02-2014 |
20140298081 | DISTRIBUTED SWITCHING SYSTEM FOR PROGRAMMABLE MULTIMEDIA CONTROLLER - In one embodiment, two or more programmable multimedia controllers are provided in a multimedia system that includes a plurality of audio/video (A/V) devices that source or output digital media streams. Each of the programmable multimedia controllers has at least a processing subsystem and a switch capable of switching the digital media streams. Arbitration is conducted among the programmable multimedia controllers to select one of the programmable multimedia controllers as winning the arbitration. Master status is assigned to the one of the programmable multimedia controllers that won the arbitration. Subordinate status is assigned to at least one other programmable multimedia controller that did not win the arbitration. It is periodically verified whether the programmable multimedia controller assigned master status is operating. In response to the programmable multimedia controller assigned master status having experienced a failure, master status is reassigned to a programmable multimedia controller that was originally assigned subordinate status. | 10-02-2014 |
20140298082 | TESTING SERVER, INFORMATION PROCESSING SYSTEM, AND TESTING METHOD - A testing server performs a test to check whether servers properly execute failover. The testing server includes a generation unit, a transmitting unit, a testing unit, a restoring unit, a judgment unit, and a power control unit. The generation unit generates an image file of an OS. The transmitting unit transmits the image file to the to-be-tested servers. The testing unit injects a simulated fault into a server among the servers to which the image file is transmitted and performs a test. The restoring unit, each time the testing unit performs a test, restores a status of the to-be-tested server to a pre-failover status. The judgment unit judges whether the restoring unit properly restores the status. The power control unit, when the judgment unit judges that the status of the to-be-tested server is not properly restored, turns off power of the to-be-tested server and turns on the power again. | 10-02-2014 |
20140298083 | METHOD FOR SIP PROXY FAILOVER - For SIP proxy failover in a SIP telecommunication network (SIPN) comprising a plurality of proxies (P | 10-02-2014 |
20140317437 | AUTOMATIC CLUSTER-BASED FAILOVER HANDLING - Example embodiments relate to automatic cluster-based failover handling. In example embodiments, a first node included in a high availability cluster may receive a failure signal from a second node included in the high availability cluster, where the failure signal indicates a failure in the second node. The failure signal may also indicate a second port in the second node that is or will be inactive due to the failure, the second port being associated with a second port address. The first node may activate a first port of the first node, where the first port was previously inactive before being activated. The first node may assign a first port address to the first port, wherein the first port address is the same as the second port address. | 10-23-2014 |
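The port-takeover handling in this abstract (on a failure signal naming an inactive port and its address, the surviving node activates a previously inactive local port and gives it the same address) can be sketched as follows; the dictionary fields and the example address are illustrative assumptions, not from the filing.

```python
# Sketch of cluster-based failover by port-address takeover: the
# first node activates a spare port and assigns it the address of the
# second node's failed port, so traffic addressed to that port address
# continues to be served.

def handle_failure_signal(node, signal):
    port = node["spare_port"]
    if port["active"]:
        raise RuntimeError("spare port is already in use")
    port["active"] = True                     # activate the inactive port
    port["address"] = signal["port_address"]  # adopt failed port's address
    return port

node1 = {"spare_port": {"active": False, "address": None}}
signal = {"failed_node": "node2", "port_address": "50:01:43:80:12:34:56:78"}
taken_over = handle_failure_signal(node1, signal)
print(taken_over)   # {'active': True, 'address': '50:01:43:80:12:34:56:78'}
```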
20140317438 | System, software, and method for storing and processing information - A system for storing and processing information comprises a plurality of nodes, each node comprising: a local information storage medium; a data connection configured to connect to at least one linked client; and a processor configured to process information in the local information storage medium and send processed information to the at least one linked client, and a secondary shared storage medium connected to the plurality of nodes via a shared data connection and configured to store information copied from the local information storage medium of each of the plurality of nodes, wherein each of the nodes in the plurality of nodes is configured, in the event of failure of a failed one of the plurality of nodes, to connect to the at least one linked client corresponding to the failed one of the plurality of nodes. | 10-23-2014 |
20140317439 | PROCESS FOR SELECTING AN AUTHORITATIVE NAME SERVER - Methods and systems for intelligently choosing an authoritative name server from among a group of name servers for resolving Domain Name System requests. Systems and methods are provided that enable choosing of a first server associated with and/or operated by a first service provider based on a first measurement associated with that first server. The systems and methods further comprise requesting first data from that first server, determining that the first server is unresponsive, and choosing a second server. The second server is chosen based on a second measurement, and chosen contingent on it being associated with and/or operated by a different service provider than that associated with the first server. The systems and methods then comprise requesting second data from the second server. | 10-23-2014 |
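The provider-aware fallback this abstract describes (pick a first server by a measurement, and on unresponsiveness pick a second server only from a different service provider) can be sketched as below; the server records and latency measurements are illustrative assumptions, not from the filing.

```python
# Sketch of authoritative name server selection: choose the best
# server by a measurement (here, latency), and on fallback exclude the
# failed server's entire service provider, not just the server itself.

def choose_server(servers, exclude_provider=None):
    pool = [s for s in servers if s["provider"] != exclude_provider]
    if not pool:
        raise RuntimeError("no eligible name server")
    return min(pool, key=lambda s: s["latency_ms"])   # best measurement

servers = [
    {"name": "ns1", "provider": "A", "latency_ms": 10},
    {"name": "ns2", "provider": "A", "latency_ms": 12},
    {"name": "ns3", "provider": "B", "latency_ms": 20},
]
first = choose_server(servers)
print(first["name"])   # ns1
# ns1 is unresponsive: fall back, excluding provider A entirely,
# since a provider-wide outage would also affect ns2.
second = choose_server(servers, exclude_provider=first["provider"])
print(second["name"])  # ns3
```

Excluding the whole provider on fallback is the point of the contingency: it hedges against failures that are correlated across one operator's servers.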
20140317440 | Method and Apparatus for Indirectly Assessing a Status of an Active Entity - A method and system permit a backup entity of a redundant apparatus of a communication system that shares control of hardware resources or other network resources with an active entity to indirectly determine a status of the active entity based upon behavior and reaction to actions it takes in connection with resources it shares control of with the active entity. Such a method and system permit the backup entity to deduce the state of the active entity without having any hardware connection or other communication connection with the active entity. | 10-23-2014 |
20140325256 | SYSTEMS AND METHODS FOR MANAGING DISASTER RECOVERY IN A STORAGE SYSTEM - Systems and methods are provided herein for efficient switchover for a client in a storage network between one or more primary storage resources and one or more disaster recovery (DR) resources. Embodiments may implement synchronization between such resources on a data plane and a control plane to allow for a transition between resources to be implemented in a manner that is minimally disruptive to a client. Moreover, embodiments may provide for processing resources which allow for switching a client from a primary storage resource to a secondary storage resource with minimal administrative interaction. | 10-30-2014 |
20140325257 | WSAN SIMULTANEOUS FAILURES RECOVERY METHOD - The WSAN simultaneous failures recovery method ranks each node based on the number of hops to a pre-designated root node in the network. The method identifies some nodes as cluster heads based on the number of their children in the recovery tree. The method assigns a recovery weight and a nearby cluster node to each node. Nearby cluster nodes serve as gateways to other nodes that belong to that cluster. The recovery weight is used to decide which node is better to move in order to achieve lower recovery cost. The recovery method uses the same on-going set of actors to restore connectivity. Simulation results have demonstrated that the recovery method can achieve low recovery cost per failed node in small and large networks. The results have also shown that clustering leads to lower recovery cost if the sub-network needs to re-establish links with the rest of the network. | 10-30-2014 |
20140325258 | COMMUNICATION FAILOVER IN A DISTRIBUTED NETWORK - An initial request is received to establish a communication session. The initial request contains a communication address of a first communication device. A communication server detects that the communication session cannot be established across a primary network. In response, the initial request is repurposed by changing the first communication address to a second communication address. The changed request is sent to a communication system, which adds a field to the changed request that indicates that the changed request is to be sent via a secondary network. The changed request is sent with the field to the communication server. The changed request with the second communication address is sent to a gateway to establish the communication session across a secondary network. A portion of the communication session is established using the second communication address. The first communication address is sent in the portion of the communication session using Dual-Tone-Multi-Frequency (DTMF). | 10-30-2014 |
20140331078 | Elastic Space-Based Architecture application system for a cloud computing environment - A software architecture and infrastructure to seamlessly scale mission-critical, high performance, stateful enterprise applications on any cloud environment (public as well as private). The described invention will allow converting an application to a scalable application and will provide a method and a system to efficiently scale up the performance of such an application based on space-based architecture. | 11-06-2014 |
20140331079 | Disable Restart Setting for AMF Configuration Components - A method and a system are provided for determining an AMF configuration of a highly available system with respect to whether to failover or restart a component when the component fails. The AMF configuration specifies at least two service-units containing components that represent resources, and a set of service-instances representing workload incurred by provision of services using the resources. The method identifies a failover duration and a restart duration for each component in a service-unit; and determines a failover outage and a restart outage for each service-instance impacted by a failure of a given component, based on the failover duration and the restart duration of each component in the service-unit. The method further determines whether to failover or to restart the given component if the given component fails, based on the failover outage and the restart outage of each service-instance impacted by the failure of the given component. | 11-06-2014 |
20140331080 | WARM STANDBY APPLIANCE - A warm standby appliance is described herein. The warm standby appliance is coupled to a storage server which is coupled to one or more servers. When a server fails, the storage server transfers a backed up image to the warm standby appliance, so that the warm standby appliance is able to replicate the failed server. While the failed server is inaccessible, the warm standby appliance is able to mimic the functionality of the failed server. When a new server or repaired server is available, the warm standby appliance is no longer needed. To incorporate the new server into the system quickly and easily, the server image of the warm standby appliance is sent to the new server. After transferring the image, the warm standby appliance is cleaned and returns to a dormant state, waiting to be utilized again. | 11-06-2014 |
20140331081 | SCALABLE TRANSCODING FOR STREAMING AUDIO - Systems and techniques for capturing audio and delivering the audio in digital streaming media formats are disclosed. Several aspects of the systems and techniques operate in a cloud computing environment where computational power is allocated, utilized, and paid for entirely on demand. The systems and techniques enable a call to be made directly from a virtual machine out to a Public Switched Telephone Network (PSTN) via a common Session Initiation Protocol (SIP) to PSTN Breakout service, and the audio to be delivered onward to one or more Content Delivery Networks (CDNs). An audio call capture interface is also provided to initiate and manage the digital streaming media formats. | 11-06-2014 |
20140337662 | USE OF AUXILIARY DATA PROTECTION SOFTWARE IN FAILOVER OPERATIONS - According to certain aspects, an information management cell can include at least one secondary storage computing device configured to conduct primary data generated by at least one client computing device to a secondary storage device(s) as part of secondary copy operations, wherein the secondary storage computing device normally operates to conduct primary data to the secondary storage device(s) for storage as a secondary copy in a first secondary copy file format, at the direction of a main storage manager; and can include a failover storage manager configured to activate in response to loss of connectivity between the cell and the main storage manager, and instruct a secondary copy application to perform a secondary copy operation in which the primary data generated by the at least one client computing device is stored as a secondary copy in a second secondary copy file format different than the first secondary copy file format. | 11-13-2014 |
20140351623 | System and Method for Virtualized Shared Use Environment with Dynamic IP Address Injection - Virtualized Shared Use Environment—a database-driven selector of a) specific company software, b) equipment location, and c) available peripherals. The selected elements together create a Virtualized Hypercart (VH) which carries all the variable elements necessary to select the appropriate running virtual machine. Derived from the above information, the system control software (SCS) selects from a table of available virtual machines that match the requirements of company software, equipment location, peripherals available and, from the company software selection, the IP addressing schemes required and available for use by the virtual machine. Dynamic IP Address Injection—based on the VH, the IP addressing scheme is injected into the virtual machine that contains the appropriate company software for the type of location that made the request with the peripherals available for use by the virtual machine. | 11-27-2014 |
20140359340 | SUBSCRIPTIONS THAT INDICATE THE PRESENCE OF APPLICATION SERVERS - Systems and methods for indicating the presence of application servers. The system comprises a serving network element of an IP Multimedia Subsystem (IMS) network configured to communicate with a primary application server to provide a service for a session, to determine that the primary application server is unavailable, to failover to a secondary application server to provide the service, and to initiate a subscribe request to a presence server requesting to be notified when the primary application server becomes available. The serving network element is further configured to receive a notification from the presence server that the primary application server has become available, and to switch back to the primary application server to provide the service after receiving the notification. | 12-04-2014 |
20140359341 | LOCALITY BASED QUORUMS - Disclosed are various embodiments for distributing data items within a plurality of nodes. A data item that is subject to a data item update request is updated from a master node to a plurality of slave nodes. The update of the data item is determined to be locality-based durable based at least in part on acknowledgements received from the slave nodes. Upon detection that the master node has failed, a new master candidate is determined via an election among the plurality of slave nodes. | 12-04-2014 |
20140359342 | SYSTEM AND METHOD FOR REDUNDANT OBJECT STORAGE - Systems and methods for redundant object storage are disclosed. A method may include storing at least two copies of each of a plurality of objects among a plurality of nodes communicatively coupled to one another in order to provide redundancy of each of the plurality of objects in the event of a fault of one of the plurality of nodes. The method may also include monitoring access to each object to determine a frequency of access for each object. The method may additionally include redistributing one or more of the copies of the objects such that at least one particular node of the plurality of nodes includes copies of only objects accessed at a frequency below a predetermined frequency threshold based on the determined frequency of access for each object. The method may further include placing the at least one particular node in a reduced-power mode. | 12-04-2014 |
20140365811 | CENTRALIZED VERSION CONTROL SYSTEM HAVING HIGH AVAILABILITY - A Version Control System (VCS) and methods having high availability, and combining the advantages of a centralized VCS while overcoming the limitations of centralized VCSs in a cluster environment. The system and method cope with failures of components in a cluster environment gracefully to guarantee uptime. The VCS and methods support high availability in a centralized VCS utilizing a plurality of repositories having a suitable architecture. In particular embodiments the architecture utilizes one or more of: Active-Passive repository replication; Active-Passive repository replication with automatic recovery; Active-Active repository replication; and hybrid model (Active-Active and Passive repository replication). | 12-11-2014 |
20140365812 | FAILOVER MECHANISM - Some embodiments of the invention provide a failover capability in a computer system that employs multiple paths to transfer information to and from a network, such as a computer system that performs virtualization, without introducing a new driver component to provide this capability. For example, some embodiments of the invention provide a networking virtual switch client capable of direct communication between a networking stack implemented by a virtual machine operating system and components comprising either a direct path or a synthetic path to a network interface controller coupled to a network. The networking virtual switch client may be capable of determining which of the paths to employ for a given communication, such as by determining that a synthetic path should be employed if a direct path is not available. | 12-11-2014 |
20140372790 | SYSTEM AND METHOD FOR ASSIGNING MEMORY AVAILABLE FOR HIGH AVAILABILITY FAILOVER TO VIRTUAL MACHINES - Techniques for assigning memory available for high availability (HA) failover to virtual machines in a high availability (HA) cluster are described. In one embodiment, the memory available for HA failover is determined in at least one failover host computing system of the HA cluster. Further, the memory available for HA failover is assigned to one or more virtual machines in the HA cluster as input/output (I/O) cache memory. | 12-18-2014 |
20140380087 | Fault Tolerance Solution for Stateful Applications - A fault tolerance method and system for VMs running stateful applications on a cluster identifies a client state for each client session of those applications. The method replicates the client session onto a primary and a backup VM, and uses a network controller and orchestrator to direct network traffic to the primary VM and to periodically replicate the state onto the backup VM. In case of a VM failure, the method reroutes network traffic of states for which the failed VM serves as a primary to the corresponding backup, and replicates states without a backup after the failure onto another VM to create new backups. The method may be used as part of a method or system implementing the split/merge paradigm. | 12-25-2014 |
20150019900 | TOLERATING FAILURES USING CONCURRENCY IN A CLUSTER - A system and computer program product for tolerating failures using concurrency in a cluster are provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted. | 01-15-2015 |
20150019901 | TOLERATING FAILURES USING CONCURRENCY IN A CLUSTER - A method is provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted. | 01-15-2015 |
20150019902 | OpenFlow Controller Master-slave Initialization Protocol - A method for network controller initialization that includes identifying a controller connected to a network as a primary controller that manages switches in the network. One or more other controllers connected to the network are identified as secondary controllers. A failover priority table is created. The failover priority table indicates an order in which the one or more other controllers will replace the controller as the primary controller in the event that the controller enters a failure mode. The failover priority table is broadcast to the switches in the network. | 01-15-2015 |
20150026508 | MOVING OBJECTS IN A PRIMARY COMPUTER BASED ON MEMORY ERRORS IN A SECONDARY COMPUTER - In an embodiment, a partition is executed at a primary server, wherein the partition accesses a first memory location at a first memory block address at the primary server. If a first corresponding memory location at a secondary server has an error, wherein the first corresponding memory location at the secondary server corresponds to the first memory location at the primary server, then an object is moved from the first memory location at the primary server to a second memory location at the primary server. | 01-22-2015 |
20150039930 | MULTI-TENANT DISASTER RECOVERY MANAGEMENT SYSTEM AND METHOD FOR INTELLIGENTLY AND OPTIMALLY ALLOCATING COMPUTING RESOURCES BETWEEN MULTIPLE SUBSCRIBERS - A Multi-Tenant Disaster Recovery Management System and method for intelligently and optimally allocating computing resources between multiple subscribers, the system comprising: one or more Multi-Tenant Disaster Recovery Management Server logically connected to one or more Production Site and one or more cloud based Disaster Recovery Site; a Network connecting said Multi-Tenant Disaster Recovery Management Server with said Production Site and said cloud based Disaster Recovery Site, wherein said Multi-Tenant Disaster Recovery Management Server is provided with at least one Disaster Recovery (DR) Manager Module, at least one Drill Scheduler Module, at least one Drill Executor Module, at least one WS Interface Module, at least one Usage Monitor Module and at least one Report Manager Module. | 02-05-2015 |
20150039931 | REPLAYING JOBS AT A SECONDARY LOCATION OF A SERVICE - Jobs submitted to a primary location of a service within a period of time before and/or after a fail-over event are determined and are resubmitted to a secondary location of the service. For example, jobs that are submitted fifteen minutes before the fail-over event and jobs that are submitted to the primary network before the fail-over to the second location is completed are resubmitted at the secondary location. After the fail-over event occurs, the jobs are updated with the secondary network that is taking the place of the primary location of the service. A mapping of job input parameters (e.g. identifiers and/or secrets) from the primary location to the secondary location are used by the jobs when they are resubmitted to the secondary location. Each job determines what changes are to be made to the job request based on the job being resubmitted. | 02-05-2015 |
20150046744 | SYSTEM AND METHOD FOR PROCESSING WEB SERVICE TRANSACTIONS USING TIMESTAMP DATA - A system is provided that is adapted to service web-based service requests. In one implementation, a caching service is provided for storing and servicing web service requests. In one implementation, virtual computer systems may be used to service requests in a more reliable manner. Different operating modes may be configured for backup redundancy and the caching service may be scaled to meet service requests for a particular application. Also, methods are provided for exchanging timestamp information among web service transaction systems to reduce the amount of processing capability and bandwidth for ensuring database consistency. | 02-12-2015 |
20150046745 | METHOD AND SYSTEM FOR TRANSPARENTLY REPLACING NODES OF A CLUSTERED STORAGE SYSTEM - Method and system for replacing a first node and a second node of a clustered storage system by a third node and a fourth node are provided. The method includes migrating all storage objects managed by the first node to the second node; replacing the first node by the third node and migrating all the storage objects managed by the first node and the second node to the third node; and replacing the second node by the fourth node and then migrating the storage objects previously managed by the second node but currently managed by the third node to the fourth node. The nodes may also be replaced by operationally connecting the third node and the fourth node to storage managed by the first node and the second node; joining the third node and the fourth node to a same cluster as the first node and the second node. | 02-12-2015 |
20150052382 | FAILOVER METHODS AND SYSTEMS FOR A VIRTUAL MACHINE ENVIRONMENT - A storage provider executing a plurality of Web servers is provided for receiving a request from a management console managing a plurality of virtual machines. The management console uses a same address to send the request, regardless of which Web server is selected to process the request. The selected Web server re-sends the request to a second storage provider node instance, when a first storage provider node instance fails to process the request, where the first and the second storage provider node instances are executed by the storage provider as virtual machines for providing failover in processing requests. | 02-19-2015 |
20150052383 | MANAGING DATABASE NODES - A method for managing database nodes includes determining that a data segment is on a failed node. The data segment is referenced by an operation of a query plan. The method includes selecting a victim node based on a segmentation ring, a buddy node for the data segment, a plurality of remaining operational nodes, and a predetermined selection parameter. The method includes generating a query plan such that the victim node performs double duty for operations accessing the data segment from a buddy projection on the victim node, and operations accessing a data segment for a primary projection of the victim node. | 02-19-2015 |
20150052384 | INFORMATION PROCESSING SYSTEM, CONTROL METHOD OF INFORMATION PROCESSING SYSTEM, AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM - The information processing system includes a first management device, a second management device coupled to the first management device, and a first information processing device coupled to the second management device, wherein the second management device receives, from the first information processing device, a notification indicating that an operation state of the first information processing device is changed from a first state to a second state, and the second management device transmits the notification to the first management device after a first time period has elapsed since receiving the notification, the first time period being defined based on the second state. | 02-19-2015 |
20150058659 | AUTOMATIC FAILOVER IN MODULAR CHASSIS SYSTEMS - Systems and methods for automatic failover in modular chassis systems. In some embodiments, a modular chassis includes a chassis management controller and a plurality of server blades. A first of the plurality of server blades may be configured to detect an internal fault and to transmit a corresponding alert message to the chassis management controller via a midplane connection. Moreover, the chassis management controller may be configured to initiate a migration procedure to transfer one or more workloads from the first server blade to a second of the plurality of server blades. | 02-26-2015 |
20150067386 | INTEGRATION NETWORK DEVICE AND SERVICE INTEGRATION METHOD THEREOF - An integration network device and a service integration method thereof are provided. The integration network device receives a connecting request from the VDI user device. The integration network device establishes a connection between the VDI user device and the first management network device according to the connecting request. The integration network device determines that the connection to the first management network device has failed according to first management information of the first management network device. The integration network device routes the VDI user device to the second management network device according to second management information of the second management network device. | 03-05-2015 |
20150074447 | CLUSTER SYSTEM AND METHOD FOR PROVIDING SERVICE AVAILABILITY IN CLUSTER SYSTEM - Provided is a cluster system including a first physical server having registered therein an active virtual machine; and a plurality of physical servers, wherein the plurality of physical servers comprises a second physical server having registered therein a standby virtual machine corresponding to the active virtual machine, the active virtual machine failing over to the standby virtual machine when a failure occurs in the first physical server, wherein each of the plurality of physical servers stores post-failure registration information when the failure occurs, wherein the post-failure registration information associates the active virtual machine with a physical server among the plurality of physical servers, and wherein the physical server is different from the second physical server. | 03-12-2015 |
20150074448 | CLUSTER SYSTEM, INFORMATION PROCESSING DEVICE, METHOD OF CONTROLLING CLUSTER SYSTEM, AND RECORDING MEDIUM - The present invention provides a cluster system that promptly stops access to a shared disk upon occurrence of abnormality. The cluster system is a cluster system where an active system server and a standby system server operate utilizing a shared disk. Each server includes: a disk input/output unit that accesses the shared disk by using data that is input and output via a predetermined bus; a fault detecting unit that, when a fault occurs in the active system server, detects the fault; and a bus closing unit that, when the fault detecting unit detects the fault, closes the bus by issuing an uncorrectable fault generation request to cause generation of an uncorrectable fault on the bus. | 03-12-2015 |
20150074449 | Fault Detection And Correction For Single And Multiple Media Players Connected To Electronic Displays, And Related Devices, Methods And Systems - Systems, devices, software, hardware and networks adapted and arranged for monitoring and correcting faults in networked media player systems that include electronic displays are provided. After detection or notification of a fault in at least one networked media player in a network of at least two, or N, media players operationally connected to electronic displays, the invention provides an alternate source of signal to the affected display. In some preferred embodiments, the invention utilizes at least one additional, or N+1, media player as a backup to substitute for the failed media player. Reconfiguration of the faulted media player by means of the N+1 backup networked media player advantageously increases the reliability and efficiency of ongoing maintenance of digital visual systems operating in commercial and other environments. | 03-12-2015 |
20150082079 | Maximizing Use of Storage in a Data Replication Environment - Mechanisms for controlling access to storage volumes on the secondary storage system are provided. A determination is made as to whether a first site computing device has sent a notification of a failure condition of a first site. In response to a determination that the notification of the failure condition of the first site has not been received, secondary workloads of a second site computing device are permitted to access storage volumes on the secondary storage system. In response to a determination that the notification of the failure condition of the first site has been received, a mode of operation of the second site is modified from a normal mode of operation to a failure mode of operation. In the failure mode of operation, the storage system controller of the second site blocks at least a portion of access requests from secondary workloads of the second site computing device. | 03-19-2015 |
20150089274 | SYSTEM AND METHOD FOR SUPPORTING FAULT TOLERANT JOB MANAGEMENT IN A CLOUD PLATFORM ENVIRONMENT - In accordance with an embodiment, described herein is a system and method for supporting fault tolerant job management for use with a cloud computing environment. In accordance with an embodiment, the system comprises a job manager that manages the execution of jobs within the cloud environment including their job states, and a job manager service that provides an application program interface which receives administrative commands to be processed within the cloud environment as jobs. The job manager supports fault tolerant job processing including associating the jobs with checkpoints, recognizing a failover command for the jobs, and associating the jobs with work units of the administrative commands, and storing a state for each job upon processing each work unit of the command. | 03-26-2015 |
20150089275 | PREVENTING EXTREME CORESIDENCY HAZARDS IN CLOUD COMPUTING - Various exemplary embodiments relate to a method of preventing extreme coresidency hazards among application instances in a cloud network. The method includes determining a first failure group of a first instance of an application; establishing a connection with a second instance of a peer application; determining a second failure group of the second instance; comparing the first failure group to the second failure group; and establishing a second connection with a third instance of the peer application if the first failure group and the second failure group share a failure point. | 03-26-2015 |
20150095690 | Redundant Automation System - A redundant automation system having a plurality of automation devices which are connected to one another comprises a plurality of master devices and a slave device. Each of the plurality of automation devices processes a control program in order to control a technical process. At least one of the plurality of automation devices operates as a slave and at least two of the plurality of automation devices each operate as a master. The plurality of master devices are each configured to run a respective master program and to process processing sections of the respective master program, and the slave device is configured to process a corresponding slave control program for each master control program run by the plurality of master devices and, if one of the plurality of master devices fails, to assume the function of the failed master. | 04-02-2015 |
20150095691 | Cloud-Based Virtual Machines and Offices - Cloud-based virtual machines and offices are provided herein. Methods may include establishing a cloud-based virtual office, by providing selections, corresponding to backups of servers of a computing network, to a user interface, establishing a cloud gateway for the virtual office, virtualizing a backup for each server using a virtualization program to create the cloud-based virtual office that includes virtual server machines networked with one another via the cloud gateway, and providing a workload to the cloud-based virtual office. | 04-02-2015 |
20150113312 | SYSTEM AND METHOD FOR DETECTING SERVER REMOVAL FROM A CLUSTER TO ENABLE FAST FAILOVER OF STORAGE - Aspects of the disclosure pertain to a system and method for detecting server removal from a cluster to enable fast failover of storage (e.g., logical volumes). A method of operation of a storage controller of a cluster is disclosed. The method includes receiving a signal. The method further includes, based upon the received signal, determining that communicative connection between a second storage controller of the cluster and the first storage controller of the cluster is unable to be established. The method further includes determining whether communicative connection between the first storage controller and expanders of first and second enclosure services manager modules of the cluster is able to be established. The method further includes, when it is determined that communicative connection between the first storage controller and the expanders of the first and second enclosure services manager modules of the cluster is able to be established, performing a failover process. | 04-23-2015 |
20150113313 | METHOD OF OPERATING A SERVER SYSTEM WITH HIGH AVAILABILITY - An ARP table is stored in a gateway. The ARP table maps a first external IP address of a first application server to a first MAC address of a first external network card of the first application server, and maps a second external IP address of a second application server to a second MAC address of a second external network card of the second application server. The first and second application servers check each other's status to detect a failure. If the first application server fails, the first external IP address is added to the second external network card, and the ARP table is updated to map the first external IP address to the second MAC address. | 04-23-2015 |
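The ARP-table takeover in the entry above can be illustrated with a minimal sketch. The class and method names (`ArpTable`, `take_over`) are hypothetical, not from the patent; the sketch only models the mapping update that re-points a failed server's external IP at the surviving server's MAC address.

```python
# Hypothetical sketch of the gateway-side ARP table described above.
class ArpTable:
    def __init__(self):
        self._entries = {}  # external IP address -> MAC address

    def register(self, ip, mac):
        self._entries[ip] = mac

    def resolve(self, ip):
        return self._entries[ip]

    def take_over(self, failed_ip, surviving_mac):
        # On failover, the failed server's external IP is re-pointed
        # at the surviving server's external network card.
        self._entries[failed_ip] = surviving_mac


arp = ArpTable()
arp.register("203.0.113.1", "aa:aa:aa:aa:aa:01")  # first application server
arp.register("203.0.113.2", "aa:aa:aa:aa:aa:02")  # second application server

# First application server fails: its IP now maps to the second MAC.
arp.take_over("203.0.113.1", "aa:aa:aa:aa:aa:02")
assert arp.resolve("203.0.113.1") == "aa:aa:aa:aa:aa:02"
```

Clients keep using the failed server's IP address unchanged; only the gateway's IP-to-MAC mapping moves.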
20150113314 | METHOD AND SYSTEM OF IMPLEMENTING A DISTRIBUTED DATABASE WITH PERIPHERAL COMPONENT INTERCONNECT EXPRESS SWITCH - In one exemplary aspect, a method is provided in which a Peripheral Component Interconnect Express (PCIe) based switch provides a bridge between a set of database nodes of the distributed database system. A failure in a database node is detected. A consensus algorithm is implemented to determine a replacement database node. A database index of a data storage device formerly managed by the failed database node is migrated to the replacement database node. The PCIe-based switch is remapped to attach the replacement database node, with the database index, to the data storage device. | 04-23-2015 |
20150113315 | SWITCH PROVIDED FAILOVER - A system is configured to: transmit requests to a first device and a second device; receive a first reply from the first device in response to one of the requests; determine an address of the first device based on the first reply; assign a first port to a first network when the first device is a first one of one or more devices that replied to the requests and have a same address as the first device; receive a second reply from the second device in response to another one of the requests; assign a second port to a second network when the address of the second device is the same as the address of the first device; and reassign the second port, from the second network, to the first network when a failure of the first device occurs. | 04-23-2015 |
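The switch-provided failover above can be sketched as a small state machine. This is a simplified illustration under the assumption that "same address" replies identify a primary/standby pair; the class and field names (`FailoverSwitch`, `port_network`) are hypothetical.

```python
# Hypothetical sketch of the switch-side logic: the first device replying
# with a given address gets a port on network 1, a later replier with the
# same address is parked on network 2, and on failure the network-2 port
# is reassigned to network 1.
class FailoverSwitch:
    def __init__(self):
        self.port_network = {}   # port -> network id
        self.addr_port = {}      # device address -> active port

    def reply_received(self, port, addr):
        if addr not in self.addr_port:
            self.addr_port[addr] = port
            self.port_network[port] = 1   # first replier: primary network
        else:
            self.port_network[port] = 2   # duplicate address: standby network

    def device_failed(self, addr):
        standby = [p for p, n in self.port_network.items() if n == 2]
        if standby:
            port = standby[0]
            self.port_network[port] = 1   # reassign standby port to network 1
            self.addr_port[addr] = port


sw = FailoverSwitch()
sw.reply_received(1, "dev-A")
sw.reply_received(2, "dev-A")
assert sw.port_network == {1: 1, 2: 2}
sw.device_failed("dev-A")
assert sw.port_network[2] == 1
```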
20150113316 | METHOD AND SYSTEM AND APPARATUS FOR MASS NOTIFICATION AND INSTRUCTIONS TO COMPUTING DEVICES - Systems, methods, and devices for simultaneously distributing mass notifications to multiple users. A mass notification system receives input data and, based on this input data, creates notifications for mass distribution. The notifications are then transmitted to computing devices used by the users who are to be notified. | 04-23-2015 |
20150121122 | Visualizing Disaster Recovery Plan Execution for the Cloud - Embodiments visualize the execution of a disaster recovery plan. During a transfer of computing nodes from a source site to a target site, a map user interface (UI) is displayed. The map UI includes a first region corresponding to the geographic location of the source site and a second region corresponding to the geographic location of the target site. An animated progress indicator representing termination of computing nodes at the source site is displayed in the first region. Simultaneously, another animated progress indicator representing initiation of computing nodes at the target site, and further representing a reverse animation of the first animated progress indicator, is displayed in the second region. | 04-30-2015 |
20150121123 | FAILOVER FUNCTIONALITY FOR CLIENT-RELATED SECURITY ASSOCIATION - There are provided measures for a failover functionality for a client-related security association. Such measures exemplarily comprise providing a failover functionality at a proxy function and/or facilitating provision of a failover functionality at a servicing call state control function, wherein the respective failover functionality relates to a first proxy function, the servicing call state control function services the first proxy function and a second proxy function, the first proxy function has a security association with a client, and the first proxy function and the second proxy function are reachable at the same network address. | 04-30-2015 |
20150127970 | Selected Virtual Machine Replication and Virtual Machine Restart Techniques - Methods, systems, and articles of manufacture for selected VM replication and VM restart techniques are provided herein. A method includes selecting a sub-set of one or more VMs from a set of multiple VMs in a system to be replicated before an identification of one or more failed VMs in the set of multiple VMs; replicating the sub-set of one or more VMs before the identification of one or more failed VMs in the set of multiple VMs; selecting a sub-set of the identified one or more failed VMs to be restarted upon an identification of the one or more failed VMs in the set of multiple VMs in the system; and restarting the sub-set of the identified one or more failed VMs upon the identification of the one or more failed virtual machines in the set of multiple VMs. | 05-07-2015 |
20150135001 | PERSISTENT MESSAGING MECHANISM - A method comprising using at least one hardware processor for: managing persistent messaging data in a volatile memory; writing the persistent messaging data to a first section of a Fast Persistent Memory (FPM); responsive to the first section of the FPM approaching a full state, offloading the persistent messaging data from the first section of the FPM to a hard disk device (HDD) and erasing the persistent messaging data from the first section of the FPM; recording, in a second section of the FPM, an identifier of said offloading; responsive to receiving a request to erase or modify at least some of the persistent messaging data in the HDD, updating the identifier of the offloading in the second section of the FPM while leaving the persistent messaging data in the HDD intact; and responsive to a server failure, selectively reading at least some of the persistent messaging data from the HDD to the volatile memory, wherein the selective reading is based on the identifier of the offloading in the second section of the FPM. | 05-14-2015 |
20150143158 | Failover In A Data Center That Includes A Multi-Density Server - Failover in a data center that includes a multi-density server, where the multi-density server includes multiple independent servers, includes: detecting, by a management module, a failure of one of the independent servers of the multi-density server; identifying, by the management module, a failover target; determining, by the management module, whether the failover target is a non-failed independent server included in the multi-density server; and responsive to determining that the failover target is a non-failed independent server included in the multi-density server, migrating, by the management module, the failed independent server's workload to another server that is not included in the multi-density server. | 05-21-2015 |
20150143159 | FAILOVER IN A DATA CENTER THAT INCLUDES A MULTI-DENSITY SERVER - Failover in a data center that includes a multi-density server, where the multi-density server includes multiple independent servers, includes: detecting, by a management module, a failure of one of the independent servers of the multi-density server; identifying, by the management module, a failover target; determining, by the management module, whether the failover target is a non-failed independent server included in the multi-density server; and responsive to determining that the failover target is a non-failed independent server included in the multi-density server, migrating, by the management module, the failed independent server's workload to another server that is not included in the multi-density server. | 05-21-2015 |
20150143160 | MODIFICATION OF A CLUSTER OF COMMUNICATION CONTROLLERS - Provided is a system having a cluster of communication controllers, a method for modification of the latter, and a computer program product carrying computer executable code for execution of the method. Each communication controller is operable for providing network connections of the computer system with external computer systems using communication protocols of a first type and a second type. The first type is a failover tolerant communication protocol type. The second type is a failover non-tolerant communication protocol type. All network connections of each communication controller are disconnected during the modification of said communication controller. | 05-21-2015 |
20150143161 | DISASTER RECOVERY APPLIANCE - A disaster recovery appliance is described herein. The disaster recovery appliance is coupled to one or more servers. The disaster recovery appliance continuously receives backup data for each of the one or more servers. When a server fails, the disaster recovery appliance replaces the failed server. While the failed server is inaccessible, the disaster recovery appliance is able to mimic the functionality of the failed server. In some embodiments, the disaster recovery appliance is able to act as a server in addition to a backup device for the other servers. | 05-21-2015 |
20150143162 | Two-Tier Failover Service for Data Disaster Recovery - Technologies are described herein for providing a two-tier failover service. A request to access content by an application associated with an application identifier may be identified. A first record corresponding to the application identifier may be retrieved from a database information table. The first record may include a reference identifier, a database name of a database, and a failover value. A second record corresponding to the reference identifier may be retrieved from a server information table. The second record may include an indication of a first server computer as a primary server computer and an indication of a second server computer as a secondary server computer. A connection specification to either the first server computer or the second server computer may be generated based on the first record and the second record. | 05-21-2015 |
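The two-tier lookup above can be sketched as two dictionary lookups: the application identifier resolves to a database-information record, whose reference identifier resolves to a primary/secondary server pair, and the failover value selects between them. The table and field names (`database_info`, `ref_id`, `failover`) are hypothetical illustrations of the records the abstract describes.

```python
# Hypothetical sketch of the two-tier failover lookup.
database_info = {
    # application identifier -> first record (database information table)
    "app-42": {"ref_id": "grp-1", "db_name": "orders", "failover": 1},
}
server_info = {
    # reference identifier -> second record (server information table)
    "grp-1": {"primary": "sql-a.example", "secondary": "sql-b.example"},
}

def connection_spec(app_id):
    first = database_info[app_id]                 # tier 1: by application id
    second = server_info[first["ref_id"]]         # tier 2: by reference id
    host = second["secondary"] if first["failover"] else second["primary"]
    return {"server": host, "database": first["db_name"]}

# With the failover value set, the secondary server is chosen.
assert connection_spec("app-42") == {"server": "sql-b.example",
                                     "database": "orders"}
```

Flipping the failover value in the first record redirects every new connection for that application without touching the application itself.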
20150149813 | FAILURE RECOVERY SYSTEM AND METHOD OF CREATING THE FAILURE RECOVERY SYSTEM - Upon detecting a configuration change or a change in the operating state of a virtual machine of the main system, a VM management unit changes a value of a determination index of the virtual machine, and selects a virtual machine of the standby system/auxiliary system to be used for failure recovery of the virtual machine of the main system on the basis of the value of the determination index. A pattern generation unit provides the virtual machine of the standby system/auxiliary system selected by the VM management unit. | 05-28-2015 |
20150149814 | FAILURE RECOVERY RESOLUTION IN TRANSPLANTING HIGH PERFORMANCE DATA INTENSIVE ALGORITHMS FROM CLUSTER TO CLOUD - A method of providing failure recovery capabilities to a cloud environment for scientific HPC applications. An HPC application with MPI implementation extends the class of MPI programs to embed the HPC application with various degrees of fault tolerance. An MPI fault tolerance mechanism realizes a recover-and-continue solution. If an error occurs, only failed processes re-spawn, the remaining living processes remain in their original processors/nodes, and system recovery costs are thus minimized. | 05-28-2015 |
20150293821 | HEALING CLOUD SERVICES DURING UPGRADES - Embodiments described herein are directed to migrating affected services away from a faulted cloud node and to handling faults during an upgrade. In one scenario, a computer system determines that a virtual machine running on a first cloud node is in a faulted state. The computer system determines which cloud resources on the first cloud node were allocated to the faulted virtual machine, allocates the determined cloud resources of the first cloud node to a second, different cloud node and re-instantiates the faulted virtual machine on the second, different cloud node using the allocated cloud resources. | 10-15-2015 |
20150293823 | High Availability Method and System for Improving the Utility of Physical Servers in Cloud Computing Resource Pool - A high availability method and system for improving utilization of physical servers in a cloud computing resource pool, wherein the method includes: when physical servers in the cloud computing resource pool fail, judging whether idle memory on the normally-running physical servers can support running the virtual machines from all the failed physical servers; and, when the idle memory can support it, restarting the virtual machines from all the failed physical servers on the normally-running physical servers. The embodiments of the present document improve the utilization of the memory resources of physical servers. | 10-15-2015 |
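The admission check in the entry above can be sketched as a bin-packing feasibility test. The patent does not fix a placement strategy; the sketch assumes first-fit-decreasing and that VM memory demands and per-host idle memory are known in megabytes.

```python
# Sketch of the "can the survivors absorb the failed VMs?" check.
def can_absorb(failed_vm_mem, idle_mem_per_host):
    """True if the healthy hosts' idle memory can hold every failed VM.

    Uses first-fit-decreasing placement (an assumption; the patent only
    requires judging whether the idle memory suffices).
    """
    idle = sorted(idle_mem_per_host, reverse=True)
    for vm in sorted(failed_vm_mem, reverse=True):
        for i, free in enumerate(idle):
            if free >= vm:
                idle[i] = free - vm   # place VM on this host
                break
        else:
            return False              # no host has room: do not restart
    return True


assert can_absorb([4096, 2048], [8192, 1024]) is True
assert can_absorb([4096, 2048], [4096, 1024]) is False
```

Only when the check passes are the failed hosts' VMs restarted on the surviving hosts, which is what keeps memory utilization high without overcommitting.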
20150301909 | SYSTEMS AND METHODS FOR PREVENTING INPUT/OUTPUT PERFORMANCE DECREASE AFTER DISK FAILURE IN A DISTRIBUTED FILE SYSTEM - In accordance with embodiments of the present disclosure, a method may include receiving from a plurality of data nodes of a distributed file system an indication of whether a fault condition exists with respect to a storage resource of the respective data node. The method may also include receiving an input/output request for a storage resource of a particular data node from a host information handling system communicatively coupled to the distributed file system. The method may further include, responsive to the input/output request, directing the input/output request to the particular data node if no fault condition exists with respect to storage resources of the particular data node and directing the input/output request to another data node of the distributed file system if a fault condition exists with respect to one or more storage resources of the particular data node. | 10-22-2015 |
20150301910 | FLEXIBLE HIGH AVAILABILITY DISASTER RECOVERY WITH A SET OF DATABASE SERVERS - Seamless failover in a database replication environment, which has a primary database server and a plurality of standby database servers, is described. An example method includes orderly terminating transactions on the primary database server, where the transactions originate from client applications. The transaction logs of the primary database server are drained and replicated from the primary database server to the plurality of standby database servers. One of the standby database servers is designated as a new primary database server for processing user transactions. | 10-22-2015 |
20150301912 | ACTIVE HOT STANDBY REDUNDANCY FOR BROADBAND WIRELESS NETWORK - In embodiments of the present disclosure, systems and methods implementing active-hot standby redundancy in server architectures are described. In an active-hot standby redundancy architecture, two matching service instances are installed in a network on different host computers. A standby service instance may maintain state information for every session maintained at an active service instance that it is poised to replace, using a publish-subscribe communications network. When a failure occurs in the active instance, the standby instance may promote itself to active and assume all aspects of the service identity and role of the active instance it is replacing. Service to user entities continues without interruption, although transactions that are ongoing just as the failure occurs may be lost. | 10-22-2015 |
20150304235 | ALLOCATING AND ACCESSING WEBSITE RESOURCES VIA DOMAIN NAME ROUTING RULES - Systems and methods are provided for allocating and accessing website resources via domain name routing rules as opposed to the domain name system (DNS). The system may include a reverse proxy server that includes domain name routing rules and a plurality of hosting servers. The reverse proxy server may receive a request from a client, wherein the request may comprise a domain name and possibly a path. The reverse proxy server may fulfill the request using the domain name routing rules for the domain name and possibly the path. The request may be, as non-limiting examples, to transfer a file to a requester, move a file from one server to another server, allocate redundant passive resources that may be activated in the event of an error, provide a website resource that may span two or more hosting servers and/or retrieve data from cache on the reverse proxy server. | 10-22-2015 |
20150309890 | EMULATING A STRETCHED STORAGE DEVICE USING A SHARED REPLICATED STORAGE DEVICE - Exemplary methods, apparatuses, and systems include receiving a command from a recovery manager running on a management server within a first or second datacenter. In response to the command, device identifiers for one or more logical storage devices within the first datacenter are requested. In response to the request, a first device identifier for a first logical storage device within the first datacenter and a peer device identifier for a second logical storage device within the second datacenter are received. Data is replicated from the first logical storage device to the second logical storage device. The first and second logical storage devices are in an active-passive configuration, the first logical storage device storing the replicated data being active and the second logical storage device storing the replicated data being passive. The command with the peer device identifier is sent to the underlying storage. | 10-29-2015 |
20150309891 | FAILURE RECOVERY METHOD IN INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING SYSTEM - Services are promptly resumed at the time of a failure recovery in an information processing system. Before a first server system | 10-29-2015 |
20150309893 | METHOD OF RECOVERING APPLICATION DATA - A method of recovering application data from the memory of a failed node in a computer system comprising a plurality of nodes connected by an interconnect and of writing the application data to a replacement node; wherein a node of the computer system executes an application which creates application data storing the most recent state of the application in a node memory; the node fails; the node memory of the failed node is then controlled using a failover memory controller; and the failover memory controller copies the application data from the node memory of the failed node to a node memory of the replacement node over the interconnect. | 10-29-2015 |
20150309894 | Fast Failover for Application Performance Based WAN Path Optimization with Multiple Border Routers - According to one aspect, a control entity (such as a policy server) in communication with a plurality of border routers in a network, generates failover entries for one or more traffic flows. Each failover entry specifies a backup path to be used by a border router when the border router determines that a wide area network interface of the border router has failed. The control entity sends the failover entries to each of the border routers. A border router operating in a network stores failover entries for one or more traffic flows. For packets received at the border router either from a local area network interface or via a tunnel from another border router, when the border router detects that the wide area network interface has failed, the border router determines how to handle the packets based on the stored failover entries. | 10-29-2015 |
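The preinstalled-failover-entry behavior above can be sketched as a lookup a border router performs when its WAN interface is down. The flow names and tunnel identifiers are hypothetical; the sketch only shows that the backup path is precomputed by the control entity, so no routing convergence is needed at failure time.

```python
# Hypothetical sketch of a border router consulting failover entries
# pushed down by the control entity (policy server).
failover_entries = {
    # traffic flow -> backup path via another border router's tunnel
    "voice": {"backup_router": "br-2", "tunnel": "gre-12"},
    "bulk":  {"backup_router": "br-3", "tunnel": "gre-13"},
}

def route(flow, wan_up):
    """Decide how to forward a packet of the given flow."""
    if wan_up:
        return ("wan", None)                  # normal path: own WAN interface
    entry = failover_entries.get(flow)
    if entry is None:
        return ("drop", None)                 # no precomputed backup path
    return ("tunnel", entry["tunnel"])        # forward via backup border router


assert route("voice", wan_up=True) == ("wan", None)
assert route("voice", wan_up=False) == ("tunnel", "gre-12")
assert route("video", wan_up=False) == ("drop", None)
```

Because the entries are installed before any failure occurs, the router can redirect traffic immediately on detecting that its WAN interface has failed.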
20150309895 | DECISION-MAKING SUPPORT SYSTEM AND DECISION-MAKING SUPPORT METHOD - A decision-making support device configured to store a predetermined criterion defining a damage degree according to a content of each of disasters, and predetermined information symbolizing various human activities in disasters, and configured to determine a damage degree for each of areas at a disaster-stricken area by comparing disaster information acquired from a predetermined interface with a predetermined criterion, determine an activity degree of human activities for each of the areas based on appearance frequency of predetermined information in various types of information acquired from a predetermined interface or disaster information, identify, as a support-needed area, an area with a higher damage degree and a lower activity degree than those of other areas, or an area with the damage degree and the activity degree higher by a predetermined level or more than those of other areas, and output information of the support-needed area to a predetermined device. | 10-29-2015 |
20150309896 | Method, System, and Apparatus for Cloud Application Redundancy - A redundancy method, system, and apparatus acquire first description information of a cloud application needing redundancy, where the first description information includes information about a source virtual machine and a source network used at a production site by the cloud application needing redundancy; and generate, based on the first description information, which gives an overall description of the cloud application needing redundancy, second description information of the cloud application at a redundancy site, where the second description information gives an overall description of the deployment of the cloud application needing redundancy at the redundancy site. The redundancy site is capable of acquiring the second description information to recover the cloud application needing redundancy at the redundancy site, thereby implementing redundancy based on a cloud application. | 10-29-2015 |
20150317217 | FAILURE RECOVERY SCHEME FOR A CLOUD SYSTEM - Technologies are generally described for a failure recovery scheme for a cloud system. In some examples, the cloud system may include one or more computing nodes, and one or more network switches configured to relay one or more packets among the one or more computing nodes. A respective one of the computing nodes may include a first processor configured to process the one or more packets to communicate with at least one of the network switches, and a second processor configured to process the one or more packets to communicate with at least one of the other computing nodes in the cloud system. | 11-05-2015 |
20150317218 | MIXED MODE SESSION MANAGEMENT - Systems and methods for managing multiple versions of applications executing on servers in a server pool are provided. A first server executing a first version of an application loads session data associated with a second, different version of the application. An error is detected based on the difference between the first version and the second version. A second server executing the second version of the application is selected by the first server in a server pool comprising one or more servers. The first server transmits a hypertext transfer protocol proxy request to the selected second server, which successfully processes the session data and handles the request without error. | 11-05-2015 |
20150317220 | DYNAMIC GENERATION OF DISASTER RECOVERY PLAN WHICH REACTS TO CHANGES TO AN UNDERLYING TOPOLOGY - Techniques are described for dynamically generating a disaster recovery plan. In an embodiment, a set of topology metadata is determined for a first site on which a multi-tier application is deployed and a second site where the multi-tier application will be activated in the event of a switchover or failover. The topology metadata may identify a set of targets associated with a plurality of tiers on the first site on which the multi-tier application is deployed and on the second site where the multi-tier application would be activated in the event of a disaster recovery operation such as a switchover or failover. Based, at least in part, on the topology metadata for the first and second sites, a disaster recovery plan is generated. The disaster recovery plan includes an ordered set of instructions for deactivating the multi-tier application at the first site and activating the multi-tier application at the second site. | 11-05-2015 |
20150317221 | COMPREHENSIVE ERROR MANAGEMENT CAPABILITIES FOR DISASTER RECOVERY OPERATIONS - Techniques are described for providing error management capabilities for disaster recovery operations. In an embodiment, first user input is received that identifies a first error mode to assign to a particular step of a disaster recovery plan that includes a set of steps for performing a disaster recovery operation. In response to receiving the first user input, the particular step is associated with the first error mode. In response to determining that an error occurred while processing the particular step of the disaster recovery plan, the error mode that is associated with the particular step is determined. Error handling is performed for the particular step based, at least in part, on the error mode that is associated with the particular step of the disaster recovery plan. | 11-05-2015 |
20150317222 | PRESERVING MANAGEMENT SERVICES WITH SELF-CONTAINED METADATA THROUGH THE DISASTER RECOVERY LIFE CYCLE - During normal operation, at a first site, of a disaster recovery management unit, at least one customer workload machine, at least one management service machine, and metadata for the at least one management service machine are replicated to a remote disaster recovery site. After a disaster at the first site, a replicated version of the at least one customer workload machine and a replicated version of the at least one management service machine are brought up at the remote disaster recovery site. A replicated version of the metadata for the at least one management service machine is reconfigured by executing, on the replicated version of the at least one management service machine, a failover script, to obtain reconfigured replicated metadata for the replicated version of the at least one management service machine. When the first site comes back up, failback is carried out, essentially in the reverse order. | 11-05-2015 |
20150317223 | METHOD AND SYSTEM FOR HANDLING FAILURES BY TRACKING STATUS OF SWITCHOVER OR SWITCHBACK - Techniques for recovering from a failure at a disaster recovery site are disclosed. An example method includes receiving an indication to shift control of a set of volumes of a plurality of volumes. The set of volumes is originally owned by a second storage node. The first storage node is a disaster recovery partner of the second storage node. The method includes shifting control of the set of volumes. The method further includes during the shifting, changing a status of a flag corresponding to a progress of the shifting. The method also includes during a reboot of the first storage node, determining the status of the flag and determining, based on the status of the flag, whether to mount the set of volumes during reboot at the first storage node. | 11-05-2015 |
20150317226 | DETECTING DATA LOSS DURING SITE SWITCHOVER - Techniques for detecting data loss during site switchover are disclosed. An example method includes storing at NVRAM of a first node a plurality of operations of a second node, the first and second nodes being disaster recovery partners. The method also includes during a switchover from the second node to the first node, receiving an indication of a first number of operations yet to be completed. The method further includes comparing the first number to a second number of operations in the plurality of operations stored at the NVRAM of the first node. The method also includes in response to the comparing, determining whether at least one operation is missing from the plurality of operations stored in the NVRAM of the first node. The method further includes in response to determining that at least one operation is missing, failing at least one volume. | 11-05-2015 |
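The switchover check above compares two counts: the number of operations the failed partner reports as outstanding versus the number actually mirrored into the survivor's NVRAM. A minimal sketch, with hypothetical names (`operations_missing`, the operation list), of that comparison:

```python
# Count-based sketch of the data-loss check at site switchover.
def operations_missing(reported_count, mirrored_ops):
    """Return how many operations never reached the survivor's NVRAM.

    Zero means every outstanding operation was replicated; a positive
    result means data loss was detected and affected volumes should be
    failed rather than brought online with silent gaps.
    """
    return max(0, reported_count - len(mirrored_ops))


mirrored = ["op1", "op2", "op3"]      # operations mirrored into local NVRAM
assert operations_missing(3, mirrored) == 0   # all replicated: safe to serve
assert operations_missing(5, mirrored) == 2   # two lost: fail the volume(s)
```

Failing the volume on a detected gap trades availability for correctness: it prevents clients from reading state that silently misses acknowledged writes.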
20150324259 | METHOD AND SYSTEM FOR AUTOMATIC FAILOVER FOR CLIENTS ACCESSING A RESOURCE THROUGH A SERVER USING HYBRID CHECKSUM LOCATION - Some embodiments are directed to a method and apparatus for implementing an automatic failover mechanism for a resource. A client accesses a resource through a first server using a first session. During the session, the client stores checksum information corresponding to data received via the session with the first server. When it is detected that the session between the first server and the client has failed, the client is automatically connected with a second server that has access to the resource. The checksum information is transmitted from the client to the second server, where it is compared with checksum information calculated at the second server, so that a determination can be made as to whether the client can continue processing where it left off when connected to the second server. | 11-12-2015 |
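The checksum handshake above can be sketched with a running CRC: the client accumulates a checksum over the bytes it has received, and after reconnecting, the new server recomputes the checksum over the same prefix of the resource and compares. CRC32 stands in for whatever checksum the patent contemplates; the function names are hypothetical.

```python
# Hypothetical sketch of the client/second-server checksum comparison.
import zlib

def client_checksum(received_chunks):
    """Running CRC32 over everything the client received, plus the offset."""
    crc = 0
    for chunk in received_chunks:
        crc = zlib.crc32(chunk, crc)   # chain the CRC across chunks
    return crc, sum(len(c) for c in received_chunks)

def can_resume(resource, client_crc, client_offset):
    """Second server: recompute over the prefix the client claims to have."""
    return zlib.crc32(resource[:client_offset]) == client_crc


data = b"failover-checksum-demo"
crc, offset = client_checksum([data[:8], data[8:15]])

# Checksums match: the client can continue from byte `offset`.
assert can_resume(data, crc, offset)
# A divergent copy of the resource fails the check: restart from scratch.
assert not can_resume(b"divergent bytes etc!!!", crc, offset)
```

A matching checksum lets the client resume mid-stream on the second server; a mismatch signals that the second server's copy differs and the transfer must restart.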
20150331762 | ACTIVE HOST AND BACKUP HOST IN A HOST ARRANGEMENT FOR COMMUNICATING WITH A TERMINAL CONNECTED TO AN IP NETWORK - A method in a host arrangement for communicating with a terminal connected to an IP communication network. The arrangement comprises at least two hosts, one operating as active host and the remaining at least one host operating as backup host(s). The arrangement is connected to the IP communication network by means of a switch, wherein each host of the arrangement is connected to the switch by means of an individual link, the active host being associated with an IP and a MAC address. The method comprises detecting a link failure between the active host and the switch, or a malfunction of the active host, and determining a backup host to take over. The method comprises associating the IP and the MAC address of the active host to the determined backup host to take over, and triggering a MAC learning process in the switch. | 11-19-2015 |
20150331766 | Deferred Replication of Recovery Information At Site Switchover - Methods, systems, and computer program products for providing deferred replication of recovery information at site switchover are disclosed. A computer-implemented method may include receiving a first copy of logged data for storage volumes of a disaster recovery (DR) partner at a remote site from the DR partner, receiving a request to perform a site switchover from the remote site to the local site, receiving a second copy of logged data for the storage volumes from a local high availability (HA) partner in response to the switchover, and recovering the storage volumes locally by applying one or more of the copies of logged data to corresponding mirrored storage volumes at the local site. | 11-19-2015 |
20150334086 | METHOD AND DEVICE FOR ACCESSING APPLICATION SERVER - Provided is a method for accessing an application server. The method includes: obtaining an IP address or server domain name for accessing an application server from an access point list; initiating an access to the application server using the IP address or server domain name; and after the access to the application server succeeds, updating the access point list by storing an IP address delivered by the application server according to a load balancing policy to a blank entry of the access point list. By performing load balancing at an application server side according to the characteristics of an application program, the method improves the success rate of access by a user. | 11-19-2015 |
20150339200 | INTELLIGENT DISASTER RECOVERY - One embodiment of the invention includes a system for performing intelligent disaster recovery. The system includes a processor and a memory. The memory stores a first monitor application that, when executed on the processor, performs an operation. The operation includes communicating with a second monitor application hosted at a secondary data center to determine an availability of one or more computer servers at a primary data center. The operation also includes upon reaching a consensus with the second monitor application that one or more computer servers at the primary data center are unavailable to process client requests, relative to both the first monitor application and the second monitor application, initiating a failover operation. Embodiments of the invention also include a method and a computer-readable medium for performing intelligent disaster recovery. | 11-26-2015 |
20150347248 | COMMUNICATION CONTINUATION DURING CONTENT NODE FAILOVER - Described herein are methods, systems, and software for accommodating failover of a content node in a content delivery network. In one example, a method of operating a control node includes receiving content requests issued by end user devices. The method further provides, for at least a first content request, mapping a first connection between a first end user device and a first content node, the first connection defined by at least a network address of the first end user device and a virtual next hop network address, and directing traffic associated with the first connection to the first content node using at least the virtual next hop network address. The method also includes identifying a service interruption associated with the first content node and, responsive to the service interruption, identifying a second content node to handle the communications for the first connection. | 12-03-2015 |
20150347249 | FAILOVER HANDLING IN A CONTENT NODE OF A CONTENT DELIVERY NETWORK - Described herein are methods, systems, and software for accommodating failover of a content node in a content delivery network. In one example, a method of operating a content node includes receiving a communication for an end user device from a control node, wherein an interrupted content node previously handled the communication. The method further includes determining if the communication includes a synchronization packet and identifying connection information for the communication. The method also provides, if the communication includes a synchronization packet, accepting the communication and handling delivery for the end user device. The method also includes, if the communication does not include the synchronization packet, determining if a match exists between the connection information for the communication and connection information stored in a flow table, and handling the communication based on the match. | 12-03-2015 |
20150347250 | DATABASE MANAGEMENT SYSTEM FOR PROVIDING PARTIAL RE-SYNCHRONIZATION AND PARTIAL RE-SYNCHRONIZATION METHOD OF USING THE SAME - Provided is a database management system (DBMS). The DBMS synchronizes an active node with a standby node and detects a point of time when last synchronization is performed between the active node and the standby node. Then, after the DBMS performs page synchronization from the detected point of time of the last synchronization to a point at which a failover occurs in the active node, the DBMS performs partial log synchronization by receiving a log from the standby node after the point of time of the last synchronization until the active node is recovered. | 12-03-2015 |
20150355982 | VM AND HOST MANAGEMENT FUNCTION AVAILABILITY DURING MANAGEMENT NETWORK FAILURE IN HOST COMPUTING SYSTEMS IN A FAILOVER CLUSTER - Techniques for virtual machine (VM) management function availability during management network failure in a first host computing system in a cluster are described. In one example embodiment, management network failure is identified in the first host computing system. The management network is coupled to virtual management software in a management server and is used for VM and host management functions. VM and host management functions on the first host computing system are then initiated via a failover agent associated with an active host computing system that is connected to the management network in the cluster and a shared storage network. | 12-10-2015 |
20150355983 | Automatic Management of Server Failures - In embodiments of the invention LPARs can be run on any server in a group of servers. Upon detecting a server has failed, each LPAR then running on the failed server is identified, and servers in the group that are available for restarting the identified LPARs are determined. Identified LPARs are assigned to an available server for restarting, wherein each LPAR has a value associated with a specified LPAR priority criterion, and a given LPAR is assigned in accordance with its value. Responsive to assigning the given LPAR to an available server, a specified storage resource is connected for use by the server in association with the given LPAR, wherein the specified storage resource was previously connected for use by the failed server in association with the given LPAR. | 12-10-2015 |
20150363276 | MULTI-SITE DISASTER RECOVERY MECHANISM FOR DISTRIBUTED CLOUD ORCHESTRATION SOFTWARE - A multi-site disaster recovery mechanism is performed by the following steps: (i) providing a disaster recovery (DR) system that includes a plurality of sites where each site of the plurality of sites actively serves infrastructure-as-a-service to a set of tenant(s); (ii) for each site of the plurality of sites, determining the following characteristics of the site: workloads that require DR, workload characteristics, tenants and capabilities; (iii) for each site of the plurality of sites, determining a plurality of associated sites; and (iv) on condition that a disaster occurs which impacts a first site of the plurality of sites, distributing a primary site workload of the first site across the associated sites of the first site. The determination of the plurality of associated sites associated with each site is based upon at least one of the following characteristics: capacity, workloads that require DR, workload characteristics, tenants and/or capabilities. | 12-17-2015 |
20150370659 | USING STRETCHED STORAGE TO OPTIMIZE DISASTER RECOVERY - Exemplary methods, apparatuses, and systems include receiving a command to perform a failover workflow for a plurality of logical storage devices from a protected site to a recovery site. A first logical storage device within the plurality of logical storage devices is determined to be a stretched storage device. In response to the failover command, a site preference for the first logical storage device is switched from the protected site to the recovery site. The failover includes a live migration of a virtual machine that resides on the first logical storage device. The live migration is performed without interruption to one or more services provided by the virtual machine. The site preference for the first logical storage device is switched prior to performing the live migration of the virtual machine. | 12-24-2015 |
20150370660 | USING STRETCHED STORAGE TO OPTIMIZE DISASTER RECOVERY - Exemplary methods, apparatuses, and systems include determining that at least a portion of a protected site has become unavailable. A first logical storage device within underlying storage of a recovery site is determined to be a stretched storage device stretched across the protected and recovery sites. A failover workflow is initiated in response to the unavailability of the protected site, wherein the failover workflow includes transmitting an instruction to the underlying storage to isolate the first logical storage device from a corresponding logical storage device within the protected site. | 12-24-2015 |
20150370661 | METHOD AND APPARATUS FOR DYNAMIC NODE HEALING IN A MULTI-NODE ENVIRONMENT - Method and apparatus for dynamic Node healing in a Multi-Node environment. A multi-node platform controller hub (MN-PCH) is configured to support multiple nodes through use of dedicated interfaces and components and shared capabilities. Interfaces and components may be configured to be used by respective nodes, or may be configured to support enhanced resiliency as redundant primary and spare interfaces and components. In response to detecting a failed or failing primary interface or component, the MN-PCH automatically performs failover operations to replace the primary with the spare. Moreover, the failover operation is transparent to the operating systems running on the platform's nodes. | 12-24-2015 |
20150370662 | REDUNDANT SYSTEM, REDUNDANCY METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A primary system includes a first node and a second node that backs up the first node. A secondary system includes a third node and a fourth node that backs up the third node. The first node transmits data update information generated in response to a data update in the first node, to the second node and the third node. The fourth node determines a degree of progress in transactions indicated by data update information obtained through the second node and a degree of progress in transactions indicated by data update information obtained through the third node, identifies data update information indicating a further progressed transaction, and reflects the data update information in stored data of the fourth node. | 12-24-2015 |
20150370663 | REDUNDANT SYSTEM, REDUNDANCY METHOD, AND COMPUTER-READABLE RECORDING MEDIUM - A primary system includes a first node and a second node that backs up the first node. A secondary system includes a third node and a fourth node that backs up the third node. The first node in the primary system inserts, when transmitting data update information generated in response to a data update in the first node to the second node and the third node, one or a plurality of pieces of delimiter information indicating a boundary between update processing units, into both sets of transmitted data. The fourth node in the secondary system specifies, based on the delimiter information, the data update information including update information whose process has progressed further from among the data update information obtained from the second node and the data update information obtained through the third node, and reflects the specified data update information in the stored data of the fourth node. | 12-24-2015 |
20150370664 | REDUNDANT SYSTEM AND REDUNDANCY METHOD - When the redundant system operates the second node in place of the first node in the primary system and transmits, to the secondary system, data update information generated according to a data update in the second node, the fourth node acquires the data update information generated according to the data update in the second node from the second node using the second inter-system transfer path. The fourth node changes, by changing a direction of the second intra-system transfer path, a configuration so that the data update information acquired by the fourth node using the second inter-system transfer path is also acquired by the third node. | 12-24-2015 |
20150370665 | NETWORK FAILOVER HANDLING IN MODULAR SWITCHED FABRIC BASED DATA STORAGE SYSTEMS - Systems, methods, apparatuses, and software for data storage systems are provided herein. In one example, a data storage system is provided that includes a first processor configured to establish a network connection with an external system, and receive first storage operations transferred by the external system over the network connection, the first storage operations related to storage and retrieval of data on at least one storage drive. The first processor is configured to transfer information describing the network connection for delivery to at least a second processor. The second processor is configured to identify when the first processor has failed, responsively establish the network connection with the external system based at least on the information describing the network connection, and receive second storage operations transferred by the external system over the network connection. | 12-24-2015 |
20150378853 | Orchestrating High Availability Failover for Virtual Machines Stored on Distributed Object-Based Storage - Techniques are disclosed for orchestrating high availability (HA) failover for virtual machines (VMs) running on host systems of a host cluster, where the host cluster aggregates locally-attached storage resources of the host systems to provide an object store, and where persistent data for one or more of the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store. In one embodiment, a host system in the host cluster executing a HA module determines a VM to be restarted on an active host system in the host cluster. The host system further determines if the VM's persistent data is stored in the object store. If so, the host system adds the VM to a list of VMs to be immediately restarted. Otherwise, the host system checks whether the VM is accessible to the host system by querying a storage layer of the host system configured to manage the object store. | 12-31-2015 |
20160004610 | SYSTEMS AND METHODS FOR FAULT TOLERANT COMMUNICATIONS - Apparatuses, systems and methods are disclosed for tolerating fault in a communications grid. Specifically, various techniques and systems are provided for detecting a fault or failure by a node in a network of computer nodes in a communications grid, adjusting the grid to avoid grid failure, and taking action based on the failure. In an example, a system may include receiving grid status information at a backup control node, the grid status information including a project status, storing the grid status information within the backup control node, receiving a failure communication including an indication that a primary control node has failed, designating the backup control node as a new primary control node, receiving updated grid status information based on the indication that the primary control node has failed, and transmitting a set of instructions based on the updated grid status information. | 01-07-2016 |
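The backup-control-node takeover in the entry above can be sketched as follows. This is a minimal illustration of the described flow (store grid status, receive a failure communication, become the new primary, issue instructions); all class, field, and message names are assumptions for illustration, not terminology from the patent.

```python
# Sketch of backup-control-node takeover in a communications grid.
# All names (BackupControlNode, grid_status, etc.) are illustrative.
class BackupControlNode:
    def __init__(self):
        self.grid_status = None   # last grid status received from the primary
        self.is_primary = False

    def receive_grid_status(self, status):
        # Store grid status information, including the project status.
        self.grid_status = status

    def receive_failure_communication(self, msg):
        # On notice that the primary control node failed, take over as
        # the new primary and emit instructions for the remaining nodes.
        if msg.get("primary_failed"):
            self.is_primary = True
            return self.redistribute_work()
        return []

    def redistribute_work(self):
        # Build instructions from the stored status, skipping the failed
        # primary's own coordination role.
        status = self.grid_status or {}
        return [
            {"node": node, "instruction": "resume", "task": task}
            for node, task in status.get("assignments", {}).items()
            if node != status.get("primary")
        ]

backup = BackupControlNode()
backup.receive_grid_status(
    {"primary": "ctrl-0",
     "assignments": {"ctrl-0": "coordinate", "worker-1": "partition-a"}}
)
instructions = backup.receive_failure_communication({"primary_failed": True})
```

In this sketch the backup is promoted only after an explicit failure communication, mirroring the abstract's sequence of receiving status, storing it, and then acting on the failure indication.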
20160012009 | USING RDMA FOR FAST SYSTEM RECOVERY IN VIRTUALIZED ENVIRONMENTS | 01-14-2016 |
20160019125 | DYNAMICALLY CHANGING MEMBERS OF A CONSENSUS GROUP IN A DISTRIBUTED SELF-HEALING COORDINATION SERVICE - Systems, methods, and computer program products for managing a consensus group in a distributed computing cluster, by determining that an instance of an authority module executing on a first node, of a consensus group of nodes in the distributed computing cluster, has failed; and adding, by an instance of the authority module on a second node of the consensus group, a new node to the consensus group to replace the first node. The new node is a node in the computing cluster that was not a member of the consensus group at the time the instance of the authority module executing on the first node is determined to have failed. | 01-21-2016 |
20160026542 | Pre-Computation of Backup Topologies in Computer Networks - In one embodiment, a method includes: receiving, at a device of a computer network, a request to build at least part of a backup directed acyclic graph (BDAG) of backup devices for routing traffic within the computer network in case of a power outage, the request comprising at least one requirement specifying to use a device remaining powered after the power outage as a backup device; and in response to receiving the request: identifying a set of backup devices, each of the backup devices fulfilling the at least one requirement; selecting a backup device from the set of backup devices; and synchronizing the device with the backup device according to a backup operation strategy received from the backup device. | 01-28-2016 |
20160026543 | Distributed Storage of Data - Multi-reliability regenerating (MRR) codes are introduced to regenerate stored data. An individual regenerating code is used for each message to satisfy a respective reliability requirement for the data. With repair consideration, mixing may be used to improve upon the performance of a coding solution. | 01-28-2016 |
20160034357 | MANAGING BACKUP OPERATIONS FROM A CLIENT SYSTEM TO A PRIMARY SERVER AND SECONDARY SERVER - Provided are techniques for managing backup operations from a client system to a primary server and secondary server. A determination is made at the client system of whether a state of the data on the secondary server permits a backup operation in response to determining that the primary server is unavailable when a force failover parameter is not set. The client system reattempts to connect to the primary server to perform the backup operation at the primary server in response to determining that the state of the data on the secondary server does not permit the backup operation. The client system performs the backup operation at the secondary server in response to determining that the state of the secondary server permits the backup operation. | 02-04-2016 |
20160034364 | CONTROLLING ACCESS OF CLIENTS TO SERVICE IN CLUSTER ENVIRONMENT - First, second, and third sets of addresses are created. The first set includes addresses registered in a name server; both the second and third sets include addresses not registered in the name server and that are disjoint. A first address of a first server that has failed and to which access is to be prohibited is moved from the first to the third set, is removed from the first server, assigned to a second server, and removed from the name server. Usage parameter values of the first address are monitored to determine whether at least one is below a value. If so, the first address is removed from the second server and moved from the third to the second set. Upon access to the first server no longer being prohibited, a second address of the second set is assigned to the first server and added to the name server. | 02-04-2016 |
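The three address sets in the entry above can be sketched as plain set operations. This is an illustrative model only (the class and method names are assumptions); it shows the moves the abstract describes: prohibited address out of the registered set, release once usage falls below the threshold, and reassignment from the free set when access is permitted again.

```python
# Sketch of the three address sets used to control client access.
# Names (AddressSets, prohibit, drain_complete, reassign) are illustrative.
class AddressSets:
    def __init__(self, registered):
        self.registered = set(registered)  # first set: in the name server
        self.free = set()                  # second set: unregistered, unused
        self.prohibited = set()            # third set: unregistered, draining

    def prohibit(self, addr):
        # Failed server's address moves from the first set to the third set
        # (and would be removed from the name server).
        self.registered.discard(addr)
        self.prohibited.add(addr)

    def drain_complete(self, addr):
        # Usage parameter fell below the threshold: third set -> second set.
        self.prohibited.discard(addr)
        self.free.add(addr)

    def reassign(self):
        # Server healthy again: take an address from the second set and
        # register it in the name server.
        addr = sorted(self.free)[0]
        self.free.discard(addr)
        self.registered.add(addr)
        return addr

sets_ = AddressSets({"10.0.0.1", "10.0.0.2"})
sets_.prohibit("10.0.0.1")        # failed server's address is quarantined
sets_.drain_complete("10.0.0.1")  # monitored usage dropped below threshold
new_addr = sets_.reassign()       # an address returns to service
```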
20160034366 | MANAGING BACKUP OPERATIONS FROM A CLIENT SYSTEM TO A PRIMARY SERVER AND SECONDARY SERVER - Provided are techniques for managing backup operations from a client system to a primary server and secondary server. A determination is made at the client system of whether a state of the data on the secondary server permits a backup operation in response to determining that the primary server is unavailable when a force failover parameter is not set. The client system reattempts to connect to the primary server to perform the backup operation at the primary server in response to determining that the state of the data on the secondary server does not permit the backup operation. The client system performs the backup operation at the secondary server in response to determining that the state of the secondary server permits the backup operation. | 02-04-2016 |
20160036623 | AUTOMATIC CLOUD-BASED DISASTER RECOVERY SYSTEM - Restoration devices in a cloud storage system are paired with source containers associated with a mainframe computer, and series of commands are generated based on the pairings to cause copies of data at the source containers to be stored to the restoration devices. A point-in-time copy of the copy of the data at the source containers may be stored to some restoration devices, and a second copy of the data may be stored to other restoration devices. The restoration devices may be reallocated from inactive source containers. Execution of the commands is monitored, and the commands are modified if the execution of the commands does not satisfy one or more desired conditions. For example, a cycle time associated with copying data to a restoration device may be measured, and if the cycle time exceeds a threshold, the command may be modified. | 02-04-2016 |
20160041889 | OPTIMIZING PLACEMENT PLANS FOR HOST FAILURE IN CLOUD COMPUTING ENVIRONMENTS - Embodiments of the present invention provide systems, methods, and computer program products for optimizing a placement plan. In one embodiment, a method is disclosed in which a request for registration with an external advisor is received. A time to live is received from each external advisor and used to determine an overall timeout period value for a placement engine. After receiving a predictive failure alert, internal and external advisors are ranked according to criteria and advice is received from the qualified advisors. A placement plan is generated based on the advice received from the advisors. | 02-11-2016 |
20160057005 | ENABLING UNIFORM SWITCH MANAGEMENT IN VIRTUAL INFRASTRUCTURE - A method of configuring a logical network in a datacenter is provided. The datacenter includes a plurality of host physical computing devices, a compute manager to configure one or more data compute nodes (DCNs) on virtualization software of each host, and a network manager. The method configures, by the network manager, a logical network. The method provides, by the network manager, a read-only configuration construct of the logical network to the virtualization software of a set of hosts in the plurality of hosts. The method obtains, by the compute manager, the read-only configuration construct of the logical network from the virtualization software of the set of hosts. The method configures, by the compute manager, a plurality of DCNs to connect to the logical network using the read-only configuration construct of the logical network. | 02-25-2016 |
20160057208 | Virtual Zones for Open Systems Interconnection Layer 4 Through Layer 7 Services in a Cloud Computing System - Concepts and technologies disclosed herein are directed to virtual zones for Open Systems Interconnection (“OSI”) communication model layers 4-7 services in a cloud computing system. According to one aspect of the concepts and technologies disclosed herein, a cloud computing system can include a hardware resource and a virtual zone. The virtual zone can include a virtual network function (“VNF”) that is executable by the hardware resource. The VNF can support a service that operates within one of layers 4-7 of the OSI communication model. A computing system can detect new subscribers to the service within the virtual zone. The computing system also can determine that a capacity constraint exists within the virtual zone as a result of the new subscribers and, in response, can create a further virtual zone that includes a further VNF. The computing system also can home the new subscribers to the further virtual zone so that the further VNF can provide the service to the new subscribers. | 02-25-2016 |
20160062853 | PREVENTING MIGRATION OF A VIRTUAL MACHINE FROM AFFECTING DISASTER RECOVERY OF REPLICA - A storage migration engine and a recovery manager are provided that enable failover operations to be performed in situations where storage migration and array-based replication are involved. The storage migration engine stores information related to storage migrations directly into a source datastore and a destination datastore, which are then replicated over to a recovery site. The recovery manager uses the information stored in the recovered datastores to select which instance of virtual machine data is to be used to fail over to a virtual machine at the recovery site. | 03-03-2016 |
20160062854 | FAILOVER SYSTEM AND METHOD - A failover system, server, method, and computer readable medium are provided. The system includes a primary server for communicating with a client machine and a backup server. The primary server includes a primary session manager, a primary dispatcher, a primary order processing engine and a primary verification engine. The method involves receiving an input message, obtaining deterministic information, processing the input message and replicating the input message along with the deterministic information. | 03-03-2016 |
20160062857 | FAULT RECOVERY ROUTINE GENERATING DEVICE, FAULT RECOVERY ROUTINE GENERATING METHOD, AND RECORDING MEDIUM - A fault recovery routine generating device includes a subroutine storage unit which stores subroutines, a precondition storage unit which stores a precondition, a fault combination acceptance unit which accepts a combination of faults that have occurred in components of an information system, a subroutine specification unit which identifies subroutines required for recovery of the components, a fault recovery routine generating unit which acquires the identified subroutines from the subroutine storage unit and links the subroutines to generate a candidate fault recovery routine which is a routine for recovering the information system, a fault recovery time estimation unit which estimates the time required for fault recovery by the candidate fault recovery routine, and a fault recovery routine output unit which outputs the candidate fault recovery routine whose fault recovery time is less than or equal to predetermined time as a fault recovery routine. | 03-03-2016 |
20160070625 | PROVIDING BOOT DATA IN A CLUSTER NETWORK ENVIRONMENT - A computer cluster includes a group of connected computers that work together essentially as a single system. Each computer in the cluster is called a node. Each node has a boot device configured to load an image of an operating system into the node's main memory. Sometimes the boot device of a first node experiences a problem that prevents the operating system from loading. This can affect the entire cluster. Some aspects of the disclosure, however, are directed to operations that determine the problem with the first node's boot device based on a communication sent via a first communications network. Further, the operations can communicate to the first node a copy of boot data from a second node's boot device. The copy of the boot data is sent via a second communications network different from the first communications network. The copy of the boot data can solve the first boot device's problem. | 03-10-2016 |
20160077936 | FAILOVER MECHANISM IN A DISTRIBUTED COMPUTING SYSTEM - The disclosure is directed to failover mechanisms in a distributed computing system. A region of data is managed by multiple region servers. One of the region servers is elected as a “leader” and the remaining are “followers.” The leader serves the read/write requests from a client. The leader writes the data received from the client into the in-memory store and a local write-ahead log (“WAL”), and synchronously replicates the WAL to the followers. A region server designated as an “active” region server synchronizes a distributed data store with the data from the WAL. Active witness followers apply the data from the WAL to their in-memory store while shadow witness followers do not. Different types of servers provide failover mechanisms with different characteristics. A leader is elected based on the region servers' associated ranks: the higher the rank, the higher the likelihood that a server elects itself as leader. | 03-17-2016 |
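The rank-based election in the entry above can be sketched as follows. This is an assumed, simplified model: the abstract says higher rank makes self-election more likely, and here the highest-ranked live server wins deterministically, with name order breaking ties; the server names and tie rule are illustrative, not from the patent.

```python
# Sketch of rank-based leader election among region servers.
# The highest-ranked live server becomes leader; on its failure,
# the next-highest-ranked live server takes over.
def elect_leader(servers):
    """servers: dict of name -> {'rank': int, 'alive': bool}."""
    live = {name: s for name, s in servers.items() if s["alive"]}
    if not live:
        return None
    # Higher rank wins; sorted() makes name order break rank ties.
    return max(sorted(live), key=lambda name: live[name]["rank"])

servers = {
    "rs-1": {"rank": 3, "alive": True},   # leader while alive
    "rs-2": {"rank": 2, "alive": True},   # e.g. active witness follower
    "rs-3": {"rank": 1, "alive": True},   # e.g. shadow witness follower
}
assert elect_leader(servers) == "rs-1"
servers["rs-1"]["alive"] = False          # leader fails
assert elect_leader(servers) == "rs-2"    # failover to next rank
```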
20160077937 | FABRIC COMPUTER COMPLEX METHOD AND SYSTEM FOR NODE FUNCTION RECOVERY - A fabric computer method and system for recovering fabric computer node function. The fabric computer method includes monitoring a processing environment operating on a first Processor and Memory node within the fabric computer complex, detecting a failure of the first Processor and Memory node, and transferring the processing environment from the first Processor and Memory node to a second Processor and Memory node within the fabric computer complex in response to the detection of a failure of the first Processor and Memory node. The fabric computer system includes a first Processor and Memory node, a second Processor and Memory node coupled to the first Processor and Memory node, at least one input/output (I/O) and Networking node coupled to the first and second Processor and Memory nodes, and a fabric manager coupled to the first and second Processor and Memory nodes and the at least one I/O and Networking node. The fabric manager is configured to monitor a processing environment operating on the first Processor and Memory node, to receive notification of a failure of the first Processor and Memory node, and to transfer the processing environment from the first Processor and Memory node to the second Processor and Memory node in response to the detection of a failure of the first Processor and Memory node. | 03-17-2016 |
20160077938 | MANAGING VIOS FAILOVER IN A SINGLE STORAGE ADAPTER ENVIRONMENT - According to one exemplary embodiment, a method for VIOS failover in an environment with a physical storage adapter is provided. The method may include assigning the physical storage adapter to a first VIOS, wherein the physical storage adapter has I/O connectivity to at least one storage device. The method may include configuring a first I/O path between the first VIOS and a second VIOS. The method may include configuring a second I/O path from a client partition to the first VIOS, wherein the second I/O path is set as a primary I/O path. The method may include configuring a third I/O path from the client partition to the second VIOS. The method may include determining the first VIOS is inaccessible. The method may include unassigning the physical storage adapter from the first VIOS. The method may include assigning the physical storage adapter to the second VIOS. | 03-17-2016 |
20160085641 | SYSTEM AND METHOD TO ORCHESTRATE AND CONTROL SEQUENCE OF RECOVERY ACTIONS FOR RESUMPTION OF BUSINESS SERVICES BY DYNAMICALLY CHANGING RECOVERY ACTION SEQUENCE BASED ON FEEDBACK RECEIVED FROM AGENT/OPEN FLOW BASED DEVICES CATERING TO BOTH TRADITIONAL & SOFTWARE DEFINED NETWORKS - Disclosed is a system and method for enabling a SNMP based Network Management System in cooperation with at least one SDN Controller to control sequence of recovery actions and dynamically change the recovery action sequence for a given fault based on the feedback received from an SNMP Agent/Open flow based devices across various systems/platforms for recovering a business service which is achieved by way of Open flow stack enhancements and OF-CONFIG enhancements at the controller end and device end. The present invention is essentially about extending the ability to initiate and perform dynamic recovery actions in a network supporting both the traditional SNMP based management systems & Open flow based SDN Control. | 03-24-2016 |
20160085642 | Fault Tolerant Industrial Automation Control System - A combination of a component-based automation framework, software-based redundancy patterns, and a distributed, reliable runtime manager, is able to detect host failures and to trigger a reconfiguration of the system at runtime. This combined solution maintains system operation in case a fault occurs and, in addition, automatically restores fault tolerance by using backup contingency plans, and without the need for operator intervention or immediate hardware replacement. A fault-tolerant fault tolerance mechanism is thus provided, which restores the original level of fault tolerance after a failure has occurred—automatically and immediately, i.e., without having to wait for a repair or replacement of the faulty entity. In short, the invention delivers increased availability or uptime of a system at reduced costs and complexity for an operator or engineer by adapting automatically to a new environment. | 03-24-2016 |
20160085644 | MULTICAST REPLICATION ENGINE OF A NETWORK ASIC AND METHODS THEREOF - A multicast replication engine includes a circuit implemented on a network chip to replicate packets, mirror packets and perform link switchovers. The multicast replication engine determines whether a switchover feature is enabled. If the switchover feature is not enabled, then the multicast replication engine mirrors the packet according to a mirror bit mask and to a mirror destination linked list. The mirror destination linked list corresponds to a mirroring rule. If the switchover feature is enabled, then the multicast replication engine replicates the packet according to a first live link of a failover linked list. The failover linked list corresponds to a switchover rule. The mirroring rule and the switchover rule are stored in the same table. Copies of the packet are forwarded according to a multicast rule that is represented by a hierarchical linked list with N tiers. | 03-24-2016 |
20160085646 | AUTOMATIC CLIENT SIDE SEAMLESS FAILOVER - A standby database cluster takes on the role of the primary database cluster if the primary database cluster becomes unavailable using the following steps: (i) operating a database management system (DBMS) including an initial primary cluster and a plurality of standby clusters; (ii) communicating to a set of client driver(s) connecting a first application to the initial primary cluster an identity of the plurality of standby clusters; (iii) on condition that the initial primary cluster becomes unavailable, assigning a selected standby cluster of the plurality of standby clusters to be assigned as a new primary cluster in place of the initial primary cluster; and (iv) in response to assignment of the new primary cluster, seamlessly moving the first application from the initial primary cluster to the new primary cluster without any substantial human intervention. | 03-24-2016 |
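The client-side seamless failover in the entry above can be sketched as follows. This is an illustrative model under assumed names (`ClientDriver`, `execute`, the cluster callables): the driver is told the standby clusters up front and, when the primary becomes unavailable, moves the application's work to whichever standby has been promoted, without application involvement.

```python
# Sketch of a client driver that fails over from an unavailable primary
# cluster to a standby promoted in its place. All names are illustrative.
class ClusterUnavailable(Exception):
    pass

class ClientDriver:
    def __init__(self, primary, standbys):
        self.primary = primary
        self.standbys = list(standbys)  # identities communicated by the DBMS

    def execute(self, query, clusters):
        # clusters: name -> callable standing in for each cluster endpoint.
        try:
            return clusters[self.primary](query)
        except ClusterUnavailable:
            # Seamless move: try each standby; the promoted one answers.
            for candidate in self.standbys:
                try:
                    result = clusters[candidate](query)
                    self.primary = candidate  # new primary from now on
                    return result
                except ClusterUnavailable:
                    continue
            raise

def down(_query):
    raise ClusterUnavailable

driver = ClientDriver("east", ["west", "south"])
clusters = {"east": down, "west": down, "south": lambda q: f"ok:{q}"}
result = driver.execute("SELECT 1", clusters)
```

The application never sees the interruption: the driver retries against the standby identities it was given and records the promoted cluster as the new primary for subsequent requests.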
20160085648 | AUTOMATIC CLIENT SIDE SEAMLESS FAILOVER - A standby database cluster takes on the role of the primary database cluster if the primary database cluster becomes unavailable, using the following steps: (i) operating a database management system (DBMS) including an initial primary cluster and a plurality of standby clusters; (ii) communicating, to a set of client driver(s) connecting a first application to the initial primary cluster, an identity of the plurality of standby clusters; (iii) on condition that the initial primary cluster becomes unavailable, selecting a standby cluster of the plurality of standby clusters to be assigned as a new primary cluster in place of the initial primary cluster; and (iv) in response to assignment of the new primary cluster, seamlessly moving the first application from the initial primary cluster to the new primary cluster without any substantial human intervention. | 03-24-2016 |
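The four steps shared by the two related filings above can be sketched as a client driver that is told the standby clusters up front and retargets transparently when the primary becomes unavailable. The class, the `query` callable, and the use of `ConnectionError` are illustrative assumptions, not the patents' actual implementation:

```python
# Minimal sketch of client-side seamless failover: the driver is given the
# standby cluster identities in advance (step ii) and, when the primary
# becomes unavailable, moves the application to the newly assigned primary
# (steps iii-iv) without application-level intervention. Names are hypothetical.

class ClientDriver:
    def __init__(self, primary, standbys):
        self.primary = primary          # callable standing in for the primary cluster
        self.standbys = list(standbys)  # identities communicated up front

    def execute(self, query):
        try:
            return self.primary(query)
        except ConnectionError:
            # Primary unavailable: adopt the standby assigned as new primary
            # and retry, invisibly to the calling application.
            if not self.standbys:
                raise
            self.primary = self.standbys.pop(0)
            return self.primary(query)
```

The key design point the abstracts emphasize is that the standby identities are communicated before any failure, so the retry in the `except` branch needs no human intervention or fresh discovery step.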
20160092308 | DISASTER RECOVERY SYSTEM - Disclosed herein is a computer implemented method of performing recovery for a customer server system that has an associated backup of server system data of the customer server system, the method comprising the steps of: receiving a server recovery request at a portal for a rebuild of at least part of the customer server system; and sending a request from the portal to a cloud-based data centre for on-demand provisioning of cloud-based server resources, wherein the request includes information on the location of at least part of the backup of the server system data to enable the deployment of a rebuild of at least part of the customer server system at the cloud-based data centre. Advantages include enabling a user to easily manage disaster recovery testing as well as actual live recovery operations. The use of temporary servers in the cloud is an efficient and inexpensive use of resources, as the servers can be rented and used only when required. | 03-31-2016 |
20160092322 | SEMI-AUTOMATIC FAILOVER - Semi-automatic failover includes automatic failover by a service provider as well as self-serviced failover by a service consumer. A signal can be afforded by a service provider based on analysis of an incident that affects the service provider. Initiation of self-serviced failover by a service consumer can be predicated on the signal. In one instance, the signal provides information that aids a decision of whether or not to failover. In another instance, the signal can grant or deny permission to perform a self-serviced failover. | 03-31-2016 |
20160092324 | FAST SINGLE-MASTER FAILOVER - Techniques for switching mastership from one service in a first data center to a second (redundant) service in a second data center are provided. A service coordinator in the first data center is notified about the master switch. The service coordinator notifies each instance of the first service that the first service is not a master. Each instance responds with an acknowledgement. After it is confirmed that all instances of the first service have responded with an acknowledgement, a client coordinator in the first and/or second data center is updated to indicate that the second service is the master so that clients may send requests to the second service. Also, a service coordinator in the second data center is notified that the second service is the master. The service coordinator notifies each instance of the second service that the second service is the master. Each instance responds with an acknowledgement. | 03-31-2016 |
20160103698 | Network Virtualization Policy Management System - Concepts and technologies are disclosed herein for providing a network virtualization policy management system. An event relating to a service can be detected, and virtual machines and virtual network functions that provide the service can be identified. A first policy that defines allocation of hardware resources to host the virtual machines and the virtual network functions can be obtained, as can a second policy that defines deployment of the virtual machines and the virtual network functions to the hardware resources. The hardware resources can be allocated based upon the first policy and the virtual machines and the virtual network functions can be deployed to the hardware resources based upon the second policy. | 04-14-2016 |
20160117231 | Complex Network Modeling For Disaster Recovery - A cloud based method and system for the backup and recovery of a computer or computer system is provided with the ability to determine a network model that emulates the network environment of the computer or computer system being backed up. Should a disaster event occur, the network model is used by a disaster recovery computer to construct a virtual network environment that emulates the network environment of the backed up computer or computer system. | 04-28-2016 |
20160132397 | PHASED NETWORK FORMATION FOR POWER RESTORATION - In one embodiment, a device receives a router advertisement message after a power outage event in a network. The device joins the network, in response to receiving the router advertisement message. The device sends a power restoration notification message via the network. The device selectively delays a disconnected node from joining the network. | 05-12-2016 |
20160132407 | FAILOVER SYSTEM AND METHOD OF DECIDING MASTER-SLAVE RELATIONSHIP THEREFOR - A failover system and a method of deciding master-slave relationship therefor are provided. The failover system includes a first electronic device, a second electronic device, a decision circuit and at least two isolation modules. The decision circuit is coupled to the first electronic device and the second electronic device and configured to determine operating states of the first electronic device and the second electronic device and output a first selecting signal and a second selecting signal. The at least two isolation modules are coupled to the first electronic device, the second electronic device, and the decision circuit and configured to switch a master-slave relationship between the first electronic device and the second electronic device according to the first selecting signal and the second selecting signal. | 05-12-2016 |
20160139999 | UNIFIED COMMUNICATIONS MODULE (UCM) - A fault tolerant control system delivers an embedded functional safety core and a distributed control engine with an onboard communication link in an industrial process control environment. The fault tolerant control system includes a process control workstation connected to a first network and a fault tolerant safety controller connected to a second network, wherein a process controller module, a safety controller module and a field device system integration module are co-located on a power interface board. | 05-19-2016 |
20160140000 | SVC CLUSTER CONFIGURATION NODE FAILOVER - An SVC cluster manages a plurality of storage devices and includes a plurality of SVCs interconnected via a network, each SVC acting as a separate node. A new configuration node is activated in response to configuration node failures. The new configuration node retrieves, from the storage devices, client subscription information about events occurring in storage devices managed by the SVC cluster. In response to events occurring in the storage devices managed by the SVC cluster, the new configuration node obtains storage device event information from a storage device event monitoring unit. The new configuration node sends storage device events to clients who have subscribed to this information, according to the subscription information obtained. The storage device is not installed in the original configuration node. | 05-19-2016 |
20160147621 | MOBILE AGENT BASED MEMORY REPLICATION - Embodiments of the present invention disclose a method, computer program product, and system for memory replication. In one embodiment, in accordance with the present invention, the computer implemented method includes the steps of executing a mobile agent on a server node, wherein the server node is within a cluster of server nodes connected via network communications, capturing and storing, by the mobile agent, a memory state of the server node during operation of the server node, monitoring the server node to determine whether the server node has failed, and responsive to determining that the server node has failed, migrating the mobile agent to an active server node within the cluster of server nodes, wherein the mobile agent carries the captured memory state. | 05-26-2016 |
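The steps listed in the abstract above — capture a node's memory state into the agent, monitor the node, and migrate the agent (carrying the state) to an active node on failure — can be sketched as follows. All classes and field names are illustrative assumptions, not the patent's implementation:

```python
# Sketch of mobile-agent memory replication: the agent snapshots the node's
# memory during operation and, on node failure, carries that snapshot to an
# active node in the cluster. Names are hypothetical.

class Node:
    def __init__(self, memory=None, alive=True):
        self.memory = dict(memory or {})
        self.alive = alive

class MobileAgent:
    def __init__(self):
        self.captured_state = None

    def capture(self, node):
        # Snapshot the node's memory state during normal operation.
        self.captured_state = dict(node.memory)

    def migrate(self, target_node):
        # The agent carries the captured state to the target node.
        target_node.memory.update(self.captured_state)

def monitor_and_failover(agent, node, cluster):
    """If the monitored node has failed, migrate the agent to an active node."""
    if not node.alive:
        target = next(n for n in cluster if n.alive)
        agent.migrate(target)
        return target
    return node
```

In this sketch the replica state travels with the agent itself, which is the distinguishing idea of the abstract compared with node-to-node replication protocols.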
20160154714 | FAULT TOLERANT ARCHITECTURE FOR DISTRIBUTED COMPUTING SYSTEMS | 06-02-2016 |
20160179639 | SELECTIVELY COUPLING A PCI HOST BRIDGE TO MULTIPLE PCI COMMUNICATION PATHS | 06-23-2016 |
20160188425 | DEPLOYING SERVICES ON APPLICATION SERVER CLOUD WITH HIGH AVAILABILITY - Techniques are disclosed for deploying services in a server cluster environment. Certain techniques are disclosed for deploying services to a cluster based on a replication policy that includes a plurality of configurable parameters. In some embodiments, the configurable parameters (also referred to herein as replication factors) can define a number of nodes to which a service is to be deployed, a number of nodes on which a service is to be prepared, and/or a number of nodes to which a service is replicated. Based on the configurable parameters, the replication policy enables users and/or cluster providers to guarantee different levels of performance and/or reliability. | 06-30-2016 |
20160188427 | FAILURE RESISTANT DISTRIBUTED COMPUTING SYSTEM - A failure resistant distributed computing system includes primary and secondary datacenters each comprising a plurality of computerized servers. A control center selects orchestrations from a predefined list and transmits the orchestrations to the datacenters. Transmitted orchestrations include less than all machine-readable actions necessary to execute the orchestrations. The datacenters execute each received orchestration by referencing a full set of actions corresponding to the received orchestration as previously stored or programmed into the computerized servers and executing the referenced full set of actions. At least one of the orchestrations comprises a failover operation from the primary datacenter to the secondary datacenter. Failover shifts performance of tasks from a set of processing nodes of the primary datacenter to a set of processing nodes of the secondary datacenter, such tasks including managing storage accessible by one or more remote clients and running programs on behalf of remote clients. | 06-30-2016 |
20160188428 | INFORMATION PROCESSING METHOD, COMPUTER-READABLE RECORDING MEDIUM, AND INFORMATION PROCESSING SYSTEM - An information processing method includes: executing processing corresponding to a first request of a terminal apparatus using a first information processing apparatus; when a fault occurs in the first information processing apparatus, transmitting apparatus information that identifies the first information processing apparatus from a second information processing apparatus to the terminal apparatus; after the terminal apparatus receives the apparatus information, discarding data transmitted from the first information processing apparatus to the terminal apparatus; transmitting, from the terminal apparatus to the second information processing apparatus, a response notification indicating that the apparatus information has been received by the terminal apparatus; and after the second information processing apparatus receives the response notification, executing processing corresponding to a second request of the terminal apparatus using the second information processing apparatus. | 06-30-2016 |
20160203063 | TRANSMISSION DEVICE, TRANSMISSION SYSTEM, AND TRANSMISSION METHOD | 07-14-2016 |
20160253248 | Systems and Methods for Implementing An Automated Parallel Deployment Solution | 09-01-2016 |
20160378622 | Virtual Machine Recovery On Non-Shared Storage in a Single Virtual Infrastructure Management Instance - Techniques for enabling virtual machine (VM) recovery on non-shared storage in a single virtual infrastructure management server (VIMS) instance are provided. In one set of embodiments, a VIMS instance can receive an indication that a VM in a first cluster of the VIMS instance has failed, and can determine whether the VM's files were being replicated to a storage component of the VIMS instance at the time of the VM's failure. If the VM's files were being replicated at the time of the failure, the VIMS instance can search for and identify a cluster of the VIMS instance and a host system within the cluster that (1) are compatible with the VM, and (2) have access to the storage component. The VIMS instance can then cause the VM to be restarted on the identified host system of the identified cluster. | 12-29-2016 |
20170235653 | VIRTUALIZED FILE SERVER HIGH AVAILABILITY | 08-17-2017 |
20170235654 | VIRTUALIZED FILE SERVER RESILIENCE | 08-17-2017 |
20180024898 | FAULT MONITORING DEVICE, VIRTUAL NETWORK SYSTEM, AND FAULT MONITORING METHOD | 01-25-2018 |
20190146887 | POLICY-DRIVEN HIGH AVAILABILITY STANDBY SERVERS | 05-16-2019 |
20190146889 | Network Failover Handling In Computing Systems | 05-16-2019 |
20190146891 | Methods and Systems for Rapid Failure Recovery for a Distributed Storage System | 05-16-2019 |