Isolate or remove failed node without replacement (e.g., bypassing, re-routing, etc.)

Subclass of:

714 - Error detection/correction and fault detection/recovery

714100000 - DATA PROCESSING SYSTEM ERROR OR FAULT HANDLING

714001000 - Reliability and availability

714002000 - Fault recovery

714003000 - By masking or reconfiguration

714400100 - Of network

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number	Description	Number of patent applications / Date published
714400200	Isolate or remove failed node without replacement (e.g., bypassing, re-routing, etc.)	74
20110083036	Method and System for Restoring a Server Interface for a Mobile Device - Methods of recovering data lost by a server, and/or facilitating a recovery of data lost by a server, as well as systems for recovering (and/or facilitating the recovery of) data lost by a server, are disclosed herein. In some embodiments, the method includes receiving a data map pertaining to the lost data from one or both of a second server (which can be, for example, a master server) and a mobile device, and obtaining application data from the data map. The method further includes, based upon the application data, accessing one or more of the first mobile device, a second mobile device and a content provider website to obtain at least some of the lost data. Instead, or in addition, the lost data can be obtained from one or more mobile devices or other devices. In some such embodiments, the process can be initiated or governed by the second (e.g., master) server or a mobile device.	04-07-2011
20110093738	ERROR RECOVERY FOR APPLICATION-LEVEL INTERMEDIARIES - Error handling in the intermediation of one-way transacted messages. Rather than receiving an inbound message under a transaction, the intermediary performs a non-destructive exclusive read of the message from the source outside of a transaction. Routing logic is applied against the content of the message to determine a collection of message consumers to which a copy of the inbound message is to be sent. Then, under a transaction, the intermediary attempts to send a copy of the message to each destination. If a send of a copy fails, the transaction is rolled back, but the failure is recorded so that the same transmission mechanism is not, or is less likely to be, tried again in subsequent attempts. The principles may apply to a single message to be sent under the transaction, or to multiple messages to be sent under a single transaction.	04-21-2011
20110145628	Computer-Implemented Methods and Systems for Testing Online Systems and Content - Computer-implemented methods and systems are provided for scanning web sites and/or parsing web content, including for testing online opt-out systems and/or cookies used by online systems. In accordance with one implementation, a computer-implemented method is provided for testing an opt-out system associated with at least one advertising system that uses cookies. The method includes transmitting a first request to an opt-out system, wherein the first request corresponds to a first test for testing at least one of the opt-out system and an advertising system; receiving a first stream sent in response to the first request; determining a first outcome of the first test based on the first stream; and generating a report based on the first outcome.	06-16-2011
20110161722	Systems and Methods for a Communication Protocol Between a Local Controller and a Master Controller - Systems and methods for local management units in a photovoltaic energy system. In one embodiment, a method implemented in a computer system includes: attempting to communicate on a first active channel with a master management unit from a local management unit that controls a solar module; if communication with the master management unit on the first active channel has not been established, attempting to communicate on a second active channel with the master management unit.	06-30-2011
20110191622	COMPUTER SYSTEM AND BOOT CONTROL METHOD - When processing on a primary computer is taken over by a secondary computer in a redundant computer system that boots via a storage area network (SAN), a management server delivers an information collecting/setting program to the secondary computer before the user's operating system on the secondary computer starts. This program assigns the unique ID (World Wide Name) of the primary computer's fibre channel port to the secondary computer's fibre channel port, allowing the software image to be taken over from the primary computer by the secondary computer.	08-04-2011
20110283134	Server Checking Using Health Probe Chaining - A load balancer sends a probe packet to a first server in a list of servers. Each server in the list sends the probe successively down the list of servers and acknowledges the probe if the server is healthy. The final server in the list sends a signal to the load balancer to indicate that the chain of server probing has completed successfully. In this manner, the load balancer (or other device performing the checking) only needs to send a single probe rather than probe each server in the list separately. Embodiments include serial and recursive acknowledgments from the servers, sending a beacon message upon failure, and other features.	11-17-2011
20110289344	AUTOMATED NODE FENCING INTEGRATED WITHIN A QUORUM SERVICE OF A CLUSTER INFRASTRUCTURE - A quorum service within a cluster infrastructure layer of a cluster environment comprising a plurality of nodes automatically triggers at least one automated fencing operation integrated within the quorum service, to reliably maintain a node usability state of each node of the plurality of nodes indicating an availability of each node to control and access at least one shared resource of the cluster. The quorum service reports the node usability state of each node as a cluster health status to at least one distributed application within an application layer of the cluster environment, to provide a reliable cluster health status of the plurality of nodes to the at least one distributed application for a failover of said at least one shared resource from control by a failed node from among the plurality of nodes to another node from among the plurality of nodes.	11-24-2011
20110296231	DISTRIBUTED HEALTHCHECKING MECHANISM - Some embodiments of a system and a method for performing distributed healthchecking in a cluster system are presented. For instance, a distributed healthchecking manager executable on a centralized server in a cluster system can assign the nodes of the cluster system to at least some of those nodes, which then healthcheck the nodes assigned to them. The distributed healthchecking manager may then monitor the nodes performing healthchecking for reports of one or more failed nodes.	12-01-2011
20110296232	Communication system, communication unit, control unit, and controlling method - The system determines whether a signal transmitted from the current operation-side node to the standby-side node has been disconnected on the signal's communication route. When the signal is determined to have been disconnected, the network route is switched to the route used for data communication between a client unit and the standby-side node. This allows the network route to be switched quickly when processing previously performed by one node of the system is taken over by another node.	12-01-2011
20110314325	STORAGE APPARATUS AND METHOD OF DETECTING POWER FAILURE IN STORAGE APPARATUS - A storage apparatus has a physical storage area used by an external apparatus, a drive interface unit, a power supply unit, and a storage controller executing data write processing from the external apparatus to the storage drive and data read processing from the storage drive through the drive interface unit, and a drive control interface unit. The power supply unit inputs power supply information to the drive interface unit. Any one of the processing units acquires the power supply information of the power supply unit through a data network path to the drive interface unit for the data write processing and the data read processing, and determines whether or not a failure occurs in the power supply unit supplying the operation power to the storage drive and the drive interface unit, on the basis of the acquired power supply information.	12-22-2011
20120011390	Method and system for service error connection and error prevention in automatic switched optical network - The present invention discloses a method for service error connection and error prevention in an Automatically Switched Optical Network (ASON), which addresses the problems that conventional error connection and error prevention methods cannot achieve rapid automatic configuration and are inefficient. Because the control plane automatically completes the configuration of the error connection and error prevention information of the start node and the end node, the invention avoids the mistakes to which manual setting is prone, and the error prevention information is exchanged between the start node and the end node through the protocol, which is rapid and simple to implement. In addition, when an error connection alarm occurs, the error prevention information of the original connection can be carried over to a new connection of the service under different error prevention policies, so the invention greatly enhances system availability and service robustness.	01-12-2012
20120110371	SELF-RESTARTING NETWORK DEVICES - A method and apparatus for self-monitoring to identify an occurrence of a threshold and rebooting in response to the occurrence of the threshold is provided. In an embodiment, a data processing apparatus comprises one or more processors; logic coupled to the one or more processors and comprising one or more stored sequences of instructions which, when executed by one or more processors, cause the one or more processors to obtain a threshold associated with the apparatus; self-monitor the apparatus to identify an occurrence of the threshold; and self-reboot the apparatus responsive to the occurrence of the threshold.	05-03-2012
20120124412	Systems and Methods of Providing Fast Leader Elections in Distributed Systems of Simple Topologies - Systems and computer-implemented methods of electing a new leader node in distributed systems of simple topologies connecting a plurality of nodes on at least one computer system. In the computer-implemented method, at least one node that has detected the absence of a leader starts a first round to have itself approved as an Approved Election Initiator. If a quorum accepts the StartElection request during the first round, the Election Initiator starts a second round to set the leader. If a quorum of all nodes is not reached during the first round, the first round fails. The method repeats until a leader is set, and is run again each time a node discovers that the network does not have an active leader. Also provided herein is a computer readable medium having computer executable instructions stored thereon for performing the computer-implemented method.	05-17-2012
20120144229	VIRTUALIZED CLUSTER COMMUNICATION SYSTEM - A method includes executing, in each of a number of nodes of a cluster communication system, a specialized instance of an operating system privileged to control a corresponding hypervisor configured to consolidate one or more VM(s) on a system hardware. The one or more VM(s) is configured to be associated with a non-privileged operating system. The method also includes providing a cluster stack associated with the specialized instance of the operating system on the each of the number of nodes to enable communication between peers thereof in different nodes, and controlling the one or more VM(s) as a cluster resource through the cluster stack.	06-07-2012
20120159233	METHOD OF GENERATING ROUND ROBIN SERVICE ORDER LISTS FOR IMPROVING SERVICE PERFORMANCE DURING SERVER FAILURES - A method controls the routing of service requests to a plurality of servers using a first routing distribution algorithm. The method includes waiting a first period of time for a designated server to respond to a service request, transmitting the service request to the designated server a second time, and waiting a second period of time for the designated server to respond to the service request assigned to the designated server, the second period of time being longer than the first period of time. The method also includes determining that the designated server has failed, rerouting the service request to a different server, and routing the service requests to the plurality of servers using a second routing distribution algorithm.	06-21-2012
20120210159	MULTI-HOST MANAGEMENT SERVER IN STORAGE SYSTEM, PROGRAM FOR THE SAME AND PATH INFORMATION MANAGEMENT METHOD - Management arrangements to: (A) receive plural failure information from plural host computers for a predetermined period; (B) store the failure information; (C) extract one or more of the plural failure information, received from a first host computer among the plural host computers; (D) from the extracted failure information, which concerns multiple paths, retrieve the failure information about one path; (E) register the first host computer in refresh information in the memory, the refresh information indicating a host computer whose path information is to be updated; (F) send a request to the first host computer to acquire a status of a first path of the first host computer; (G) update a first path information in the plurality of path information of the first host computer, based on the status; and (H) delete the one or more of the plurality of failure information extracted in (C) from the failure reception information.	08-16-2012
20120221886	DISTRIBUTED JOB SCHEDULING IN A MULTI-NODAL ENVIRONMENT - Techniques are described for decentralizing a job scheduler in a distributed system environment. Embodiments of the invention may generally include receiving a job to be performed by a multi-nodal system which includes a cluster of nodes. Instead of a centralized job scheduler assigning the job to a node or nodes, each node has a job scheduler which scans a shared-file system to determine what job to execute on the node. In a job requiring multiple nodes, one of the nodes that joined the multi-nodal job becomes the primary node which then assigns and monitors the job's execution on the multiple nodes.	08-30-2012
20120254653	CHAINCAST METHOD AND SYSTEM FOR BROADCASTING INFORMATION TO MULTIPLE SYSTEMS WITHIN THE INTERNET - A method and system for performing chaincast communication to multiple communication systems within a system of coupled electronic devices. In one implementation the electronic devices can be computer systems and the system of coupled electronic devices includes the Internet. The present invention provides a system wherein a broadcast source communicates broadcast information (e.g., encoded audio radio content, encoded audio/video television content, program instructions, etc.) to a first group of electronic devices. The first group of electronic devices can be instructed by a transmission scheduler to then communicate (e.g., forward) the broadcast information to other electronic devices which devices can also be instructed to communicate to more devices, etc., thereby reducing the bandwidth requirements of the communication channel between the broadcast source and the first group of electronic devices.	10-04-2012
20120266013	LINK AGGREGATION PROTECTION - A method includes detecting, by a first network device, a configuration problem at a second network device, where the first and second network devices are associated with a link aggregation group (LAG) coupling the first and second network devices. The method also includes de-activating, by the first network device, one or more links in the LAG in response to detecting the configuration problem. The method further comprises maintaining at least one of the links in the LAG as an active link and allowing traffic to be forwarded on the active link in the LAG.	10-18-2012
20120272092	FAULT-TOLERANT COMMUNICATIONS IN ROUTED NETWORKS - A method for providing fault-tolerant network communications between a plurality of nodes for an application, including providing a plurality of initial communications pathways over a plurality of networks coupled between the plurality of nodes, receiving a data packet on a sending node from the application, the sending node being one of the plurality of nodes, the data packet being addressed by the application to an address on one of the plurality of nodes, and selecting a first selected pathway for the data packet from among the plurality of initial communications pathways where the first selected pathway is a preferred pathway.	10-25-2012
20120272093	FAULT-TOLERANT COMMUNICATIONS IN ROUTED NETWORKS - A method for providing fault-tolerant network communications between a plurality of nodes for an application, including providing a plurality of initial communications pathways over a plurality of networks coupled between the plurality of nodes, receiving a data packet on a sending node from the application, the sending node being one of the plurality of nodes, the data packet being addressed by the application to an address on one of the plurality of nodes, and selecting a first selected pathway for the data packet from among the plurality of initial communications pathways where the first selected pathway is a preferred pathway.	10-25-2012
20120331334	MULTI-CLUSTER SYSTEM AND INFORMATION PROCESSING SYSTEM - A multi-cluster system includes a plurality of computers; and a plurality of system storage apparatuses each of which is coupled to the plurality of computers; wherein at least one of the plurality of system storage apparatuses includes a first circuit that detects a connection information that includes connection-failure information indicating a connection failure in a connection with at least one of the plurality of computers, and a second circuit that reports the connection information detected by the first circuit to the plurality of computers; and each of the plurality of computers includes a third circuit that receives the connection information from each of the plurality of system storage apparatuses, and a fourth circuit that executes processing to disconnect a system storage apparatus, based on the connection information received by the third circuit.	12-27-2012
20130007504	HIGH AVAILABILITY DATA STORAGE SYSTEMS AND METHODS - Provided are systems and methods for accessing a storage device from a node when a local connection failure occurs between the node and the storage device. A failure is determined to have occurred at a first node access path between a first node and a storage device that prevents an application at the first node from accessing the storage device from the first node access path. An access request is sent from the first node to a second node. The second node has a second node access path to the storage device. A determination is made that the second node can communicate with the storage device. The storage device is accessed by an application at the first node via the second node access path.	01-03-2013
20130013955	METHOD AND SYSTEM FOR EMERGENCY SWITCHING - The disclosure relates to communication technologies and discloses a method and a system for emergency switching. In the embodiments of the present invention, a mapped IP address is configured in a previous-level network device that can map addresses, so that corresponding devices in the respective service processing subsystems serve as backup devices for each other. When a problem occurs in a device of one service processing subsystem, it is only necessary to remap, in the previous-level network device, the IP address mapped to the faulty device onto the corresponding device of another service processing subsystem; that corresponding device then acts as the backup device and processes the service of the device in which the problem occurs. This enables a backup device to be started simply and quickly when a problem occurs in the original device.	01-10-2013
20130073894	TECHNIQUES FOR ACHIEVING HIGH AVAILABILITY WITH MULTI-TENANT STORAGE WHEN A PARTIAL FAULT OCCURS OR WHEN MORE THAN TWO COMPLETE FAULTS OCCUR - Techniques for achieving high availability (HA) in a cloud environment are presented. Cloud storage provided to multiple tenants is accessed via a plurality of controllers via a switch. The controllers are organized in a ring and each controller is responsible for detecting failures in adjoining controllers within the ring. Storage services for the tenants are serviced without disruptions even when multiple nodes completely fail at the same time.	03-21-2013
20130080824	DISTRIBUTED JOB SCHEDULING IN A MULTI-NODAL ENVIRONMENT - Techniques are described for decentralizing a job scheduler in a distributed system environment. Embodiments of the invention may generally include receiving a job to be performed by a multi-nodal system which includes a cluster of nodes. Instead of a centralized job scheduler assigning the job to a node or nodes, each node has a job scheduler which scans a shared-file system to determine what job to execute on the node. In a job requiring multiple nodes, one of the nodes that joined the multi-nodal job becomes the primary node which then assigns and monitors the job's execution on the multiple nodes.	03-28-2013
20130103975	METHOD FOR SWITCHING A NODE CONTROLLER LINK, PROCESSOR SYSTEM, AND NODE - Embodiments of the present invention disclose a method for switching an NC link, a processor system, and a node, where the processor system includes more than two nodes capable of communicating with each other, and each node includes a node controller (NC) chip, a host bus adapter (HBA) apparatus, and at least one CPU; the NC chip is connected to each CPU in the node where the NC chip is located, and the HBA apparatus is connected to each CPU in the node where the HBA apparatus is located; and an NC link borne by the NC chip corresponds to an HBA link borne by the HBA apparatus. By using an HBA apparatus to deploy a redundant link, the cost of deploying the redundant link is effectively reduced while the reliability of the processor system is ensured.	04-25-2013
20130111259	CONNECTION CONTROL APPARATUS, STORAGE SYSTEM, AND CONTROL METHOD OF CONNECTION CONTROL APPARATUS	05-02-2013
20130124911	COMMUNICATION SYSTEM WITH DIAGNOSTIC CAPABILITIES - A first component, executing in a first data processing system, receives, over a data communication network using a first adapter, a first diagnostic heartbeat packet from a second adapter in a second data processing system. The first heartbeat packet comprises a header, a set of heartbeat parameters, and a set of diagnostic attributes. The first component determines, based on a set of values corresponding to the set of diagnostic attributes, that a soft network error condition exists in the data communication network. The soft network error condition is a network error condition that adversely affects the transmission of packets having certain properties in the data communication network. The first component stores the set of values in a state information record associated with the first component and re-routes data traffic from one link to a different link between the first and the second data processing systems.	05-16-2013
20130124912	SYNCHRONIZING A DISTRIBUTED COMMUNICATION SYSTEM USING DIAGNOSTIC HEARTBEATING - A first component, executing using a processor and a memory in a first data processing system, receives a diagnostic heartbeat packet from a second component executing in a second data processing system, wherein the diagnostic heartbeat packet is a packet comprising a header, a set of heartbeat parameters, and a set of diagnostic attributes. The first component determines, using a value of a diagnostic attribute in the diagnostic heartbeat packet, that a first communication link between the first and the second data processing systems is usable but includes a soft network error, wherein a soft network error condition is a network error condition that adversely affects transmission of packets having certain properties in the data communication network. The first component re-routes a synchronization message from the first component to the second component using a second communication link between the first and the second data processing systems.	05-16-2013
20130132762	AUTOMATED NODE FENCING INTEGRATED WITHIN A QUORUM SERVICE OF A CLUSTER INFRASTRUCTURE - A quorum service within a cluster infrastructure layer of a cluster environment comprising a plurality of nodes automatically triggers at least one automated fencing operation integrated within the quorum service, to reliably maintain a node usability state of each node of the plurality of nodes indicating an availability of each node to control and access at least one shared resource of the cluster. The quorum service reports the node usability state of each node as a cluster health status to at least one distributed application within an application layer of the cluster environment, to provide a reliable cluster health status of the plurality of nodes to the at least one distributed application for a failover of said at least one shared resource from control by a failed node from among the plurality of nodes to another node from among the plurality of nodes.	05-23-2013
20130132763	NETWORK DISRUPTION PREVENTION WHEN VIRTUAL CHASSIS SYSTEM UNDERGOES SPLITS AND MERGES - A method performed by network devices that includes operating in a normal mode, where the network devices form a virtual chassis that corresponds to a single logical network device; detecting when a failure within the virtual chassis occurs; executing a splitting process to form one or more new virtual chassis in correspondence to the failure; determining whether one of the one or more new virtual chassis operates as a functioning virtual chassis based on whether at least one of a set of criteria is satisfied, where the functioning virtual chassis operates according to resources configured for the virtual chassis; and operating as a nonfunctioning virtual chassis when it is determined that the one of the one or more virtual chassis does not satisfy the at least one of the set of criteria, where the nonfunctioning virtual chassis operates in a pass-through mode.	05-23-2013
20130138995	DYNAMIC HYPERVISOR RELOCATION - A method for managing multiple nodes hosting multiple memory segments, including: identifying a failure of a first node hosting a first memory segment storing a hypervisor; identifying a second memory segment storing a shadow of the hypervisor and hosted by a second node; intercepting, after the failure, a hypervisor access request (HAR) generated by a core of a third node and comprising a physical memory address comprising multiple node identification (ID) bits identifying the first node; modifying the multiple node ID bits of the physical memory address to identify the second node; and accessing a location in the shadow of the hypervisor specified by the physical address of the HAR after the multiple node ID bits are modified.	05-30-2013
20130138996	NETWORK AND EXPANSION UNIT AND METHOD FOR OPERATING A NETWORK - A network, in particular an Ethernet network, contains as network elements at least two network components that are interconnected by a network transmission line. Accordingly, at least one expansion unit having two external ports is disposed in the network line for extending the scope thereof, wherein the expansion unit forwards a failure of the network transmission line at one of the ports thereof to a port of the next subsequent network element.	05-30-2013
20130198558	Dual Adjacency Between Edge Devices at a Network Site - Devices, methods and instructions encoded on computer readable medium for implementation of a dual-adjacency between edge devices of a network site. A first edge device comprises one or more local interfaces configured for communication, via a local network, with one or more network devices co-located in a first network site. The first edge device also comprises one or more overlay interfaces configured for communication, via a core network, with one or more network devices located in one or more other network sites connected to the core network. The first edge device comprises a processor configured to establish, via at least one of the local interfaces, a site communication channel with a second edge device co-located in the first network site. The processor is further configured to establish an overlay communication channel, via at least one of the overlay interfaces, with the second edge device.	08-01-2013
20130227335	RECOVERY ESCALATION OF CLOUD DEPLOYMENTS - Methods and systems for escalating component failures in a cloud are provided. A cloud controller of a first cloud receives an indication that a collection of virtual machines of the first cloud has failed, based on an escalation policy for that collection of virtual machines. The cloud controller initiates relocating the collection of virtual machines to a second cloud.	08-29-2013
20130262914	CLOUD SYSTEM AND METHOD FOR MONITORING AND HANDLING ABNORMAL STATES OF PHYSICAL MACHINE IN THE CLOUD SYSTEM - A cloud system and a method for monitoring and handling abnormal states of physical machines in the cloud system are disclosed. Each physical machine of the cloud system executes a daemon program that monitors the operation states of the physical machine and provides them to a management terminal in the cloud system. When the management terminal determines that a physical machine has abnormal operation states, it sends a control instruction to the cabinet of that physical machine, and the physical machine is forcibly ejected from the cabinet. This makes it easier for the administrator to replace the faulty physical machine on site, because less time is spent looking for it.	10-03-2013
20130268799	Automatically Scaled Network Overlay with Heuristic Monitoring in a Hybrid Cloud Environment - Techniques are provided for a management application in a first virtual network to start a first cloud gateway in the first virtual network. First messages are sent to a second virtual network, the first messages comprising information configured to start a second cloud gateway and a first virtual switch in the second virtual network. A connection is established between the first cloud gateway and the second cloud gateway, where the first cloud gateway, the second cloud gateway, and the first virtual switch form a first scalable cloud network element. One or more second messages are sent to the second virtual network, the one or more second messages comprising information configured to start a virtual machine and a first virtual machine interface configured to allow the virtual machine to access processing resources in the second virtual network. Data are stored that associates the virtual machine with the first virtual switch.	10-10-2013
20130339781	High Availability Conferencing Architecture - Providing high availability multi-way conferencing. Separate signaling and media components may be provided within an MCU or among a cluster of MCUs. A signaling server may control signaling aspects of a conference while a media server may provide media support for the conference. In the event of media server failure, the signaling server may assign a new media server to provide media support for the conference. A backup signaling server may also monitor the signaling server and may provide signaling support for the conference in the event of signaling server failure.	12-19-2013
20130346788	METHOD AND SYSTEM TO ENABLE RE-ROUTING FOR HOME NETWORKS UPON CONNECTIVITY FAILURE - A method implemented by a Broadband Network Gateway (BNG) of an Internet service provider to provide accessibility to a wide area network for a Residential Gateway (RG) upon a failure of a wireline connectivity between the BNG and the RG, the method including receiving a failure detect message indicating a connectivity failure at the BNG from the RG, deciding whether to re-route traffic by the BNG, sending a failure acknowledge message by the BNG to the RG notifying the RG that re-routing has been initiated, sending a traffic re-route request message by the BNG to a Packet Data Network Gateway (PDN GW) of a Long-Term Evolution (LTE) network requesting the PDN GW to re-route traffic, receiving a traffic re-route acknowledgement by the BNG from the PDN GW, and re-routing traffic between the RG and the BNG through the PDN GW by the BNG.	12-26-2013
20140025985	COMMUNICATION CONTROL DEVICE AND COMMUNICATION CONTROL METHOD - A connection node is included in a connecting part of a plurality of rings in a ring network. The connection node includes a failure detecting unit, an optical-signal processing unit, an ODU switch, and an optical-signal processing unit. The failure detecting unit detects failure in the connecting part. The optical-signal processing unit receives data transmitted from another node on a ring to which the connection node belongs. Upon detection of the failure, the ODU switch determines whether to pass the data or return the data in reverse direction from the connection node depending on a destination to transfer the received data, and sets a transmission path of the data based on a result of the determination. The optical-signal processing unit transfers the data in accordance with the set transmission path.	01-23-2014
20140053014	HANDLING INTERMITTENT RECURRING ERRORS IN A NETWORK - Embodiments relate to a computer for transmitting data in a network. The computer includes at least one data transmission port configured to be connected to at least one storage device via a plurality of paths of a network. The computer further includes a processor configured to detect recurring intermittent errors in one or more paths of the plurality of paths and to disable access to the one or more paths based on detecting the recurring intermittent errors.	02-20-2014
20140095922	SYSTEM AND METHOD OF FAILOVER FOR AN INITIATED SIP SESSION - An initial SIP message is sent to establish a first SIP communication session from a first SIP device. The initial SIP message is sent via a first of a plurality of session managers to a second SIP device. After receiving the initial SIP message at the second SIP device and before ending the first SIP communication session, either the first or second SIP device sends a second SIP message. The second SIP message is sent to the first of the plurality of session managers. Either the first or second SIP device detects that a response SIP message to the sent second SIP message was not received within a defined time period. In response to detecting that the SIP response message was not received within the defined time period, either the first or second SIP device resends the second SIP message to a second one of the plurality of session managers.	04-03-2014
20140095923	FINAL FAULTY CORE RECOVERY MECHANISMS FOR A TWO-DIMENSIONAL NETWORK ON A PROCESSOR ARRAY - Embodiments of the invention relate to fault recovery mechanisms for a two-dimensional (2-D) network on a processor array. One embodiment comprises a processor array including multiple processor core circuits, and a redundant routing system for routing packets between the core circuits. The redundant routing system comprises multiple switches, wherein each switch corresponds to one or more core circuits of the processor array. The redundant routing system further comprises multiple data paths interconnecting the switches, and a controller for selecting one or more data paths. Each selected data path is used to bypass at least one component failure of the processor array to facilitate full operation of the processor array.	04-03-2014
20140108854	PROVIDING MULTIPLE IO PATHS IN A VIRTUALIZED ENVIRONMENT TO SUPPORT FOR HIGH AVAILABILITY OF VIRTUAL MACHINES - High availability of a virtual machine is ensured even when all of the virtual machine's IO paths fail. In such a case, the virtual machine is migrated to a host that is sharing the same storage system as the current host in which the virtual machine is being executed and has at least one functioning IO path to the shared storage system. After execution control of the virtual machine is transferred to the new host, IO operations from the virtual machine are issued over the new IO path.	04-17-2014
20140143589	METHOD FOR MANAGING PATH OF OSEK NETWORKS - Disclosed herein is a method of managing the path of an OSEK network. The method of managing the path of an OSEK network includes step S	05-22-2014
20140149782	METHOD AND APPARATUS FOR FACILITATING PROCESS RESTART IN A MULTI-INSTANCE IS-IS SYSTEM - A method and apparatus for facilitating process restart in an IS-IS router that includes an active router processor (RP) module for supporting an active IS-IS process instance as well as one or more dormant instances of the active IS-IS process. Routing database information maintained by the active IS-IS process is synchronized to one or more corresponding databases associated with the dormant instances. Responsive to a control signal, one of the dormant instances may be activated as the new active IS-IS process instance on the active RP module, wherein the contents of the database corresponding to the newly activated instance are used for continuing to maintain routing functionality.	05-29-2014
20140149783	METHODS AND APPARATUS FACILITATING ACCESS TO STORAGE AMONG MULTIPLE COMPUTERS - Multiple computers in a cluster maintain respective sets of identifiers of neighbor computers in the cluster for each of multiple named resources. For each named resource, the combination of these sets of identifiers defines a hierarchical tree. Upon origination and detection of a request at a given computer in the cluster, the given computer uses the identifiers of its neighbor computers to forward the request over a network to successive computers in the tree, leading to the computers relevant to handling the request. Thus, the combined identifiers of neighbor computers identify potential paths to related computers in the tree.	05-29-2014
20140157041	DISTRIBUTED AVIONICS SYSTEM AND METHOD FOR BACKUP HANDLING IN AN AVIONICS SYSTEM - The present invention relates to a distributed avionics system (	06-05-2014
20140245059	HYBRID REDUNDANCY FOR ELECTRONIC NETWORKS - Aspects of a method and system for hybrid redundancy for electronic networks are provided. A first line card may comprise a first instance of a network layer circuit, a first instance of a physical layer circuit, and an interface to a data bus (e.g., an Ethernet bus) for communicating with a second line card. In response to detecting a failure of the first instance of the network layer circuit, the first instance of the physical layer circuit may switch from processing of a signal received via the first instance of the network layer circuit to processing of a signal received via the interface. The system may comprise a second line card. The second line card may comprise a second instance of the network layer circuit. The second instance of the network layer circuit may be coupled to the data bus.	08-28-2014
20140304544	NETWORK SYSTEM, NODE DEVICE GROUP, SENSOR DEVICE GROUP, AND METHOD FOR TRANSMITTING AND RECEIVING SENSOR DATA - Each node device has a sensor data saving information list storage section for storing a sensor data saving information list that indicates, according to an attribute of each item of sensor data, which node device is the proper one for saving that item. A sensor data arrangement section transfers each item of sensor data saved in the sensor data storage sections of the node devices to the proper node device for saving it, based on the sensor data saving information list.	10-09-2014
20140310555	PHYSICAL DOMAIN ERROR ISOLATION AND RECOVERY IN A MULTI-DOMAIN SYSTEM - The disclosed embodiments disclose techniques for performing physical domain error isolation and recovery in a multi-domain system, where the multi-domain system includes two or more processor chips and one or more switch chips that provide connectivity and cache-coherency support for the processor chips, and the processor chips are divided into two or more distinct domains. During operation, one of the switch chips determines a fault in the multi-domain system. The switch chip determines an originating domain that is associated with the fault, and then signals the fault and an identifier for the originating domain to its internal units, some of which perform clearing operations that clear out all traffic for the originating domain without affecting the other domains of the multi-domain system.	10-16-2014
20150026507	TRANSPORT CONTROL SERVER, NETWORK SYSTEM AND TRANSPORT CONTROL METHOD - It is intended to shorten the time required for a path recalculation and a path switching upon occurrence of a failure. A path generation unit of a transport control server (TCS) S-	01-22-2015
20150067385	INFORMATION PROCESSING SYSTEM AND METHOD FOR PROCESSING FAILURE - An information processing system includes a plurality of nodes and a shared memory connected to the plurality of nodes. Each of the nodes includes a plurality of functional circuits, a control device, and a register configured to store a plurality of interrupt factors that occur in the plurality of functional circuits. In response to an occurrence of an interrupt factor, the control device in one node among the plurality of nodes receives the interrupt factors in the registers of a plurality of other nodes, extracts from them an interrupt factor to be detected as a failure, identifies a failed node according to the extraction result, and, after suppressing access to the shared memory by the failed node, separates the failed node from the information processing system on the basis of log information received from the plurality of other nodes.	03-05-2015
20150082078	METHOD AND APPARATUS FOR ISOLATING A FAULT IN A CONTROLLER AREA NETWORK - A controller area network (CAN) includes a plurality of CAN elements comprising a communication bus and a plurality of controllers. A method for monitoring includes periodically determining vectors wherein each vector includes inactive ones of the controllers detected during a filtering window. Contents of the periodically determined vectors are time-filtered to determine a fault record vector. A fault on the CAN is isolated by comparing the fault record vector and a fault signature vector determined based upon a network topology for the CAN.	03-19-2015
20150295750	METHOD AND SYSTEM FOR MANAGING INTERCONNECTION OF VIRTUAL NETWORK FUNCTIONS - A method and apparatus are disclosed herein for use of a connectivity manager and a network infrastructure including the same. In one embodiment, the network infrastructure comprises one or more physical devices communicably coupled into a physical network infrastructure or via the overlay provided by the physical servers; and a virtual network domain containing a virtual network infrastructure executing on the physical network infrastructure. In one embodiment, the virtual network domain comprises one or more virtual network functions connected together through one or more links and executing on the one or more physical devices, and one or more interfaces coupled to one or more network functions via one or more links to communicate data between the virtual network domain and at least one of the one or more physical devices of the physical network infrastructure while the virtual network domain is isolated from other virtual infrastructures executing on the physical network infrastructure.	10-15-2015
20150347258	METHOD AND APPARATUS FOR SHORT FAULT DETECTION IN A CONTROLLER AREA NETWORK - A controller area network (CAN) includes a CAN bus having a CAN-H wire, a CAN-L wire, and a pair of CAN bus terminators located at opposite ends of the CAN bus. The CAN further includes a plurality of nodes including controllers wherein at least one of the controllers is a monitoring controller. The monitoring controller includes a CAN monitoring routine for detecting a wire short fault in the CAN bus and its location.	12-03-2015
20160034363	METHOD FOR HANDLING FAULTS IN A CENTRAL CONTROL DEVICE, AND CONTROL DEVICE - The invention relates to a method for handling faults in a central control device, wherein the control device comprises a distributed computer system (	02-04-2016
20160041890	PARALLEL COMPUTER SYSTEM AND CONTROL METHOD FOR PARALLEL COMPUTER SYSTEM - A parallel computer system includes a parallel computer including nodes connected via communication routes and respectively executing calculations, and a control device to allocate a job to a predetermined number of nodes. The control device includes a job allocation processor to allocate, to a peripheral region of first N-dimensional job nodes allocated with a first job, any of an empty node, a zero-dimensional job node, and a node at a side or a surface with one node length of M-dimensional job nodes, N=<1 and M	02-11-2016
20160098327	BYPASSING FAILED HUB DEVICES IN HUB-AND-SPOKE TELECOMMUNICATION NETWORKS - In an embodiment, a method comprises using a first hub device: establishing one or more secure connections with one or more spoke devices logically arranged as spokes with respect to a data processing system; generating and sending via a high-speed link a hub probe to a second hub device; in response to determining that the second hub device is nonresponsive, transmitting, to the one or more spoke devices a first communication indicating that the second hub device is nonresponsive; using a spoke device, receiving the first communication indicating that the second hub device is nonresponsive; determining whether the spoke device has established a secure connection with the second hub device; in response to determining that the spoke device has established the secure connection with the second hub device, selecting a third hub device, establishing a secure connection with the third hub device, and communicating with the third hub device.	04-07-2016
20160154717	FAULTY CORE RECOVERY MECHANISMS FOR A THREE-DIMENSIONAL NETWORK ON A PROCESSOR ARRAY	06-02-2016
20160162377	Access Control Method and System, and Access Point - An access control method and system and an access point. When a fault occurs in an access controller (AC), an access point (AP) configures a network-layer interface of the AP according to an Internet Protocol (IP) address and a media access control (MAC) address of the AC that are obtained by means of pre-learning, and then the AP routes a received packet to a Web server on a wireless local area network (WLAN) using the configured network-layer interface, where the packet is used by a first station (STA) to request to access an external server. Therefore, interconnection and interworking among wireless local area networks are implemented, and a breakdown of a wireless local area network caused in a centralized network architecture due to occurrence of a fault in an AC is avoided.	06-09-2016
20160378606	FAST CONVERGENCE FOR FAILURES OF LARGE SCALE VIRTUAL ETHERNET SEGMENTS IN EVPN AND PBB-EVPN - Systems, methods, and computer-readable media for fast convergence for virtual ethernet segments in EVPN and PBB-EVPN networks are disclosed. A first provider edge (PE) device can receive one or more advertising messages corresponding to one or more virtual ethernet segments, wherein each of the one or more advertising messages can include a port identifier. The first PE device maintains a table including the one or more virtual ethernet segments and the corresponding port identifier. The first PE device can receive a failure message from a second PE device that identifies a first port on the second PE device, and identifies, based on the table, at least one affected virtual ethernet segment that is associated with the first port. The first PE device can remove any routes that are associated with the at least one affected virtual ethernet segment and trigger mass designated-forwarding election for impacted virtual ethernet segments.	12-29-2016
714400210	Reintegrate node back into network	11
20110087919	MANAGING AVAILABILITY OF A COMPONENT HAVING A CLOSED ADDRESS SPACE - Systems, methods and articles of manufacture are disclosed for managing availability of a component executing in a distributed system. The component may have an address space closed to the distributed system. In one embodiment, the component may be initiated. A state of the component may be analyzed to determine the availability of the component. The determined availability may be transmitted to the distributed system. The component may also be restarted responsive to a request from the distributed system to restart the component.	04-14-2011
20110179305	PROCESS FOR SECURE BACKSPACING TO A FIRST DATA CENTER AFTER FAILOVER THROUGH A SECOND DATA CENTER AND A NETWORK ARCHITECTURE WORKING ACCORDINGLY	07-21-2011
20110320859	METHOD AND SYSTEM FOR INTERFERENCE DETECTION AND MITIGATION - In a method for adjusting modulation on a network, a modulation profile of a network node on the network is set to a specified density. A plurality of messages that are received at the network node are monitored on an ongoing basis. The modulation profile of the network node is updated continually based on the monitored messages. A determination is made that a predetermined class of messages is received incorrectly at the network node. The network node is disconnected from the network based on the incorrectly received predetermined class of messages and is reconnected to the network to initiate the network node on the network.	12-29-2011
20120042198	LPAR CREATION AND REPAIR FOR AUTOMATED ERROR RECOVERY - Various embodiments for automated error recovery in a computing storage environment by a processor device are provided. In one embodiment, when a hardware management console (HMC) in communication with a logical partition (LPAR) operable in the computing storage environment either creates a new LPAR or rebuilds an existing one, at least one failure scenario is evaluated by identifying an error code. If a failure is caused by an operation of the HMC and a malfunction of a current network connection, a cleanup operation is performed on at least a portion of the current HMC configuration, an alternative network connection to the current network connection is made, and a retry operation is performed.	02-16-2012
20120042199	FAILOVER BASED ON SENDING COMMUNICATIONS BETWEEN DIFFERENT DOMAINS - A first communication device detects a failure on a communication channel of a primary network. Based on the failure, the first communication device sends communications directed to a second communication device to a secondary network. A secondary network domain controller sends the communications to the primary network via a different communication channel. This can be done by looking at an identifier such as an IP address. This can also happen based on being able to communicate between the secondary network and the primary network via the different communication channel. The primary network then sends the communication to the second communication device. In addition, communications from the second communication device are routed in a similar manner to the first communication device. Sending the communications back into the primary network allows users to have access to features not provided by the secondary network.	02-16-2012
20120210161	ROUTER SYNCHRONIZATION - Example systems and methods associated with router synchronization are described. One example method includes reducing a likelihood that a first network device will be favored over a peer device as a router. This likelihood may be increased after the first network device has received a threshold amount of routing information from the peer device. This may allow the first network device to begin performing non-routing related tasks after it starts up without causing interruption of data streams for which the first network device does not have current routing information.	08-16-2012
20140040659	SELECTION OF ONE OF FIRST AND SECOND LINKS BETWEEN FIRST AND SECOND NETWORK DEVICES - Embodiments herein relate to selection of one of first and second links between first and second network devices. The first link is to transmit the traffic between the first and second network devices directly and the second link is to transmit the traffic between the first and second network devices through a network appliance.	02-06-2014
20140082409	AUTOMATED NODE FENCING INTEGRATED WITHIN A QUORUM SERVICE OF A CLUSTER INFRASTRUCTURE - A quorum service detects liveness failures of at least two failed nodes in a domain of a cluster infrastructure layer of a cluster environment within a limited time frame and adds the at least two failed nodes to a list of nodes set to pending to be fenced by a group leader node. The quorum service determines whether the at least two failed nodes include the group leader node. The quorum service, responsive to the at least two failed nodes not including the group leader node, triggers the group leader node to trigger at least one fencing operation to fence the at least two failed nodes in the list of nodes. The quorum service, responsive to the at least two failed nodes including the group leader node, sets a new node as the group leader node and triggers the new node set as the group leader node to trigger the at least one fencing operation to fence the at least two failed nodes in the list of nodes.	03-20-2014
20150370658	ALTERNATIVE PORT ERROR RECOVERY WITH LIMITED SYSTEM IMPACT - Various embodiments are provided for troubleshooting a network device in a computing storage environment by a processor. In response to an error in a specific port, an alternative error recovery operation is initiated on the port by performing at least one of: initiating a silent recovery operation by reloading a failed instruction, taking the port offline, cleaning up any active transactions associated with the port, performing a hardware reset operation on the port, and bringing the port online.	12-24-2015
20160041888	LINK STATE RELAY FOR PHYSICAL LAYER EMULATION - One embodiment of the present invention provides a fault-management system. During operation, the system identifies a failure at a remote location associated with a communication service. The system then determines a local port used for the communication service, and suspends the local port, thereby allowing the failure to be detected by a device coupled to the local port.	02-11-2016
20160179607	FAILURE MANAGEMENT FOR ELECTRONIC TRANSACTIONS	06-23-2016
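As an illustration of the CAN fault-isolation approach summarized for application 20150082078 (vectors of inactive controllers per filtering window, time-filtered into a fault record vector, then compared with topology-derived fault signature vectors), the following is a minimal Python sketch under stated assumptions. The class name CanFaultIsolator, the history length, the hit threshold, and the example signatures are hypothetical, and exact set equality stands in for whatever comparison the application actually uses.

```python
from collections import deque


class CanFaultIsolator:
    """Hypothetical sketch: time-filter inactive-controller vectors into a
    fault record and match it against assumed fault signatures."""

    def __init__(self, controllers, signatures, history=5, min_hits=3):
        self.controllers = list(controllers)
        # Assumed signatures: fault name -> set of controllers expected inactive.
        self.signatures = {name: frozenset(s) for name, s in signatures.items()}
        self.min_hits = min_hits
        # One boolean vector per filtering window; the oldest is dropped first.
        self.windows = deque(maxlen=history)

    def add_window(self, inactive):
        """Record which controllers were inactive during one filtering window."""
        self.windows.append(tuple(c in inactive for c in self.controllers))

    def fault_record(self):
        """Time filter: flag a controller only if it was inactive in at least
        min_hits of the retained windows, suppressing transient dropouts."""
        record = set()
        for i, c in enumerate(self.controllers):
            hits = sum(1 for w in self.windows if w[i])
            if hits >= self.min_hits:
                record.add(c)
        return frozenset(record)

    def isolate(self):
        """Return the fault names whose signature exactly matches the record."""
        record = self.fault_record()
        if not record:
            return []
        return sorted(n for n, sig in self.signatures.items() if sig == record)


if __name__ == "__main__":
    # Hypothetical bus with four controllers and two assumed signatures.
    signatures = {
        "open wire between ECU2 and ECU3": {"ECU3", "ECU4"},
        "ECU4 power loss": {"ECU4"},
    }
    iso = CanFaultIsolator(["ECU1", "ECU2", "ECU3", "ECU4"], signatures)
    for inactive in [{"ECU3", "ECU4"}, {"ECU4"}, {"ECU3", "ECU4"}, {"ECU3", "ECU4"}]:
        iso.add_window(inactive)
    print(iso.isolate())  # ['open wire between ECU2 and ECU3']
```

In the usage example, ECU3 is inactive in three of the four windows and ECU4 in all four, so both pass the threshold and the record matches the assumed open-wire signature rather than the single-controller one. Requiring several hits before flagging a controller is one simple way to realize the time filtering; a real system would derive the signatures from the actual bus topology.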
