Entries |
Document | Title | Date |
20080201601 | Method and Apparatus for Elimination of Faults of a Data Processing System - A method for fault handling of a data processing unit is disclosed. The method includes automatic acquisition of input information and/or output information of a user at at least one user interface of the data processing unit; automatic detection of a fault message that indicates a fault of the data processing unit; transmission of the acquired fault message together with the input information and/or the output information to a fault handling center; and evaluation of the transmitted fault message in the fault handling center. | 08-21-2008 |
20080209254 | METHOD AND SYSTEM FOR ERROR RECOVERY OF A HARDWARE DEVICE - A method and system for error recovery of a hardware device is provided. The method includes detecting a target hard error indication from the hardware device by comparing the hard error indication to signatures of hard error indications that indicate a temporary failure, and modifying the reported error into a stalling indication. The hardware device is allowed to recover within a predefined time period, by issuing one or more resets, or both. A hard error indication usually triggers external error recovery of the hardware device; the method temporarily stalls such external error recovery. | 08-28-2008 |
20080209255 | METHOD AND SYSTEM FOR THE SERVICE AND SUPPORT OF COMPUTING SYSTEMS - The invention describes an end-user-initiated method and system for managing failure in a host computing system. The embodiments describe an embedded management/diagnostics system that operates independently of the failed computing system and locates and connects to an appropriate technical service provider for correcting the problem in the failed computing system. | 08-28-2008 |
20080209256 | MEMORY ARRAY REPAIR WHERE REPAIR LOGIC CANNOT OPERATE AT SAME OPERATING CONDITION AS ARRAY - Memory array repair where the repair logic cannot operate at the same operating condition as the memory array is disclosed. In one embodiment, a test is run with the memory array configured in a first operating condition that the repair logic for the memory array cannot achieve, and test data from the test is accumulated in the memory array. The memory array is then read, with the array configured in a second operating condition that the repair logic can achieve, using the test data accumulated from the test at the first operating condition. As a result, repairs can be achieved even though the repair logic cannot operate at the same condition as the memory array. A method, test unit and integrated circuit implementing the testing are disclosed. | 08-28-2008 |
20080222447 | PREVENTION OF FRAME DUPLICATION IN INTERCONNECTED RING NETWORKS - A method for communication includes, in a communication network that includes multiple ring nodes arranged in at least first and second ring networks that are connected by two or more of the ring nodes serving as interconnect nodes, accepting at the two or more interconnect nodes respective copies of a data packet, which is sent from a source user node connected to the first ring network. | 09-11-2008 |
20080222448 | SYSTEM, METHOD AND PROGRAM PRODUCT FOR RECOVERING FROM A FAILURE - System, method and computer program product for recovering from a failure of a computing device. Startup of a first component of the device is monitored and a determination is made whether the first component has started successfully. If so, a second, higher level component of the device is started. Operational data received from the second component is monitored. If the operational data falls outside of an operational boundary, an action is performed on the second component to enable the second component to operate within a preferred operational boundary. If the first component does not start up successfully, a determination is made whether startup of the first component is critical to operation of the second component. If so, a corrective action is performed relative to the first component and afterwards, an attempt is made to start up the second component. | 09-11-2008 |
20080229140 | System and method of disaster recovery - In a disaster recovery (DR) system, from the viewpoint of device cost, a physical log application method, in which log recovery can be performed by an inexpensive DB appliance server, is adopted when no search is being carried out. Further, no local mirror operation is carried out at the secondary site. Furthermore, from the viewpoint of operation, a log apply function unit monitors the trends of log application and operations, and a search request is accepted according to the progress of log application. When log application has not sufficiently caught up, the search is not accepted. Moreover, when the consistency of the secondary DB is guaranteed, transactions in process at the moment of the search instruction are not undone (rolled back); instead, only those transactions are redone (rolled forward). | 09-18-2008 |
20080235531 | Apparatus and Computer Program Product for Testing Ability to Recover From Cache Directory Errors - A method, apparatus, and computer program product are disclosed for testing a data processing system's ability to recover from cache directory errors. A directory entry is stored into a cache directory; it includes an address tag and directory parity associated with that address tag. A cache entry is stored into a cache that is accessed using the cache directory; it includes information and cache parity associated with that information. The directory parity is altered to indicate bad parity, which implies that the associated address tag is invalid. The information in the cache entry is altered to be incorrect. However, the cache parity continues to indicate good parity, which implies that the information is valid even though it is not. The data processing system's ability to recover from errors is then tested using the directory entry and the cache entry. | 09-25-2008 |
20080250264 | System for Adaptive Action Plan Compilation Based on Error Reporting - A database of action plans carried out by a service provider is provided; it stores each action plan as a series of action codes together with associated information such as the error code, the error type and whether the action plan resolved the problem. When an error occurs and is reported automatically, the database is searched for that error. Matching action plans and their success rates are collected, and the most probable solutions are presented first. Each action code in an action plan corresponds to a particular point in maintenance documentation that is stored, e.g., on a management console at the customer location. After the error is reported, the management console receives action plans for the error based on actual service reports, as well as action plans suggested by the documentation. When a service representative accesses the management console for information about the error, the appropriate documentation is presented for each step in the action plan, allowing the service representative to follow the suggested action plans and the associated maintenance documentation onsite. | 10-09-2008 |
20080263385 | Memory Device with Error Correction Based on Automatic Logic Inversion - A memory device comprises a memory array and error correction circuitry coupled to the memory array. The error correction circuitry is configured to identify, in a data word retrieved from the memory array, at least one bit position corresponding to a predetermined defect location in the memory array, and to generate a corrected data word by automatically inverting a logic value at the identified bit position. This automatic logic inversion approach is particularly well suited for use in correcting output data errors associated with via defects and weak bit defects in high-density ROM devices. | 10-23-2008 |
20080270820 | Node management device and method - A device that is communicably connected to each of three or more nodes constituting a cluster system holds resource information, which is information relating to a resource used by an application, in relation to each of the three or more nodes. The device receives resource condition information indicating variation in the condition of the resource from each node, updates the resource information on the basis of the received resource condition information, determines the next active node on the basis of the updated resource information, and notifies at least one of the three or more nodes of the determined next active node. | 10-30-2008 |
20080270821 | RECOVERING FROM ERRORS IN A DATA PROCESSING SYSTEM - A system and method of recovering from errors in a data processing system. The data processing system includes one or more processor cores coupled to one or more memory controllers. The one or more memory controllers include at least a first memory interface coupled to a first memory and at least a second memory interface coupled to a second memory. In response to determining an error has been detected in the first memory, access to the first memory via the first memory interface is inhibited. Also, the first memory interface is locally restarted without restarting the second memory interface. | 10-30-2008 |
20080282104 | Self Healing Software - The systems and methods describe a self healing framework (SHF) that can monitor errors in a computing system and can resolve the errors and/or suggest methods for resolving the errors to a user based on a heuristic approach. In addition, the SHF can analyze errors that occurred in the past and can predict such occurrences in the future to help users take proactive actions against possible errors. | 11-13-2008 |
20080288807 | SYSTEM, METHOD, AND COMPUTER PROGRAM FOR PRESENTING AND UTILIZING FOOTPRINT DATA AS A DIAGNOSTIC TOOL - A data processing system for storing and identifying footprint data enables automated collection, identification and formatting of recovery footprint data for a mainline routine. A footprint area is allocated on a failure recovery routine stack for use by the mainline routine in storing footprint data. The mainline routine stores footprint data within this footprint area. The data processing system can then receive a request from a diagnostic tool, where the request includes at least one search parameter, and can output to the diagnostic tool any footprint data corresponding to the search parameters in the request. | 11-20-2008 |
20080288808 | DEBUGGING A PROCESSOR THROUGH A RESET EVENT - A method for operating a processor in a data processing system comprises: asserting a debug control signal to cause the processor to enter a debug operating mode; initializing a plurality of shared processor resources with debug configuration information, wherein the plurality of shared processor resources are shared between a normal operating mode and the debug operating mode; executing instructions with the processor while in the debug operating mode; re-initializing the processor in response to a reset event; and preventing the reset event from re-initializing a predetermined portion of the debug configuration information in the plurality of shared processor resources. This allows processor debugging through reset events without losing the debug information. | 11-20-2008 |
20080288809 | MEMORY CONTROLLER FOR WRITING DATA INTO AND READING DATA FROM A MEMORY - According to an aspect of an embodiment, a memory controller for writing data into and reading data from a memory comprises: an error detector for detecting an error in data stored in the memory when the data is read; a time stamper for generating first time information indicating the time when data is written into the memory, the first time information being written together with the data into the address location of the memory where the error has been detected; a timer for measuring the time period from the time indicated by the first time information until the next error occurs in data stored at that address location; and a counter for counting the number of accesses to the address location over that time period. | 11-20-2008 |
20080294932 | INCREASING SOFTWARE FAULT TOLERANCE BY EMPLOYING SURPRISE-REMOVAL PATHS - The subject invention relates to systems and methods for automatic recovery from errors in a computing environment. A system is provided to facilitate failure recovery in the computing system. The system includes at least one driver component that enumerates at least one layer of a driver stack. A module associated with the driver component requests re-enumeration of the driver stack upon detection of an error in the computing system. When an error is detected by a driver or operating system component, a protocol can be established whereby a new copy of the driver's stack or system resources is re-enumerated in parallel with existing resources that may be in an unknown or error state. The new copy of the stack may allow the driver to become operational in lieu of the previous stack, which can be reclaimed for other system uses over time. | 11-27-2008 |
20080313488 | Apparatus and Method for Diagnosing Fault and Managing Data in Satellite Ground System - Provided are an apparatus and a method for diagnosing faults and processing data of a satellite ground system. The apparatus and method can prevent loss of satellite data and operate the satellite ground system efficiently by using a data buffer and a penalty method when a temporary fault occurs. The data buffer stores data during a fault, and the penalty method imposes a high penalty for a critical fault and a low penalty for a minor fault. The system is managed according to the penalty degree. The apparatus includes: a satellite data processing and controlling means; a signal transforming means; a fault detecting and controlling means; a state displaying means for displaying the state of the satellite and the system; a penalty managing means for being notified whether or not a device has a fault; a data storing means for storing and transmitting the data; and a system recovery supporting means. | 12-18-2008 |
20090013208 | REAL TIME AUTOMATED EXCEPTION NOTIFICATION AND REPORTING SOLUTION - A closed loop, autonomic exception notification and resolution system enables an application to proactively collect and forward exception information to developers with no user intervention, in some cases before the user is even aware that an exception has occurred. A notification process ensures that the appropriate resources can be applied to exception resolution, improving error resolution by reducing duplicate or misdirected efforts. An error coding scheme ensures that errors are reported uniquely and consistently while allowing duplicate issues to be grouped, further reducing duplicated effort and improving resolution time. The use of an automatically populated Exception Object ensures that all information necessary to resolve the exception is provided to developers, thereby reducing debug time. Error resolutions are stored in a centralized database that can be accessed to quickly leverage previously generated solutions. | 01-08-2009 |
20090013209 | APPARATUS FOR CONNECTION MANAGEMENT AND THE METHOD THEREFOR - An apparatus and method for scheduling data distributions to, or results information from (collectively, “jobs”), a plurality of data processing systems via a network. A connection to a target system is created. For each distribution, a session, which is an independent thread, is allocated from one of a plurality of session pools and launched to effect execution of the job. Each pool corresponds to a predetermined priority level, and the session is allocated from the pool whose priority level matches that of the job being scheduled. A connection supports multiple independent threads. In the event of an error, the session is released, and the aborted job is rescheduled after a predetermined retry interval expires. After the retry interval expires, a callback method is invoked when the target system on which the scheduled job is to be executed becomes accessible. | 01-08-2009 |
20090019305 | MARKET DATA RECOVERY - Networks, systems and methods for recovering data messages from a market data stream and for building a book for a financial instrument are disclosed. An out-of-band data stream related to an as-of state of the market for one or more financial instruments is distributed in parallel with a stream of market data for the financial instrument. The as-of data stream is referenced to the financial instrument according to a unique identifier of the messages of the market data stream. The as-of data for a financial instrument may be provided at a periodic rate that may be varied according to one or more factors. | 01-15-2009 |
20090019306 | Protecting tag information in a multi-level cache hierarchy - In one embodiment, the present invention includes a shared cache memory that is inclusive with other cache memories coupled to it. The shared cache memory includes error correction logic to correct an error present in a tag array of one of the other cache memories and to provide corrected tag information to replace a tag entry in the tag array including the error. Other embodiments are described and claimed. | 01-15-2009 |
20090031161 | Method, operating system and computing hardware for running a computer program - A method for executing a computer program on computing hardware, e.g., a microprocessor, is provided, the computer program including multiple program objects, and errors being detected while the computer program runs on the computing hardware. The program objects are subdivided into at least two classes, and multiple program objects are executed during one run, program objects of the first class being repeated when an error is detected. When an error is detected in a program object of the first class that has already been sent for execution, this program object is restarted, in place of a program object of the second class, after the other program objects of the first class in that run. | 01-29-2009 |
20090031162 | APPARATUS AND METHOD FOR REPAIRING COMPUTER SYSTEM INFECTED BY MALWARE - An apparatus and method of diagnosing whether a program executed in a computer system is malware and repairing the computer system infected by malware. The apparatus includes a receiving unit which receives a first behavior vector for the malware from a malware control server; a determination unit which determines whether a diagnostic target program corresponds to malware based on the received first behavior vector and a second behavior vector for the diagnostic target program; and a repair unit which repairs the computer system based on a result of the determination. | 01-29-2009 |
20090037762 | Electronic document presentment services in the event of a disaster - The disaster recovery techniques, for presentment of a company's bills, statements or the like, provide electronic document presentment in the event of a disaster that impacts the company's print mail delivery operation or other existing mailing system(s). Files containing electronic documents are received from a system associated with the print mail delivery operation, and the documents are stored in a database. Preferably, the systems use the company's existing data files. The files may be converted to a format compatible with one or more electronic delivery methodologies, if necessary. The disaster recovery systems present notice and/or data from the documents to the company's customers electronically, for example as e-mail (a notice or message containing some or all of the document data), as a document attachment to an e-mail, via a web site, and possibly via telephone voice announcement. | 02-05-2009 |
20090044040 | MODIFICATION OF ARRAY ACCESS CHECKING IN AIX - An error handling operation for checking an array access in program code is modified during compilation of the program code. A sequentially arranged null checking operation and array bounds checking operation for the array access are located. The array bounds checking operation has a corresponding error handling operation that sets an array bounds error. The located null checking operation is removed, and the error handling operation corresponding to the located array bounds checking operation is modified to perform the removed null check during execution of the program code. | 02-12-2009 |
20090063891 | System and Method for Providing Reliability of Communication Between Supernodes of a Multi-Tiered Full-Graph Interconnect Architecture - A method, computer program product, and system are provided for providing reliability of communication. A first processor of a data processing system determines the current state of links coupled to its ports. Each port of the first processor comprises a plurality of links to a corresponding port on a second processor of the data processing system. The current state of the links indicates the level of error associated with each link. The first processor determines, for each link, whether the level of error associated with the link exceeds a threshold. For each link whose level of error exceeds the threshold, the first processor tags the link with an error identifier in a switch associated with its ports. The first processor then reduces the level of usage for transmitting data on ports associated with links tagged with the error identifier. | 03-05-2009 |
20090070620 | Method and system for detecting and recovering failure command - A method and a system for detecting and recovering a failed command are provided. The method is used with native command queuing (NCQ) and includes at least the following steps. In step (a), several commands are executed on a disk simultaneously according to NCQ. In step (b), it is determined whether the request time of the commands exceeds a waiting time; if not, step (a) is executed. In step (c), one command is chosen. In step (d), it is determined whether the chosen command was executed successfully; if so, step (f) is executed. In step (e), the chosen command is recovered. In step (f), it is determined whether all the NCQ commands have been chosen; if not, another command is chosen and step (d) is repeated. | 03-12-2009 |
20090083572 | PROGRAM CONTROL METHOD FOR NETWORK DEVICES AND NETWORK SYSTEM - According to the present invention, program control of network devices, each of which provides services, monitors the function of a program module operating in each network device. If the function of the program module has a problem, the program control performs proxy response processing for the network device by means of a virtual device program until the problem is corrected. In addition, the program module operating in the network device is updated to the latest program module to correct the problem, after which the virtual device program is stopped. | 03-26-2009 |
20090094477 | SYSTEM AND PROGRAM PRODUCT FOR DETECTING AN OPERATIONAL RISK OF A NODE - According to the present invention, the performance of each of a plurality of similarly configured nodes is monitored and compared. If one node's performance varies from that of the other nodes by more than a current tolerance, an operational risk is detected. When a risk is detected, an alert can be generated and one or more corrective actions implemented to address it. | 04-09-2009 |
20090100287 | Monitoring Apparatus and a Monitoring Method Thereof - A monitoring apparatus and a monitoring method thereof are disclosed. The monitoring apparatus is used to monitor a computer. The monitoring apparatus comprises a control unit and a first non-volatile memory unit. If the computer operates abnormally before loading an operating system, the control unit stores an error code corresponding to the abnormal operation in the first non-volatile memory unit and executes a recovery process according to the error code. | 04-16-2009 |
20090106578 | Repair Planning Engine for Data Corruptions - A computer is programmed to automatically generate repairs for one or more failures while taking into account the dependencies among repairs, by grouping failures. In some embodiments, the computer uses a map that associates each failure type with repair types that are alternatives to one another, and another map that associates each repair type with a template that creates the repair when instantiated. In certain embodiments, repairs within a repair plan are consolidated to avoid duplicates and redundancies. | 04-23-2009 |
20090113232 | APPARATUS AND METHOD FOR MANAGING WIRELESS SENSOR NETWORK - An apparatus for managing a plurality of wireless sensor networks selects a configuring policy according to a characteristic of each wireless sensor network and configures a network with the selected configuring policy for management. When an error occurs in the wireless sensor network, the apparatus performs error diagnosis based on a configuring policy applied to the error-detected wireless sensor network, infers a cause of the error, and provides an error recovery method corresponding to the inferred cause of the error to the error-detected wireless sensor network. | 04-30-2009 |
20090125751 | System and Method for Correlated Analysis of Data Recovery Readiness for Data Assets - A method, system, and computer program product are provided for determining the recovery readiness of a data asset. A set of metrics is identified for a current recovery operation performed for the data asset, and a current recovery objective is identified for the data asset. The current recovery operation is applied to the data asset using the set of metrics. A determination is made as to whether the current recovery operation meets the recovery objective for the data asset. Responsive to a failure of the current recovery operation to meet the recovery objective, an error is presented indicating the failure and a determination is made as to whether a different recovery policy may be implemented to meet the recovery objective for the data asset. If a different recovery policy exists that meets the recovery objective for the data asset, the different recovery policy is implemented. | 05-14-2009 |
20090132848 | PARALLEL PROGRAMMING ERROR CONSTRUCTS - A system receives a program, allocates the program to a first software unit of execution (UE) and a second software UE, executes a first portion of the program with the first and second software UEs in parallel, and determines whether an error is detected during execution of the first portion of the program by the first and second software UEs. The system also sends a signal, between the first and second software UEs, to execute a second portion of the program when the error is detected in the first portion of the program, executes the second portion of the program with the first and second software UEs when the error is detected, and provides for display information associated with execution of the first portion and the second portion of the program by the first and second software UEs. | 05-21-2009 |
20090150713 | Multi-voltage synchronous systems - Embodiments include a system, a device, and a method. A computing system includes a synchronous circuit. The synchronous circuit includes a first subcircuit powered by a first power plane having a first power plane voltage and a second subcircuit powered by a second power plane having a second power plane voltage. The system also includes an error detector operable to detect an incidence of a computational error occurring in the first subcircuit. The system further includes a controller operable to change the first power plane voltage based upon the detected incidence of a computational error. The system may include a power supply operable to provide a selected one of at least two voltages to the first power plane in response to the controller. | 06-11-2009 |
20090150714 | REMOTE DIAGNOSTIC AND REPAIR SYSTEM - A system and method for remotely diagnosing and repairing a computer controlled asset comprises an access point connected to a computer controlled asset thereby allowing electronic access to the computer system of the computer controlled asset, a service center remotely connected to the access point for providing diagnostic review and repair of the computer controlled asset, and an interface linking the access point to the service center thereby allowing the service center to communicate with the computer controlled asset via the access point. | 06-11-2009 |
20090158079 | Fault information processing system and method for vehicle - The present invention relates to a fault information processing system and method for a vehicle, which can satisfy a short control cycle, thereby reducing the burden on the CPU, and enables significant fault information (freeze frame) to be frozen. To this end, a fault detection unit, a fault processing unit and a fault management unit, each having an independent control cycle, process all faults that occur according to priority, such that fault-related data (freeze frame) is frozen immediately after a fault occurs, irrespective of the fault's type and priority. The fault management unit also retrieves the fault that occurred at an independent control cycle, combines it with the previously frozen fault-related data, and stores the corresponding fault information in a buffer unit. | 06-18-2009 |
20090158080 | STORAGE DEVICE AND DATA BACKUP METHOD - A storage device includes: a storage unit for storing data; a memory for storing management information; a local storage unit for storing differential data; and a controller for controlling the storage device in accordance with a process comprising the steps of: updating data; updating management information; transmitting differential data to another storage device, the differential data being the portions of the data updated between the preceding backup and the current backup; resetting the management information after transmitting the differential data; storing, when transmission of the differential data to the other storage device fails, the differential data and the associated management information in the local storage unit; and transmitting the differential data to the other storage device at a later time, after the management information has been reset. | 06-18-2009 |
20090164833 | Methods and Systems for Automated Processing of Fallout Orders - A system and method may include receiving an order and an error identifier, indexing a database based on the error identifier to identify a rule identifier, and indexing the database based on the rule identifier to identify a rule. The system and method may further include applying the rule to modify the order to generate a modified order, and submitting the modified order for processing. | 06-25-2009 |
20090164834 | SOFT ERROR RECOVERABLE STORAGE ELEMENT AND SOFT ERROR PROTECTION TECHNIQUE - A soft error recoverable storage element suitable for use in latches, flip-flops, static RAM cells and microprocessor pipeline stages. The storage element employs a redundant copy of the stored data value and a feedback loop. One embodiment employs an interlocking four-inverter loop with gating devices that blocks the propagation of a soft-error-induced change of state and causes the storage element to recover its original stored data state. | 06-25-2009 |
20090172460 | DEFINING A COMPUTER RECOVERY PROCESS THAT MATCHES THE SCOPE OF OUTAGE - Recovery processing is defined that matches the scope of an outage. A programmatic analysis of the impacted resources, the implications of the failure, and the degradations that have occurred is performed to construct an appropriate level of recovery. This includes selecting the appropriate set of resources to be recovered. Recovery operations are selected based on the current state of the environment. | 07-02-2009 |
20090172461 | CONDITIONAL ACTIONS BASED ON RUNTIME CONDITIONS OF A COMPUTER SYSTEM ENVIRONMENT - Conditionally performing delegated actions based on runtime conditions of the environment. A component of an Information Technology environment conditionally performs an action, such as its own recovery, based on whether the component can have such action delegated to it and/or whether that component is currently being shared by multiple business applications of the environment. | 07-02-2009 |
20090172462 | METHOD AND SYSTEM FOR RECOVERY OF A COMPUTING ENVIRONMENT - A method and system for recovery of a computing environment includes monitoring during a pre-boot phase and a runtime phase of a computing device for selection of a hot key sequence by a user and performing a recovery action in response to the selection of the hot key sequence by the user. The recovery action may be any one of a number of predetermined and/or selectable actions such as restoring system defaults, migrating memory, displaying a menu of options, setting various software flags, restarting or rebooting the computing device, and/or the like. | 07-02-2009 |
20090177910 | METHOD OF RECOVERING FROM SOFTWARE FAILURES USING REPLANNING - A method for recovering from software failures, includes: receiving failure information that identifies a failing component of a first processing graph; modifying a planning domain that includes a plurality of component descriptions according to the failure information; and composing a second processing graph by using the modified planning domain so that the second processing graph does not include the failing component. | 07-09-2009 |
20090193286 | Method and System for In-doubt Resolution in Transaction Processing - A method and system are provided for in-doubt resolution in transaction processing involving at least two distributed transaction processing systems. The method includes an initial exchange of information to establish an identifier for coordinating units of recovery in distributed transaction processing systems. The method includes a first transaction processing system creating a local unit of recovery and sending a request to a second transaction processing system to create a coordinating unit of recovery, the request including an identifier of the local unit of recovery. The second transaction processing system starts a coordinating unit of recovery and records the identifier in association with the coordinating unit of recovery. In the event of a failure, one of the first and second transaction processing systems uses the identifier to locate the unit of recovery on the other of the first and second transaction processing systems to resynchronize the units of recovery. | 07-30-2009 |
20090199039 | FILE DATA RESTORING SYSTEM AND METHOD OF COMPUTER OPERATING SYSTEM AND SOFTWARE THEREOF - A file data restoring system and method of a computer operating system and software thereof are applied in the installation of an operating system on a client computer. The file data corresponding to the operating system are divided into data blocks of a specified data size. A check code is generated for each data block to form a sequence list of original check codes and a sequence list of target check codes. After the operating system is installed on the computer, the sequence list of original check codes is compared with the sequence list of target check codes. If the comparison result is inconsistent, restoring call information is sent out. The position of the inconsistent check code is obtained from the restoring call information and the comparison result. The original file data at that position are read and restored to the corresponding target file. | 08-06-2009 |
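The block-level compare-and-restore idea in the file data restoring entry can be sketched as below. This is an illustrative reading only: SHA-256 is an assumed choice of check code (the entry does not name one), and the sketch assumes the original and target data have the same length.

```python
import hashlib
from typing import List


def block_checksums(data: bytes, block_size: int) -> List[str]:
    """Split data into fixed-size blocks and return one check code per block."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]


def restore(original: bytes, target: bytearray, block_size: int) -> List[int]:
    """Compare the two check-code lists and copy back every original block whose
    code differs. Assumes len(original) == len(target). Returns restored block indices."""
    orig_codes = block_checksums(original, block_size)
    tgt_codes = block_checksums(bytes(target), block_size)
    restored = []
    for idx, (a, b) in enumerate(zip(orig_codes, tgt_codes)):
        if a != b:  # inconsistent check code: this block's position needs restoring
            start = idx * block_size
            target[start:start + block_size] = original[start:start + block_size]
            restored.append(idx)
    return restored
```

Comparing per-block codes means only the damaged blocks are rewritten, rather than the whole file.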
20090204844 | ERROR-TOLERANT PROCESSOR SYSTEM - A processor system includes at least one execution unit for executing program instructions of an application, a program memory for storing the program instructions of the application and at least one error handling routine, a main memory for storing a set of variables of the application, and a monitoring unit for detecting errors of the execution unit and/or of the main memory and for starting one of the error handling routines when an error is detected. Each error handling routine is designed to refresh a different subset of the set of variables. | 08-13-2009 |
20090204845 | COMMUNICATION DEVICE AND A METHOD OF SELF-HEALING THEREOF - Provided are a communication device and a method of self-healing thereof. The communication device is characterized by one or more operational functions and comprises one or more resources operatively coupled to at least one sensor, said sensor being directly or indirectly coupled to a recovery block, wherein the device is configured to hold an emergency configuration related to at least one of said operational functions and/or at least one of said resources; the sensor is configured to monitor at least one of said resources for information indicative of at least one possibly malfunctioning resource, and to report, directly or indirectly, this information and/or a derivative thereof to the recovery block; and the recovery block is configured to initiate at least one remedial action in respect of at least one of said resources if the received information and/or derivative thereof meets a certain criterion, said remedial action being provided in accordance with the emergency configuration. | 08-13-2009 |
20090217077 | METHOD, SYSTEM, AND COMPUTER PROGRAM PRODUCT FOR PROCESSOR ERROR CHECKING - A method for processor error checking includes receiving instruction data, generating pre-processing parity data based on the instruction data, maintaining the pre-processing parity data, processing the instruction data, generating post-processing parity data based on the processed instruction data, checking for an error related to processing the instruction data by comparing the post-processing parity data to the pre-processing parity data, and transmitting an error signal indicating that an error related to processing the instruction data occurred if the post-processing parity data does not match the pre-processing parity data, wherein the check is performed without using duplicate processing circuitry. | 08-27-2009 |
20090217078 | APPARATUS AND METHODS FOR MANAGING MALFUNCTIONS ON A WIRELESS DEVICE - Apparatus and methods for managing predetermined malfunction events in a wireless device operating in a wireless communications network. Malfunction event data and operational data are recorded by the wireless device based on a selected malfunction event tracking configuration. Further, a recovery module associated with the wireless device operates to attempt to recover information leading up to and including the malfunction event. The collected information may be transmitted to a user manager in the form of a malfunction event log. The malfunction event log may be analyzed to characterize the malfunction, and is particularly useful for determining the sequence and identity of events leading to the malfunction, including a crash, freeze and reset. | 08-27-2009 |
20090249111 | RAID Error Recovery Logic - A method of reading desired data from drives in a RAID1 data storage system, by determining a starting address of the desired data, designating the starting address as a begin read address, designating one of the drives in the data storage system as the current drive, and iteratively repeating the following steps until all of the desired data has been copied to a buffer: (1) reading the desired data from the current drive starting at the begin read address and copying it into the buffer until an error indicating corrupted data is encountered, (2) determining an error address of the error, (3) designating the error address as the begin read address, and (4) designating another of the drives in the data storage system as the current drive. | 10-01-2009 |
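The four-step loop in the RAID1 entry can be sketched as follows. Drives are modeled as lists of blocks, with `None` standing for a corrupted block; that model, and the `max_switches` guard, are assumptions added for illustration (the guard stops the loop if every mirror is corrupted at the same address, a case the entry does not address).

```python
from typing import List, Optional, Sequence


class ReadError(Exception):
    def __init__(self, address: int):
        super().__init__(f"corrupted data at address {address}")
        self.address = address


def read_range(drive: Sequence[Optional[bytes]], begin: int, end: int,
               buf: List[bytes]) -> None:
    """Copy blocks [begin, end) into buf; raise ReadError at the first corrupt block."""
    for addr in range(begin, end):
        block = drive[addr]
        if block is None:
            raise ReadError(addr)
        buf.append(block)


def raid1_read(drives: Sequence[Sequence[Optional[bytes]]], start: int, length: int,
               max_switches: int = 8) -> List[bytes]:
    buf: List[bytes] = []
    begin, end = start, start + length  # begin read address and end of desired data
    current = 0                         # index of the current drive
    switches = 0
    while True:
        try:
            read_range(drives[current], begin, end, buf)  # step (1)
            return buf
        except ReadError as err:
            begin = err.address                        # steps (2) and (3)
            current = (current + 1) % len(drives)      # step (4): switch mirror
            switches += 1
            if switches > max_switches:                # hypothetical safety limit
                raise
```

Because the buffer already holds everything read before the error, the next mirror resumes exactly at the failing address instead of rereading from the start.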
20090249112 | TRIGGERED RESTART MECHANISM FOR FAILURE RECOVERY IN POWER OVER ETHERNET - A triggered restart mechanism for failure recovery in power over Ethernet (PoE). Powered devices (PDs) that fail can be remotely recycled by a power sourcing equipment (PSE). After detection of a failure of a PD, such as by the failure to receive a status message, a PSE can generate a reset signal (e.g., power cycle, reset pulse, etc.) on the port. This reset signal can cause the PD to perform a full power cycle or quick restart. | 10-01-2009 |
20090249113 | METHOD FOR RECOVERING BASIC INPUT OUTPUT SYSTEM AND COMPUTER DEVICE THEREOF - The invention discloses a method for recovering a basic input output system (BIOS) and a computer device thereof. The computer device of the invention includes a motherboard, a power button, a BIOS storage unit, and an embedded controller. The BIOS storage unit is disposed on the motherboard, and it stores a first boot block code and a second boot block code. When the computer device is connected with a power supply to supply standby power to the motherboard, and the power button is not pressed, the embedded controller detects whether the first boot block code is damaged. If the first boot block code is damaged, the embedded controller recovers the first boot block code via the second boot block code. | 10-01-2009 |
20090254773 | Method, operating system and computing hardware for running a computer program - A method for running a computer program on computing hardware, in particular on a microprocessor, is described, the computer program including multiple program objects and errors being detected in the method while running the computer program on the computing hardware, the program objects being subdivided into at least two classes and program objects of the first class being repeated when an error is detected and, when an error is detected in one program object of the first class, which has already been sent for execution, this program object of the first class being restarted instead of a program object of the second class. | 10-08-2009 |
20090254774 | METHODS AND SYSTEMS FOR RUN-TIME SCHEDULING DATABASE OPERATIONS THAT ARE EXECUTED IN HARDWARE - Embodiments of the present invention provide a run-time scheduler that schedules tasks for database queries on one or more execution resources in a dataflow fashion. In some embodiments, the run-time scheduler may comprise a task manager, a memory manager, and a hardware resource manager. When a query is received by a host database management system, a query plan is created for that query. The query plan splits the query into various fragments, which are further compiled into a directed acyclic graph of tasks. Unlike conventional scheduling, the dependency arcs in the directed acyclic graph are based on page resources. Tasks may comprise machine code that may be executed by hardware to perform portions of the query. These tasks may also be performed in software or relate to I/O. | 10-08-2009 |
20090259879 | DETERMINING CORRECTIVE ACTIONS USING A GEOMETRICALLY-BASED DETERMINATION OF SUFFICIENT CONFIDENCE - A method, system, and apparatus for determining a corrective action for a diagnosable system are provided. A failure mode reasoning engine (FMRE) receives an evidence notification. The FMRE determines a plurality of evidentiary-failure-mode-probability rectangles (EFMPRs) based on the evidence notification. A candidate EFMPR in the plurality of EFMPRs is determined. The candidate EFMPR may be determined based on a distance from an origin of an evidentiary-failure-mode-probability graph. An overlap area is determined between the candidate EFMPR and the other EFMPRs in the plurality of EFMPRs. The overlap area is compared to an overlap threshold. If the overlap area is less than the overlap threshold, a reasoned failure mode (i.e., correct diagnosis) and/or a reasoned corrective action for the diagnosable system is determined, based on the candidate EFMPR. The FMRE may report and/or take the reasoned corrective action. | 10-15-2009 |
20090259880 | PROCESS FLOW EXECUTION APPARATUS, CONTROL METHOD THEREOF, AND STORAGE MEDIUM STORING CONTROL PROGRAM THEREFOR - A process flow execution apparatus capable of notifying a user of a task or a setting that is not supported in an application of a version lower than a version set in a process flow that describes process contents of a plurality of tasks. A process flow application handling the process flow is installed in the apparatus that can communicate with another apparatus capable of handling the process flow application, via a network. An acceptance unit accepts an instruction to execute the plurality of tasks based on the process flow. A control unit controls to display a warning screen on a display unit if a version of a process flow application that generates the process flow for which the acceptance unit has accepted the execution instruction is higher than a version of the process flow application installed in the process flow execution apparatus. | 10-15-2009 |
20090265576 | SYSTEM FOR DETERMINING REAL TIME NETWORK UP TIME - The inventive system and method for determining the availability of a computer network comprises a device operable to connect to at least the computer network using internet communications and using GSM, and an alarm service and/or a central server, wherein the device attempts to connect to the computer network using the internet communications and, if the device fails to connect within a predetermined value, such as an amount of time or a number of tries, the device uses GSM to notify the alarm service of the failure to connect. In one embodiment, after the device notifies the alarm service and/or central server of the failure to connect, the device continues to attempt to connect to the computer network, and if the device connects within another predetermined value, the device notifies the alarm service and/or central server of the restoration of service. | 10-22-2009 |
20090276655 | Method for detecting errors during initialization of an electronic appliance and apparatus therefor - The invention concerns a method for detecting problems that arise during the launch phase of resident software on an electronic appliance. Detection relies on data written to non-volatile memory during that phase; the data are erased if the launch succeeds. If the launch fails, the data can be used on the next restart to identify the problem. | 11-05-2009 |
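The write-then-erase marker technique in the initialization-error entry can be sketched as below. A JSON file stands in for the appliance's non-volatile memory, and the step names and `launch` function are hypothetical; an uncaught exception models a launch that never completes.

```python
import json
from pathlib import Path
from typing import Callable, List, Optional, Tuple


def launch(marker: Path, steps: List[Tuple[str, Callable[[], None]]]) -> Optional[str]:
    """Run launch steps, recording progress in `marker` (a stand-in for non-volatile memory).
    Returns the step at which the previous launch stopped, or None if it completed."""
    previous_failure = None
    if marker.exists():
        # Data survived from the last launch, so that launch did not finish.
        previous_failure = json.loads(marker.read_text())["step"]
    for name, step in steps:
        marker.write_text(json.dumps({"step": name}))  # written before the step runs
        step()                                         # a crash here leaves the marker behind
    marker.unlink()  # success: erase the diagnostic data
    return previous_failure
```

Writing the marker before each step, rather than after, is what lets the next boot name the step that hung.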
20090282281 | Low power, high reliability specific compound functional units - To prevent short path errors from occurring in systems having error detection and recovery mechanisms, functional elements are combined to form compound functional units comprising at least two evaluation stages, each evaluation stage including at least one functional element. At least one functional element includes error detection/recovery circuitry. The flow of input values to the first evaluation stage in the compound functional unit is controlled so that the input values are changed at most every second clock cycle. | 11-12-2009 |
20090282282 | STORAGE APPARATUS, MEDIUM CONTAINING RETRY PROGRAM, AND RETRY METHOD - The storage apparatus includes a detection unit (MPU | 11-12-2009 |
20090287950 | ERROR MANAGEMENT FRAMEWORK - A system and method for computer error management initially detects an error and creates an error report. The error report is then sent to a server where it is hosted on the World Wide Web. The server additionally notifies designated users of the error and can allow the users to correct the error. | 11-19-2009 |
20090287951 | Network device and method of operating the same - A network device includes multiple ports for accepting network connections, at least one memory for storing a fail-safe device configuration, a normal operational device configuration and one or more triggering events, and one or more processors connected to the memory for controlling the operation of the network device. The processor causes the network device to operate according to the fail-safe configuration in response to the occurrence of a triggering event. | 11-19-2009 |
20090287952 | Backplane Interface Adapter with Error Control and Redundant Fabric - A backplane interface adapter with error control and redundant fabric for a high-performance network switch. The error control may be provided by an administrative module that includes a level monitor, a stripe synchronization error detector, a flow controller, and a control character presence tracker. The redundant fabric transceiver of the backplane interface adapter improves the adapter's ability to properly and consistently receive narrow input cells carrying packets of data and output wide striped cells to a switching fabric. | 11-19-2009 |
20090292941 | PROOF-GUIDED ERROR DIAGNOSIS (PED) BY TRIANGULATION OF PROGRAM ERROR CAUSES - Systems and methods are disclosed for diagnosing software errors in a program by building, from one or more error traces, a repair program containing one or more modified program semantics corresponding to fixes for observed errors; encoding the repair program with constraints, biases and prioritization into a constraint weighted problem; and solving the constraint weighted problem to generate one or more repair solutions, wherein the encoding includes at least one of: a) constraining one or more repair choices guided by automatically inferring one or more partial specifications of intended program behaviors and program structure; b) biasing one or more repair choices guided by typical programming mistakes; and c) prioritizing the repair solutions based on error locations and possible changes in program semantics. | 11-26-2009 |
20090300403 | FINE GRAINED FAILURE DETECTION IN DISTRIBUTED COMPUTING - A client sends a request message to a process hosted by a remote server via a middleware service, wherein the request message specifies a procedure for the process to execute. The client waits a predetermined time period to receive a response message from the process. If no response message is received within the predetermined time period, the client probes the process to determine why no response message has been received, wherein said probing reveals thread level information about the process. | 12-03-2009 |
20090313496 | Computer implemented systems and methods for pre-emptive service and improved use of service resources - Systems and methods are provided for collecting, aggregating, and analyzing data associated with the installation and deployment of systems. Energy systems, ( | 12-17-2009 |
20090327796 | SERVICE ORIENTED ARCHITECTURE BASED DECISION SUPPORT SYSTEM - A service oriented architecture (SOA) based decision support system for a vehicle is provided. A database is provided for storing a workplan of the vehicle. A webservice provider is in communication with the database for integrating applications using a variety of open standards of an internet protocol backbone. A core is connected between the database and webservice provider. The core is adapted for analyzing fault conditions, creating the workplan to overcome the fault conditions, and generating a response for a webservice request. An enterprise service bus is connected to the webservice provider for providing loose connectivity between webservice enabled functions, a service customer, and the core. | 12-31-2009 |
20090327797 | Method and Provider Edge Device for Advertising and Processing Pseudo-Wire Information - The present invention discloses a method for advertising and processing pseudo-wire (PW) information, which comprises: the sending provider edge (PE) device using two or more methods to group PWs, identifying the group identifier assigned to each PW with each grouping method, and sending all group identifiers of each PW to the receiving PE device; the sending PE device sending to the receiving PE device the notification message that carries information identifying the affected PW group, and the receiving PE device identifying the PWs belonging to the affected PW group according to the received notification. The present invention also discloses the sending and receiving PE devices for advertising and processing PW information. The method and the devices of the present invention can support grouping PWs with more than one method, allowing for flexible use of PW group-based messaging and message processing. | 12-31-2009 |
20100017642 | Distributed Transaction Processing System Having Resource Managers That Collaborate To Decide Whether To Commit Or Abort A Transaction In Response To Failure Of A Transaction Manager - A distributed transaction processing system includes a plurality of resources, resource managers to manage corresponding ones of the resources, and a transaction manager to coordinate performance of a transaction with the resource managers. In response to failure of the transaction manager, the resource managers are configured to collaborate to decide whether to commit or abort the transaction. | 01-21-2010 |
20100023797 | SEQUENCING TECHNIQUE TO ACCOUNT FOR A CLOCK ERROR IN A BACKUP SYSTEM - A method, apparatus, and system of a sequencing technique to account for a clock error in a storage area network are disclosed. In one embodiment, a system of a backup server includes a processing module to examine data timestamped with a sequence of characters denoting a time according to a clock source, an analysis module to determine that the data has been timestamped at an earlier time than other data previously received, a substitution module to give the data an incremental sequence number, placed with the data using an algorithm, until new data is received whose timestamp is later than the timestamp of the other data, and a storage module to store the data. | 01-28-2010 |
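One way to read the sequencing technique in the clock-error entry is sketched below: while incoming timestamps lag behind the latest one seen, records are ordered by that latest timestamp plus an incrementing sequence number, and ordinary timestamp ordering resumes once a later timestamp arrives. The `(timestamp, sequence)` ordering key is an assumption, as is treating equal timestamps like earlier ones (the entry says only "earlier"), which avoids two records sharing a key.

```python
from typing import List, Optional, Tuple


class Sequencer:
    def __init__(self) -> None:
        self.latest_ts: Optional[float] = None  # latest trusted timestamp so far
        self.seq = 0                            # increments while the clock lags
        self.store: List[Tuple[float, int, str]] = []

    def receive(self, ts: float, payload: str) -> Tuple[float, int]:
        if self.latest_ts is not None and ts <= self.latest_ts:
            # Clock appears to have gone backwards: keep the latest trusted timestamp
            # and order this record by an incremental sequence number instead.
            self.seq += 1
            key = (self.latest_ts, self.seq)
        else:
            # A genuinely later timestamp arrived: trust the clock again.
            self.latest_ts = ts
            self.seq = 0
            key = (ts, 0)
        self.store.append((key[0], key[1], payload))  # storage module
        return key
```

Sorting the stored tuples then recovers arrival order even though the raw timestamps went backwards.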
20100023798 | ERROR RECOVERY AND DIAGNOSIS FOR PUSHDOWN AUTOMATA - Error recovery and diagnosis is afforded for pushdown automata. Upon detection of an error, a recovery strategy is selected and dispatched to recover from the error to place an automaton in an error free state to enable continued processing. In one instance, recovery strategies can be specified and matched with respect to automaton configuration. Errors can be diagnosed as a function of the difference between a first error configuration and a second recovered configuration. | 01-28-2010 |
20100023799 | DETERMINING CORRECTNESS OF JOB PLANS IN A STREAM PROCESSING APPLICATION - Embodiments of the invention provide techniques for determining the correctness of similar job plan segments in a stream processing application. In one embodiment, a job manager may be configured to identify similar job plan segments based on data formats, functionality, and surrounding processing elements. The job manager may be further configured to determine whether the similar segments provide inconsistent results and, if so, to determine which of the inconsistent similar segments is invalid. The job manager may identify an invalid processing element included in the invalid segment. The job manager may also perform corrective actions to address the invalid processing element. | 01-28-2010 |
20100023800 | NAND Flash Memory Controller Exporting a NAND Interface - A NAND controller for interfacing between a host device and a flash memory device (e.g., a NAND flash memory device) fabricated on a flash die is disclosed. In some embodiments, the presently disclosed NAND controller includes electronic circuitry fabricated on a controller die, the controller die being distinct from the flash die, a first interface (e.g. a host-type interface, for example, a NAND interface) for interfacing between the electronic circuitry and the flash memory device, and a second interface (e.g. a flash-type interface) for interfacing between the controller and the host device, wherein the second interface is a NAND interface. According to some embodiments, the first interface is an inter-die interface. According to some embodiments, the first interface is a NAND interface. Systems including the presently disclosed NAND controller are also disclosed. Methods for assembling the aforementioned systems, and for reading and writing data using NAND controllers are also disclosed. | 01-28-2010 |
20100037085 | SYSTEMS AND METHODS FOR BULK RELEASE OF RESOURCES ASSOCIATED WITH NODE FAILURE - Systems and methods according to these exemplary embodiments provide for methods and systems for improving efficiency in communications systems by, for example, bulk release of resources upon a partial node failure. Bulk release messages including, for example, at least one identifier associated with a plurality of resources, can be transmitted from a node toward other nodes to release such resources after the node failure. | 02-11-2010 |
20100037086 | ROBUST CRITICAL SECTION DESIGN IN MULTITHREADED APPLICATIONS - A multithreaded computer application provides more robust mutually exclusive accesses as instantiations (threads) of a single program, such that deadlock situations are avoided. The application method uses the system primitives to implement system services that provide a ‘gate’ functionality (S | 02-11-2010 |
20100042867 | CUSTOMIZATION AND REUSE OF LOGGED AGENT ACTIVITY DURING REMOTE TAKE OVER HELP SESSION - An apparatus and a method are provided for verifying the setup of a current computer in need of repair, and for parsing and updating a previously created activity log file to provide usable steps that refer to the appropriate configuration settings for the computer being repaired. The method includes obtaining an indication of a problem on a remote computer, reviewing stored log files to determine whether an old log file associated with the problem exists, and, if it does not, creating a new log file to store steps associated with repair operations. If the old log file exists, the method also includes retrieving it, obtaining the remote computer configuration, parsing the old log file to identify references to configuration settings, and comparing those settings with the remote computer configuration. The method further includes updating at least one configuration setting of the old log file to reflect the remote computer configuration, if that setting differs from the remote computer configuration, and executing the steps in the old log file to solve the problem on the remote computer. | 02-18-2010 |
20100042868 | SYSTEM AND METHOD FOR PROVIDING DATA SERVICES VIA A NETWORK - A method and system are provided for performing an activity. Accordingly, an activity to be performed is determined, a stored hierarchy is examined indicating a first alternate component for performing the activity first and a second alternate component for performing the activity if the first alternate component fails. The first alternate component is invoked to perform the activity, and when a failure of the first alternate component to perform the activity is detected, the second alternate component is invoked to perform the activity. A revised hierarchy is stored indicating that the second alternate component is to be invoked to perform the activity before the first alternate component is invoked to perform the activity. | 02-18-2010 |
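The fallback-and-reorder behavior in the data services entry can be sketched as below. The dictionary-based hierarchy, the `ComponentFailure` exception, and the component names are assumptions for illustration; the sketch generalizes the two-alternate case to any number of alternates.

```python
from typing import Callable, Dict, List


class ComponentFailure(Exception):
    pass


def perform(activity: str, hierarchy: Dict[str, List[str]],
            components: Dict[str, Callable[[], str]]) -> str:
    order = hierarchy[activity]  # stored hierarchy: preferred alternate first
    for position, name in enumerate(order):
        try:
            result = components[name]()
        except ComponentFailure:
            continue  # failure detected: invoke the next alternate
        if position > 0:
            # Store a revised hierarchy that invokes the successful alternate first.
            hierarchy[activity] = [name] + order[:position] + order[position + 1:]
        return result
    raise ComponentFailure(f"all alternates failed for {activity!r}")
```

Promoting the alternate that worked means later requests skip a component that is still failing, at the cost of not retrying the original first.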
20100058105 | Environment Wide Configuration System - An installation and configuration system consolidates workloads of multiple applications and services, including applications or services that may be installed or configured on multiple server devices or remotely hosted services. The system gathers environmental information, analyzes dependencies among the workloads, and populates the input data used by the workloads from a common database. The system then executes the workloads, allowing branching within the workloads or the sequence of workloads. An example of branching may include detecting an error condition, pausing the sequence, and presenting alternative fixes to a user. | 03-04-2010 |
20100058106 | Virtual machine file system and incremental snapshot using image deltas - Methods and apparatus involve file systems for virtual machines and image deltas. Representatively, a plurality of virtual machines are configured on a hardware platform, and a file system includes both a read-only portion and a writable portion that together provide the entire file system for each virtual machine. A union of the two portions also provides an incremental snapshot of its corresponding virtual machine and can be used to restore the virtual machine upon a failure event. In terms of content, the read-only portion contains substantially immutable information such as a core basic system image, while the writable portion contains configuration information, state data and production information. An available storage device for the virtual machines is partitioned for each virtual machine, and each machine's writable portion resides in its partition. Other features contemplate particular configurations and computer program products, to name a few. | 03-04-2010 |
20100058107 | Error recovery within processing stages of an integrated circuit - An integrated circuit includes a plurality of processing stages each including processing logic | 03-04-2010 |
20100070794 | METHOD OF NOTIFYING STATUS INFORMATION AND IMAGE FORMING APPARATUS USING THE SAME - A method of notifying status information and an image forming apparatus using the same. The method of notifying status information includes setting a display attribute of status information of the image forming apparatus; if status information of the image forming apparatus is updated, generating status notification information in a first format or a second format according to whether a display attribute is set for the updated status information; and transmitting the generated status notification information. Accordingly, the user is notified of errors in the manner he or she prefers. | 03-18-2010 |
20100070795 | SUPPORTING APPARATUS AND SUPPORTING METHOD - A supporting apparatus includes a configuration-information storage unit that stores dependencies among devices in association with a list of the devices. When information about a device where a failure has occurred is input, a dependency among devices including the faulty device is obtained from the configuration-information storage unit. Based on the obtained dependency and information about the faulty device, learning data associating the dependency with a cause of failure is created. Then, based on the created learning data, a solution procedure indicating how to identify the cause of failure is generated using, for example, the ID3 algorithm. | 03-18-2010 |
20100083029 | Self-Optimizing Algorithm for Real-Time Problem Resolution Using Historical Data - A self-optimizing algorithm for real-time problem resolution using historical data. Upon receiving failure symptom characteristics for a product or process failure, the algorithm queries historical failure data to locate historical failure symptoms and corrective actions matching the failure symptom characteristics. If the total number of historical corrective actions identified does not meet a minimum match threshold, the algorithm selectively prunes the failure symptom characteristic having the lowest priority level to form an adjusted search query. The algorithm may repeat the querying, identifying, and determining steps using the adjusted search query until the total number of historical corrective actions identified meets the minimum match threshold. Once the threshold is met, the algorithm sorts the historical corrective actions to form a list of recommended corrective actions for the failure symptom characteristics and provides that list to an end user. | 04-01-2010 |
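The query-prune-repeat loop in the self-optimizing algorithm entry can be sketched as follows. The record layout, the convention that a larger priority number means a more important characteristic, and sorting recommendations by historical frequency are all assumptions; the entry says the actions are sorted but not by what.

```python
from typing import Dict, List, Tuple

# Hypothetical history: each record pairs symptom characteristics with a corrective action.
History = List[Tuple[Dict[str, str], str]]


def recommend(symptoms: Dict[str, str], priority: Dict[str, int],
              history: History, min_matches: int) -> List[str]:
    query = dict(symptoms)
    while True:
        # Query: corrective actions whose recorded symptoms match every remaining characteristic.
        matches = [action for record, action in history
                   if all(record.get(k) == v for k, v in query.items())]
        if len(matches) >= min_matches or not query:
            break
        # Too few matches: prune the lowest-priority characteristic and search again.
        lowest = min(query, key=lambda k: priority[k])
        del query[lowest]
    # Sort recommendations by how often each action appears (ties broken alphabetically).
    counts: Dict[str, int] = {}
    for action in matches:
        counts[action] = counts.get(action, 0) + 1
    return sorted(counts, key=lambda a: (-counts[a], a))
```

Pruning the least important characteristic first keeps the search as specific as possible while still guaranteeing enough historical evidence.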
20100115325 | METHOD FOR ACCESSING A FLASH MEMORY, AND ASSOCIATED MEMORY DEVICE AND CONTROLLER THEREOF - A method for accessing a Flash memory including a plurality of blocks includes: selectively programming a page in a first block of the blocks; when a status of the Flash memory is abnormal, determining whether a number of error bits is less than a predetermined value; and when the number of error bits is not less than the predetermined value, moving the first block. An associated memory device and a controller thereof are also provided, where the controller includes a read-only memory (ROM) arranged to store program code, and a microprocessor arranged to execute the program code to control access to the Flash memory. When the number of error bits is not less than the predetermined value, the controller, executing the program code on the microprocessor, moves the first block. | 05-06-2010 |
20100122110 | METHOD AND APPARATUS FOR MANAGING ADVERTISING-ENABLED APPLICATIONS - A method comprises determining a condition of a resource for a user device, the device having an advertisement application thereon; and sending a notification indicative of the condition of the resource to the advertisement application. | 05-13-2010 |
20100131792 | ANALYSIS LEADING TO AUTOMATIC ACTION - A system and methods for detecting an operational problem on a mobile device and automatically resolving it through coordinated execution of repair tasks on the mobile device and on a plurality of servers communicatively connected to the mobile device, providing a complex solution to the operational problem. The system can archive a history of corrective actions and their outcomes for later analysis and reporting. | 05-27-2010 |
20100138685 | Real-Time Signal Handling In Guest And Host Operating Systems - The present invention relates to signal handling in a host operating system process executing code portions of a single- or multi-threaded application and of the embedded guest operating system. When a signal is sent from the host operating system to the operating system process, the signal handler of the guest operating system is invoked within a deterministic time, regardless of whether the operating system process is executing code portions of the application or of the guest operating system, or is executing system calls of the host operating system, in masked or non-masked operation. | 06-03-2010 |
20100146323 | METHOD, SYSTEM AND APPARATUS FOR DETECTING MULTIPLE ACTIVE DEVICES IN STACK - Embodiments of the present invention disclose a method for detecting multiple Active devices in a stack. In the method, when a stacking link fails, a new Active device generates a Link Aggregation Control Protocol (LACP) packet containing its bridge MAC address and member number; the new Active device transmits the LACP packet to a stacking member through an access switch; and the stacking member detects a collision of multiple Active devices according to the MAC address and the member number and enters a Recovery state. By including the bridge MAC address, the Active ID, and the configuration dividing identifier in the extended field of the LACP packet, a collision of multiple Active devices in a distributed stack can be detected. | 06-10-2010 |
20100146324 | Method and apparatus for fault detection/isolation in metro ethernet service - An apparatus and method for detecting a fault in a network service involve an Ethernet access network domain in which each of a plurality of edge devices associated with an instance of the network service broadcasts a heartbeat message at a periodic interval. Each edge device also receives the heartbeat messages broadcast at the periodic interval by the other edge devices. A fault is identified when an edge device fails to receive an expected heartbeat message at the periodic interval from one of the other edge devices. | 06-10-2010 |
20100153767 | Small computer system interface input output (SCSI IO) referral - The present invention is a method for communication between an initiator system and a block storage cluster. The method includes receiving a command at a first storage system of a block storage cluster. The command is transmitted by the initiator system to the first storage system via a network and includes a request for data. The method further includes transferring the stored data from the first storage system to the initiator system via the network when the data requested in the data request is stored by the first storage system. The method further includes transmitting a referral response from the first storage system to the initiator system when a portion of the requested data is not stored by the first storage system but is stored by a second storage system of the block storage cluster. The referral response indicates to the initiator system that: a) not all of the requested data was transferred; and b) the second storage system stores the remaining portion of the requested data. | 06-17-2010 |
20100153768 | METHOD AND SYSTEM FOR PROVIDING IMMUNITY TO COMPUTERS - A method and system for providing immunity to a computer system wherein the system includes an immunity module, a recovery module, a maintenance module, an assessment module, and a decision module, wherein the immunity module, the recovery module, the maintenance module and the assessment module are each linked to the decision module. The maintenance module monitors the system for errors and sends an error alert message to the assessment module, which determines the severity of the error and the type of package required to fix the error. The assessment module sends a request regarding the type of package required to fix the error to the recovery module. The recovery module sends the package required to fix the error to the maintenance module, which fixes the error in the system. | 06-17-2010 |
20100153769 | METHOD AND APPARATUS FOR ENHANCED DESIGN OF MULTI-TIER SYSTEMS - A system and method for performing enhanced modeling of multi-tiered architectures is presented. The system and method enable selection of a preferred design for a multi-tiered architecture of components based on a set of established criteria, and may employ certain vectors and functions in component attributes, which may include scalability and scope-of-fault attributes. | 06-17-2010 |
20100162028 | VIRTUAL PROCESSOR METHODS AND APPARATUS WITH UNIFIED EVENT NOTIFICATION AND CONSUMER-PRODUCED MEMORY OPERATIONS - The invention provides, in one aspect, a virtual processor that includes one or more virtual processing units. These virtual processing units execute on one or more processors, and each virtual processing unit executes one or more processes or threads (collectively, “threads”). While the threads may be constrained to executing throughout their respective lifetimes on the same virtual processing units, they need not be. An event delivery mechanism associates events with respective threads and notifies those threads when the events occur, regardless of which virtual processing unit and/or processor the threads happen to be executing on at the time. The invention provides, in other aspects, virtual and/or digital data processors with improved dataflow-based synchronization. A process or thread (collectively, again, “thread”) executing within such processor can execute a memory instruction (e.g., an “Empty” or other memory-consumer instruction) that permits the thread to wait on the availability of data generated, e.g., by another thread and to transparently wake up when that other thread makes the data available (e.g., by execution of a “Fill” or other memory-producer instruction). | 06-24-2010 |
20100162029 | SYSTEMS AND METHODS FOR PROCESS IMPROVEMENT IN PRODUCTION ENVIRONMENTS - A computer-implemented method is provided for identifying a root cause associated with a problem in a production environment. The method includes receiving information indicative of a point of origin of the problem. The method further includes providing a cause-and-effect chart having a plurality of user-definable general cause categories, each having at least one user-definable direct cause subcategory. The method further automatically identifies a primary direct cause of the problem based on the user-defined general cause categories and direct cause subcategories. The method further includes automatically generating a root cause analysis report associated with the primary direct cause, where generating the report includes displaying the cause-and-effect chart and prompting the user for a response to a first of a plurality of questions, wherein the response to the first question automatically prompts the user for a response to a second of the plurality of questions. | 06-24-2010 |
20100169704 | Ethernet System and Related Clock Synchronization Method - A master device for an Ethernet system is disclosed. The master device includes a receiver, a buffer, a phase lock loop unit, and a transmitter. The receiver generates phase adjustment data according to transmission data sent by a slave device when the master device operates in a switch mode. The buffer is coupled to the receiver for accumulating the phase adjustment data and outputting a phase adjustment value. The phase lock loop unit is coupled to the buffer for adjusting the phase of an output clock according to the phase adjustment value to maintain a fixed phase difference between the recovery clock and the output clock. The transmitter transmits initialization data to the slave device according to the output clock. | 07-01-2010 |
20100180144 | POWER SYSTEM COMMUNICATION MANAGEMENT AND RECOVERY - A method, system, and computer program product for determining the severity of communication deficiencies and isolating faults in a power network of a data processing environment is provided. Pursuant to a continuous graph theory analysis, each of a plurality of nodes of the power network is initialized with the same arbitrary value. Each of a plurality of network paths connecting the nodes is analyzed. Upon a successful communication or a communication deficiency over one of the network paths, the value of a node connected to that path is incremented or decremented, respectively, by a weighting value. The incrementing and decrementing are repeated until a threshold is reached, and a recovery is performed on whichever node has the lower adjusted value. | 07-15-2010 |
20100180145 | DATA ACCESSING METHOD FOR FLASH MEMORY, AND STORAGE SYSTEM AND CONTROLLER SYSTEM THEREOF - A data accessing method for accessing data in a plurality of physical page addresses of a plurality of physical blocks in a flash memory chip is provided. The data accessing method includes providing a plurality of logical page addresses for a host system, and creating a logical page to physical page mapping table and a physical page to logical page mapping table to record the mapping between the logical page addresses and the physical page addresses. The data accessing method also includes writing data into the physical page addresses, and updating the logical page to physical page mapping table and the physical page to logical page mapping table. The data accessing method further includes determining whether the physical page addresses are valid or invalid based on the logical page to physical page mapping table and the physical page to logical page mapping table. | 07-15-2010 |
20100180146 | Dynamic Membership Management in a Distributed System - Transactional database replication techniques are disclosed that do not require altering of the database management system implementation. A replicator module includes a dynamic membership manager, which is configured to manage the persistent membership of a coordination group. The persistent membership can be, for example, a durable set of sites that can replicate changes amongst themselves. Changes from sites not in the persistent membership are discarded. The set of recovered members is a subset of the persistent membership. The persistent membership changes incrementally by either removing or adding members. Failed members may still be part of the persistent membership. The dynamic membership manager module manages the modification of the persistent membership, initialization of replicas, and propagation of membership information. | 07-15-2010 |
20100185891 | Environment Delivery Network - A method for an environment delivery network prioritizes groups of data for transmission based on various factors such as synchronization requirements, endpoint configuration, and the fidelity of sensory stimuli reproduction. A device detects data missing from a group of data received from a server and replaces the missing data with replacement data based on a predetermined value. The predetermined value may be based on a default value specific to the sensory stimulus of the missing data, on data received prior to the missing data, or on data received both before and after the missing data. | 07-22-2010 |
20100185892 | TIME-GAP DEFECT DETECTION APPARATUS AND METHOD - A programmatic time-gap defect correction apparatus and method corrects errors that may go undetected by a computer system. Buffer underruns or overruns, which may introduce errors in data transfers yet remain undetected and uncorrected in a computer system, are corrected by an error avoidance module in accordance with the invention. Bytes transferred to and from buffers, which I/O controllers use to temporarily store data being transferred between synchronous and asynchronous devices, are counted, and an error condition is forced based on the count. If the count exceeds the capacity of the buffer, an error condition is forced, thereby reducing the chance that errors are introduced into the data transfer. | 07-22-2010 |
20100192004 | DATA INTEGRATION IN SERVICE ORIENTED ARCHITECTURES - A system, method and program product for transferring structured and unstructured data in a service oriented architecture (SOA) infrastructure. A method is disclosed that includes: receiving a request for a synchronization at a service orchestration engine (SOE), wherein the synchronization includes a transfer of structured meta-data from a first node to a second node and a transfer of unstructured file data from a first file node to a second file node; creating an entry in a routing table to track the synchronization; receiving the structured meta-data at the SOE from the first node and transferring the structured meta-data to the second node; and orchestrating a peer-to-peer data transfer from the first file node to the second file node, including communicating with file handling agents at the first and second file handling nodes. | 07-29-2010 |
20100192005 | Method and system for managing computer systems - A management system for a computer system is disclosed. The computer system operates or includes various products (e.g., software products) that can be managed by a management system or collectively by a group of management systems. Typically, the management system operates on a computer separate from the computer system being managed. The management system can make use of a knowledge base of problem-causing symptoms for previously observed problems at other sites or computer systems. In other words, the knowledge base can be built from and shared by different users across different products to leverage knowledge that is otherwise disparate. The knowledge base typically grows over time. The management system can use its ability to request information from the computer system being managed, together with the knowledge base, to infer a problem root cause in the computer system being managed. The computer system being managed can also request the management system to process its knowledge base for possible problem cause analysis. The management system can also continually identify persisting problem-causing symptoms. | 07-29-2010 |
20100199121 | ERROR MANAGEMENT WATCHDOG TIMERS IN A MULTIPROCESSOR COMPUTER - A multiprocessor computer system comprises one or more watchdog timers operable to detect failure of a memory operation when a certain timing period passes after the memory operation is issued without a valid response. An error handler is operable to take corrective action regarding the failed memory operation, such as providing at least one of hardware state management and application state management. | 08-05-2010 |
20100205475 | META-DATA DRIVEN, SERVICE-ORIENTED ARCHITECTURE (SOA)-ENABLED, APPLICATION INDEPENDENT INTERFACE GATEWAY - An interface gateway may receive a request including a first interface identifier. The interface gateway is associated with a group of interfaces, where each interface is associated with metadata that defines the interface. The metadata may include an interface identifier, information identifying services to be executed for the interface and an order in which the identified services are to be executed, and information identifying servers on which the identified services are implemented. The interface gateway may also identify, for the received request, one interface, of the group of interfaces, for processing the request based on the first interface identifier. The interface gateway may further process the received request using the one interface. When processing the received request, the interface gateway may execute the identified services on the identified servers according to the order, where the executing causes data, associated with the received request, to be converted from a source format to a target format. | 08-12-2010 |
20100205476 | System and Method of Efficiently Generating and Sending Bulk Emails - A method of efficiently generating and sending emails including creating an email template, setting up a campaign query, and distributing the email template and a set of information associated with the campaign query to a plurality of server groups. The method also includes running the campaign query on each of the plurality of server groups and obtaining a plurality of matching users; dividing the plurality of matching users into one or more batches; merging the email template with a set of information corresponding to each of the plurality of matching users from a first batch for each of the plurality of server groups, and sending the first batch of the merged emails directly from each of the plurality of server groups without saving copies of the merged emails. | 08-12-2010 |
20100205477 | Memory Handling Techniques To Facilitate Debugging - A method for debugging includes interacting with a memory management component to force an interrupt upon access to one or more memory locations during software execution, and in response to the forced interrupt, saving information regarding the execution of the software, and interacting with the memory management component to disable the interrupt upon access to the one or more memory locations during software execution. | 08-12-2010 |
20100211814 | MONITORING APPARATUS, INFORMATION PROCESSING SYSTEM, MONITORING METHOD AND COMPUTER READABLE MEDIUM - A monitoring apparatus includes: a reception section that receives information including first use mode information from a first information processing apparatus; a storage section that stores the first use mode information received by the reception section; and a transmission section that, when the reception section receives fault information together with the first use mode information from the first information processing apparatus, transmits information concerning countermeasures against a fault to the first information processing apparatus based on the first use mode information and on pieces of second use mode information, stored in the storage section, of second information processing apparatuses that operate normally. | 08-19-2010 |
20100211815 | SYSTEM AND METHOD FOR MODIFYING EXECUTION OF SCRIPTS FOR A JOB SCHEDULER USING DEONTIC LOGIC - A system and method for modifying execution scripts associated with a job scheduler may include monitoring the execution of a task to determine when the task has failed. Details of the failed task may be identified and used to attempt recovery from the task failure. After any recovery tasks are initiated, their execution may be monitored, and one or more supplementary recovery tasks may be identified and executed, or the original task may be rerun at an appropriate execution point based on the initial point of failure. Thus, when a task has failed, an iterative process may begin in which an attempt is made to roll back the various effects of the failed task; depending on the success of the rollback, the initial task can be rerun at the point of failure, or further recovery tasks may be executed. | 08-19-2010 |
20100211816 | COMPUTER SERVER SYSTEM INCLUDING A DATABASE POOL MODULE PROVIDING DATABASE CONNECTIVITY RECOVERY FEATURES AND RELATED METHODS - A computer server system may include a plurality of database modules for storing user data for a plurality of users, and at least one processing module comprising a plurality of processing threads for processing jobs for users based upon respective user data. The computer server system may further include a database pool module connected between the plurality of database modules and the at least one processing module. The database pool module may be for selectively connecting the processing threads to corresponding database modules including respective user data for jobs to be processed, and determining when a database module becomes unresponsive and terminating processing thread connections to the unresponsive database module based thereon. The database pool module may also be for determining when the unresponsive database module becomes responsive and restoring processing thread connectivity thereto based thereon. | 08-19-2010 |
20100218030 | INTERACTIVE PROBLEM RESOLUTION PRESENTED WITHIN THE CONTEXT OF MAJOR OBSERVABLE APPLICATION BEHAVIORS - A system, method, and article of manufacture are disclosed for monitoring and resolving problems detected in an application stack. The application stack may include multiple interdependent application components that collectively provide a unified service. An interactive problem resolution (IPR) program may monitor and assist users in troubleshooting an application stack installed on a separate computer system. Generally, when a problem in the application stack is detected, the IPR program may alert users to the problem and provide information about it to guide users in taking steps to correct it. | 08-26-2010 |
20100218031 | ROOT CAUSE ANALYSIS BY CORRELATING SYMPTOMS WITH ASYNCHRONOUS CHANGES - An indication of a problem in at least one component of a computing system is obtained. A relevant change set associated with a directed dependency graph is analyzed. The computing system is configured to proactively overcome a root cause of the problem. The relevant change set includes a list of past changes to the computing system which are potentially relevant to the problem. The directed dependency graph includes dependency information regarding given components of the computing system invoked by transactions in the computing system. The analyzing includes identifying at least one of the past changes to the computing system that is the root cause of the problem. | 08-26-2010 |
20100223490 | ASSESSING INTELLECTUAL PROPERTY INCORPORATED IN SOFTWARE PRODUCTS - A method, system, and computer usable program product for assessing third-party IP that may be incorporated in a software product are provided in the illustrative embodiments. An instance of the third-party's intellectual property is identified in a component of the product. The instance is classified as actionable, or not actionable. A remediation action is identified for an actionable instance. An entry is created in a remediation report, the entry including information identifying the actionable instance, the remediation action, or a combination thereof. The remediation report is published. A context of the actionable instance may be determined. Based on the context and the actionable instance, a remediation rule may be selected and executed from a set of remediation rules. The output of the remediation rule may be reported as the remediation action in the remediation report. Performing the remediation action may cause manipulation or initiation of a workflow. | 09-02-2010 |
20100223491 | METHODS AND APPARATUS FOR EVENT LOGGING IN AN INFORMATION NETWORK - Methods and apparatus for logging, analysis, and reporting of events such as reboots in a client device (e.g., consumer premises equipment in a cable network) using applications. In one aspect, an improved event logging and monitoring system is provided within the device with which the application(s) can interface to record event or error data. In one exemplary embodiment, the client device comprises a digital set-top box having Java-enabled middleware adapted to implement the various functional aspects of the event logging system, which registers to receive event notifications (including resource exhaustion data) from other applications running on the device. The network operator can also optionally control the operation of the logging system remotely via a network agent. Improved client device and network configurations, as well as methods of operating these systems, are also disclosed. | 09-02-2010 |
20100229022 | COMMON TROUBLESHOOTING FRAMEWORK - Techniques for improving a troubleshooting experience by providing a common troubleshooting framework. Such a framework may enable use of common elements between troubleshooters and lead to similarities between troubleshooting packages, which may improve the user experience. Further, a framework may reduce the amount of knowledge and time necessary to create troubleshooting packages, and thus encourage increased development of these troubleshooting packages. In some implementations of the framework, a troubleshooting package may be implemented in a declarative manner that outlines/describes the problems it solves and the potential solutions to those problems. The declarative troubleshooting packages may then be provided to the troubleshooting framework and may provide direction to the framework, in that the framework may execute functions as directed by the troubleshooter. | 09-09-2010 |
20100229023 | TELEMETRY DATA FILTERING THROUGH SEQUENTIAL ANALYSIS - One embodiment provides a system that analyzes telemetry data from a computer system. During operation, the system periodically obtains the telemetry data from the computer system. Next, the system preprocesses the telemetry data using a sequential-analysis technique. If a statistical deviation is found in the telemetry data using the sequential-analysis technique, the system identifies a subset of the telemetry data associated with the statistical deviation and applies a root-cause-analysis technique to the subset of the telemetry data to determine a source of the statistical deviation. Finally, the system uses the source of the statistical deviation to perform a remedial action for the computer system, which involves correcting a fault in the computer system corresponding to the source of the statistical deviation. | 09-09-2010 |
20100229024 | MESSAGE PRODUCER WITH MESSAGE TYPE VALIDATION - Message type validation occurs at a message producer before a message is sent to a message destination. A message producer system includes an administrator component, which stores message type parameters associated with a message destination. A message is created for the message destination and a validation component at the message producer system checks the created message for conformity with the stored message type parameters for the message destination. An error is reported if the message type does not conform to the stored message type parameters associated with the message destination. The validation component checks the created message for conformity after a publish call by the message producer system and before a send call and, therefore, prevents an invalid or non-conforming message from being sent. | 09-09-2010 |
20100229025 | Fault Recovery in Concurrent Queue Management Systems - A method for fault tolerance and fault recovery in multiprocessor systems that concurrently manage queues is disclosed. The illustrative embodiment comprises a plurality of servers, a queue of jobs to be assigned to the servers, and two queue managers (a primary unit and a secondary unit) such that the secondary unit fills in for the primary unit while the primary unit is down. The illustrative embodiment provides for smooth transitions from the normal state into the failure state and back into the normal state without losing jobs or violating the queue discipline of the system. | 09-09-2010 |
20100241892 | Energy Optimization Through Intentional Errors - Technologies are described herein for intentionally allowing errors in a computational system to optimize energy consumption of the computational system. A cost-benefit analysis is performed to identify one or more allowable errors and one or more non-allowable errors in the computational system. The allowable errors may be identified by the cost-benefit analysis as being acceptable errors for optimizing energy consumption with respect to accuracy of the computational system. The non-allowable errors may be identified by the cost-benefit analysis as being unacceptable errors for optimizing energy consumption with respect to accuracy of the computational system. The computational system is transformed from a first state in which the computational system corrects or prevents the allowable errors and the non-allowable errors into a second state in which the computational system allows the allowable errors and corrects or prevents the non-allowable errors. | 09-23-2010 |
20100241893 | INTERPRETATION AND EXECUTION OF A CUSTOMIZABLE DATABASE REQUEST USING AN EXTENSIBLE COMPUTER PROCESS AND AN AVAILABLE COMPUTING ENVIRONMENT - Interpretation and execution of a customizable database request using an extensible computer process and an available computing environment is disclosed. In an embodiment, a method includes generating an interpretation of a customizable database request which includes an extensible computer process and providing an input guidance to available processors of an available computing environment. The method further includes automatically distributing an execution of the interpretation across the available computing environment operating concurrently and in parallel, wherein a component of the execution is limited to at least a part of an input data. The method also includes automatically assembling a response using a distributed output of the execution. | 09-23-2010 |
20100251000 | RUN-TIME ADDITIVE DISINFECTION - In embodiments of the present invention improved capabilities are described for runtime additive disinfection of malware. Runtime additive disinfection of malware may include performing the steps of identifying, based at least in part on its type, an executable software application that is suspected of being infected with malware, wherein the malware is adapted to perform a function during the execution of the executable software application, predicting the malware function based on known patterns of malware infection relating to the type executable software application, and in response to the prediction, adding a remediation software component to the executable software application that disables the executable software component from executing code that performs the predicted malware function. | 09-30-2010 |
20100251001 | Enabling Resynchronization Of A Logic Analyzer - In one embodiment, a state machine may enable retraining of a link, where the state machine is to be initiated responsive to an external input received from a logic analyzer coupled to the link or a periodic timer. Such external input may indicate that the logic analyzer has lost synchronization with respect to link communications, and the retraining thus enables the logic analyzer to regain synchronization. Other embodiments are described and claimed. | 09-30-2010 |
20100251002 | Monitoring and Automated Recovery of Data Instances - The monitoring and recovery of data instances, data stores, and other such components in a data environment can be performed automatically using a separate control environment. A monitoring component of the control plane can include a set of event processors for monitoring a workload of the data environment, where an event processor detecting a problem in the data plane can cause a recovery workflow to be generated in order to recover from the detected problem. The event processors can communicate with each other such that if one of the event processors becomes unavailable, the other event processors in a set are able to automatically redistribute responsibility for the workload. | 09-30-2010 |
20100251003 | Recovery from the Loss of Synchronization with Finite State Machines - The invention is a method of operating a system having multiple finite state machines and a controller. Each finite state machine enters an offline state upon detection of anomalous operation. The controller detects whether all finite state machines are offline. The controller transmits an online activation event signal to each finite state machine when all are offline. Each finite state machine evaluates entering the online state if current conditions permit. Reentering the online state includes loading a predetermined set of operating parameters. The finite state machines are responsive only to a reset event and an online activation event when in the offline state. | 09-30-2010 |
20100262857 | DATA STORAGE DEVICE INCLUDING A FAILURE DIAGNOSTIC LOG - In a particular embodiment, a data storage device is disclosed that can include a data storage medium having a device failure partition including a device failure log to store operational state information. The operational state information can include commands, data, performance data, and environmental data associated with the data storage device. The data storage device can further include a controller adapted to selectively store the operational state information to the device failure log in a first-in first-out (FIFO) order representing recent states of the data storage device. | 10-14-2010 |
20100262858 | Invariants-Based Learning Method and System for Failure Diagnosis in Large Scale Computing Systems - A method and system for diagnosing a detected failure in a computer system compares a failure signature of the detected failure to an archived failure signature contained in a database to determine if the archived failure signature matches the failure signature of the detected failure. If the archived failure signature matches the failure signature of the detected failure, an archived solution is applied to the computer system that resolves the detected failure, the archived solution corresponding to a solution used to resolve a previously detected computer system failure corresponding to the archived failure signature in the database that matches the detected failure. | 10-14-2010 |
20100262859 | SYSTEM AND METHOD FOR FAULT TOLERANT TCP OFFLOAD - Systems and methods that provide fault tolerant transmission control protocol (TCP) offloading are provided. In one example, a method that provides fault tolerant TCP offloading is provided. The method may include one or more of the following steps: receiving a TCP segment via a TCP offload engine (TOE); calculating a TCP sequence number; writing a receive sequence record based upon at least the calculated TCP sequence number to a TCP sequence update queue in a host; and updating a first host variable with a value from the written receive sequence record. | 10-14-2010 |
20100268979 | METHOD AND SYSTEM FOR SYNTAX ERROR REPAIR IN PROGRAMMING LANGUAGES - The described embodiments present techniques for recovering from syntax errors. These techniques correct potential errors while preserving the shape of the parse tree, and the specific implementation of the techniques can be automatically generated from the grammar. These techniques may operate by looking back at states associated with previously-received tokens to determine pair matching status, when a synchronizing symbol is received. The techniques can respond to the pair matching status determination by potentially adding a synthesized token or by deleting a token that has already been received. The techniques may use a structure referred to herein as a tuple to assist with the evaluation of the pair matching status. Some of the techniques utilize indentation information to evaluate the pair matching status, while other techniques ignore such information. The described embodiments also include a technique for automatically generating the tuples from a set of grammar rules associated with the parser. | 10-21-2010 |
20100268980 | NODE APPARATUS MOUNTED IN VEHICLE AND IN-VEHICLE NETWORK SYSTEM - An in-vehicle network system includes plural electronic control units data-communicably connected via a network. The electronic control units include a master unit and a node apparatus composed of electronic control units other than the master unit. In the node apparatus, a node time locally used as a reference time by the node apparatus is produced, and a system reference time is received from the master unit via the network. A node time rate, which is a rate of change of the node time per predetermined time period, is calculated based on changes in the node time. A reference time rate, which is a rate of change of the system reference time per the predetermined time period, is also calculated based on changes in the received system reference time. The node time production is controlled such that the node time reduces a difference between the node and reference time rates. | 10-21-2010 |
20100268981 | System and Method for Tunneling System Error Handling Between Communications Systems - A system and method for tunneling system error handling between communications systems are provided. A method for error handling by a controller in an interworking system includes receiving a notification of an occurrence of an error in a first communications system, determining if the error is a long-term error, causing a device in a second communications system with a session in the first communications system to halt communications with the first communications system if the error is a long-term error, and not causing the device in the second communications system with the session in the first communications system to halt communications with the first communications system if the error is not a long-term error. | 10-21-2010 |
20100268982 | ENFORCEMENT PROCESS FOR CORRECTION OF HARDWARE AND SOFTWARE DEFECTS - A method and apparatus for improvement of computer-related products to solve problems caused by artificially embedded locks, barriers, defects, and the like, that force a consumer to needlessly upgrade hardware or software on a computer. An independent developer may procure access to a product, develop a testing regimen for functionality of the product, and perform evaluations to identify sources of any operational defects found. Accordingly, the developer may then provide a generalized testing regimen to test instances of product provided by a supplier, identify those containing the flaw, and may optionally provide a solution to the flaw, where practicable. The independent developer may obtain intellectual property rights in the testing, solution or both for the product. Thus, by notifying a supplier, an independent developer may become a supplier of testing or solution systems, motivating a supplier by one of several mechanisms. The developer may obtain a legal status with respect to the supplier by becoming a customer or user, in order to provide motivation to a recalcitrant supplier not designed to take responsibility for defects known and continued in marketed products. | 10-21-2010 |
20100275054 | KNOWLEDGE MANAGEMENT SYSTEM - Embodiments of the present invention address the above needs and/or achieve other advantages by providing a method, system, computer program product, or a combination of the foregoing for creating a knowledge management system for production support that is standardized and centralized across the channels and sub-channels in an organization. The knowledge management system receives information relating to incidents from databases in the organization. The knowledge management system displays via a user interface at least the following information related to at least one incident: the current status of the incident, the recovery guidelines for effecting resolution of the incident, and scoring values associated with the incident. The knowledge management system also stores and displays historical information, contact information, incident reports, and outstanding incident tickets associated with the incident, as well as process maps or flowcharts for systems, applications, and customer views, and an academy for training associates. | 10-28-2010 |
20100281293 | REPLACING RESET PIN IN BUSES WHILE GUARANTEEING SYSTEM RECOVERY - Systems and methods are disclosed that replace a separate reset pin in a bus with a reset command that guarantees a system recovery. The system comprises a host component circuitry residing on a first chip and a client component circuitry residing on a second, different chip. A bus connects the host component circuitry to the client component circuitry. The host component circuitry is configured to transfer an initial client value associated with a client component time period to the client component circuitry over the bus on a periodic time basis. The periodic time basis is dictated by a host component time period and the client component time period is greater than the host component time period. The client component circuitry is configured to initiate a reset procedure if the client component time period expires which indicates that the initial client value was not received at a next time on the periodic time basis dictated by the host component time period. | 11-04-2010 |
20100281294 | METHOD OF MANAGING OPERATIONS FOR ADMINISTRATION, MAINTENANCE AND OPERATIONAL UPKEEP, MANAGEMENT ENTITY AND CORRESPONDING COMPUTER PROGRAM PRODUCT - A method and apparatus are provided for managing administrative and maintenance operations for a computer connected to a communication network. The method includes: a phase of receiving a request in respect of at least one command to be executed, originating from the computer; a phase of programmed sequential distribution of the at least one command previously recorded within an operations database, destined for the computer; a phase of recording, within a database for collecting results associated with the computer, at least one result of implementing the at least one sequentially distributed command. | 11-04-2010 |
20100287403 | Method and Apparatus for Determining Availability in a Network - Fault management and resilience against failures are useful in many networks. Protection techniques are used to ensure that networks can continue to provide reliable service and to provide redundant capacity within a network to reroute traffic in the presence of a failure. A method or corresponding apparatus according to an example embodiment of the present invention relates to determining availability in a network. The example embodiment calculates availability on a per demand basis for working, protection, and restoration paths among all demands in the network and reports the availability. The reported availability may be used to plan and suggest changes to the network or to recommend addition of equipment to improve the availability of the network while ensuring that service level agreements are satisfied. | 11-11-2010 |
20100287404 | INPUT COMPENSATED AND/OR OVERCOMPENSATED COMPUTING - Techniques are generally described for correcting computation errors via input compensation and/or input overcompensation. In various examples, errors of a computation may be detected, and input compensation and/or overcompensation to correct the errors may be created. The disclosed techniques may be used for power and/or energy minimization/reduction, and debugging, among other applications. Other embodiments and/or applications may be disclosed and/or claimed. | 11-11-2010 |
20100293407 | Systems, Methods, and Media for Recovering an Application from a Fault or Attack - Systems, methods, and media for recovering an application from a fault or an attack are disclosed herein. In some embodiments, a method is provided for enabling a software application to recover from a fault condition. The method includes specifying constrained data items and assigning a set of repair procedures to the constrained data items. The method further includes detecting a fault condition on the constrained data items during execution of the software application, which triggers at least one repair procedure. The triggered repair procedures are executed and the execution of the software application is restored. In some embodiments, the restoring comprises providing memory rollback to a point of execution of the software application before the fault condition was detected. | 11-18-2010 |
20100318832 | HANG RECOVERY IN SOFTWARE APPLICATIONS - Various embodiments provide a guard mechanism that is configured to prevent transmission of synchronous function calls to hung application components. In at least some embodiments, the guard mechanism receives a synchronous function call that is intended for an application component. Before permitting the synchronous function call to be transmitted to the application component, the guard mechanism determines whether the component is hung. Responsive to determining that the component is not hung, the guard mechanism permits the synchronous function call to be transmitted to the component. If, however, the guard mechanism determines that the application component is hung, a hung component recovery process is initiated. | 12-16-2010 |
20100318833 | METHOD FOR THE SECONDARY ERROR CORRECTION OF A MULTI-PORT NETWORK ANALYZER - A method for the error correction of a vectorial network analyzer, where a primary system calibration is initially implemented using a calibration kit. Following this, a first, secondary error correction is implemented on at least two one-port networks of the vectorial network analyzer. After this first, secondary error correction of the one-port networks of the vectorial network analyzer, a second, secondary error correction is implemented, where either two one-port networks are through-connected in an ideal manner or a measurement is implemented on a reciprocal two-port network. The corrected system-error values from the first, secondary error correction are used even in this further measurement, and overall, a high-precision, calibrated multi-port network analyzer is obtained. | 12-16-2010 |
20100325470 | Extended Messaging Platform - A message system, including at least one server configured to receive a message from an originating device for delivery to at least one recipient device via a first delivery channel; and wherein the at least one server is further configured to select an alternate delivery channel in the event that delivery of the message via the first delivery channel cannot be effected, is disclosed. The invention further discloses a method for routing messages including the steps of receiving at a server a message from an originating device for delivery to at least one recipient; forwarding the message to the at least one recipient device via a first delivery channel; awaiting receipt of an acknowledgement message from said at least one recipient device, and in the event that no acknowledgement message is received, the at least one server resends the message to said at least one recipient device via an alternate delivery channel. | 12-23-2010 |
20100332889 | MANAGEMENT OF INFORMATION TECHNOLOGY RISK USING VIRTUAL INFRASTRUCTURES - Information Technology Risk to an organization is associated with a plurality of virtual machines (VMs) each running on a plurality of hosts, each host being a computer system connected to a network and in communication with a risk orchestrator, which receives threat indication messages (TIMs) from threat indicators. Each TIM indicates a status of a threat to which a host is vulnerable. Downtime probability (DTP) resulting from the threat and an overall host DTP for each host are calculated. For each VM, a risk value associated with the VM is calculated as a function of the host DTP and an impact for the VM, the impact being a value reflecting a relative importance of the VM to the organization. Each VM requiring risk mitigation is identified and prioritized in accordance with a policy, and a configured mitigation control action may be carried out for each VM requiring risk mitigation. | 12-30-2010 |
20100332890 | SYSTEM AND METHOD FOR VIRTUAL MACHINE MANAGEMENT - A system and method are provided for virtual machine management. The system comprises a virtual machine manager, a blade server management module, and at least one blade server. The virtual machine manager comprises an abnormal event receiving module for receiving information about a blade server having a hardware problem directly from the blade server management module and additionally a virtual machine management module for sending a processing command to a virtual machine hypervisor on the blade server having the hardware problem. The virtual machine management module receives the information about the hardware problem from the abnormal event receiving module. The processing command is determined in accordance with the information about the hardware problem and strategies for handling predefined hardware problems. | 12-30-2010 |
20110004780 | SERVER SYSTEM AND CRASH DUMP COLLECTION METHOD - There is provided a server system that collects memory information at the time of occurrence of a failure if a failure occurs in the operating system so as to enable failure analysis. Stall monitoring of the firmware is performed by hardware and, if a stall is detected, a reset is performed. A memory has a memory area used by a boot loader of the firmware and a memory area used by another part of the firmware. It is determined based on a reset factor retained in a device whether the reset is a normal reset or a reset associated with the stall detection. In the case where the reset is a reset associated with the stall detection, information of the memory area used by the other part of the firmware at the time of occurrence of the stall is collected. | 01-06-2011 |
20110010577 | METHOD FOR REACTIVATING AT LEAST ONE MEDIA TRANSFER PROTOCOL-COMPATIBLE DEVICE WHEN AN UNRECOVERABLE ERROR OCCURS, AND ASSOCIATED HOST - A method for reactivating at least one media transfer protocol-compatible (MTP-compatible) device when an unrecoverable error occurs includes: temporarily storing a transaction ID of a latest operation performed on the MTP-compatible device; and selectively communicating with the MTP-compatible device by utilizing the transaction ID when an unrecoverable error of the MTP-compatible device occurs. An associated host for reactivating at least one MTP-compatible device when an unrecoverable error occurs includes a storage unit and a processing circuit. The storage unit is arranged to temporarily store a transaction ID of a latest operation performed on the MTP-compatible device. In addition, the processing circuit is arranged to selectively communicate with the MTP-compatible device by utilizing the transaction ID when an unrecoverable error of the MTP-compatible device occurs. | 01-13-2011 |
20110016347 | Tool for Analyzing and Resolving Errors in a Process Server - A method for analyzing and resolving problems in a process server is disclosed herein. In one embodiment, such a method may include receiving a log file associated with an application running on the process server. The application may be made up of higher-level service component artifacts, and lower-level implementation artifacts used to implement the higher-level service component artifacts. The method may further include identifying error messages in the log file and determining which implementation artifacts are associated with the error messages. The method may further include mapping the implementation artifacts to service component artifacts associated with the implementation artifacts. The error messages may then be displayed along with their relationship to the service component artifacts. A corresponding apparatus and computer program product are also disclosed and claimed herein. | 01-20-2011 |
20110016348 | System and method for bridging assets to network nodes on multi-tiered networks - An exemplary method and/or exemplary embodiment of the present invention provides a system and method for bridging an asset over a multi-tiered network. Generally, communications can be maintained between executable assets residing on different network nodes by bridging the execution context of the two nodes. In an embodiment, a mapping layer can be generated for assets that have run-time dependencies; the mapping layer uses a distribution system to bridge the execution context of a first environment with that of a second environment. The asset executing in the first environment can access another resource located in the second environment, even though the asset does not have local access to the resource in the second environment. A fault is detected when at least one asset deployed on a local node attempts to access at least one resource on a remote node through an application programming interface. The fault is then handled appropriately. | 01-20-2011 |
20110022880 | Enabling Existing Desktop Applications To Access Web Services Through The Use of a Web Service Proxy - The present invention enables desktop applications to access web services through Plug-ins and a Web Service Proxy Server. An administrator registers a web service by providing the URL of the WSDL file of the web service. The target desktop applications and the operations are identified using the WSDL file. Operations that are not compatible with the desktop applications are removed from a published list of operations. The administrator appends additional formatting information, communication standards and security policies to the WSDL file. A user accessing the web services is first authenticated and authorized. Thereafter, the user accesses the web services through the Web Service Proxy Server. The communication with the web services complies with the standards and security policies specified in the WSDL files. The output data obtained from the web services are presented using template documents. These template documents are generated based on the formatting information provided in the WSDL files. | 01-27-2011 |
20110022881 | DISTRIBUTED RESOURCE MANAGING SYSTEM, DISTRIBUTED RESOURCE MANAGING METHOD, AND DISTRIBUTED RESOURCE MANAGING PROGRAM - A distributed resource managing system has one or more resource managing processes corresponding to each of predefined events that change the states of resources, on a communication network where each of a plurality of tasks can use a plurality of resources. Each of the one or more resource managing processes includes an assignor which, when it receives a request to protect any specific task against the event that changes states of resources to which its own process corresponds, assigns backup resources including a resource already selected by another resource managing process to the task in such a way that all tasks requested to be protected which use the resource can be protected from the event that changes the states of the resources, and an indicator which indicates information of the assigned backup resources to one or more recovery execution processes. | 01-27-2011 |
20110029805 | REPAIRING PORTABLE EXECUTABLE FILES - A portable executable file can be repaired by identifying an invalid field of a portable executable file. A likelihood of repairing the invalid field of the portable executable file is determined. A repair model for repairing the invalid field of the portable executable file is generated, and the invalid field of the portable executable file is repaired based upon, at least in part, the repair model. | 02-03-2011 |
20110035616 | DETECTION OF UNCORRECTABLE RE-GROWN FUSES IN A MICROPROCESSOR - A microprocessor includes a first plurality of fuses, a predetermined number of which are selectively blown. Control values are provided from the first plurality of fuses to circuits of the microprocessor to control operation of the microprocessor. The microprocessor also includes a second plurality of fuses, blown with the predetermined number of the first plurality of fuses that are blown. In response to being reset, the microprocessor is configured to: read the first plurality of fuses and count a number of them that are blown; read the predetermined number from the second plurality of fuses; compare the counted number with the predetermined number read from the second plurality of fuses; and prevent itself from fetching and executing user program instructions if the number counted from reading the first plurality of fuses does not equal the predetermined number read from the second plurality of fuses. | 02-10-2011 |
20110035617 | USER-INITIATABLE METHOD FOR DETECTING RE-GROWN FUSES WITHIN A MICROPROCESSOR - A microprocessor includes a first plurality of fuses, selectively blown with a predetermined value for provision to circuits of the microprocessor to control operation of the microprocessor. The microprocessor also includes a second plurality of fuses, selectively blown with error detection information used to detect an error in the first plurality of fuses such that a blown fuse of the microprocessor returned a non-blown binary value. In response to a user program instruction, the microprocessor is configured to determine whether there is an error in the first plurality of fuses such that a blown fuse returned a non-blown binary value using the error detection information from the second plurality of fuses. | 02-10-2011 |
20110041000 | METHOD AND A DEVICE FOR ADJUSTING A TRUNK STATE - The present invention provides a method and a device for adjusting a trunk state, wherein the method includes: if a media gateway controller receives a request for a new service and thus uses the trunk circuit, sending an adding command message to a media gateway; after receiving the adding command message, the media gateway checking that the trunk circuit is unusable and cannot execute the process for the adding command message successfully; the media gateway sending an error response message and a state report message indicating that the trunk circuit is unusable to the media gateway controller; and after receiving the state report message, the media gateway controller modifying the state information of the trunk circuit in the media gateway controller according to the state report message and returning a response message to the media gateway. Therefore, based on the H.248/Megaco protocol, the present invention resolves the trunk circuit state inconsistency in which the MGC considers a trunk circuit state as normal while the MGW considers the trunk circuit state as abnormal. The present invention can handle this inconsistency more rapidly and simply so as to reduce call loss. | 02-17-2011 |
20110047404 | ANALYSIS AND PREDICTION SYSTEMS AND METHODS FOR RECOVERY BASED SOCIAL NETWORKING - Systems and methods for recovery based social networking are presented. An analysis module analyzes past and current activity of users on the social networking platform. The analysis module predicts, based on user activity, when particular users will need support from user identified supporters and healthcare professionals. The analysis module sends alert messages to the pre-determined supporters and healthcare professionals soliciting support responsive to the predictions. | 02-24-2011 |
20110055619 | SYNCHRONIZING PROBLEM RESOLUTION TASK STATUS USING AWARENESS OF CURRENT STATE AND TRANSACTION HISTORY - Systems, methods and articles of manufacture are disclosed for synchronizing a transaction profile with a resolution status of a problem experienced by an application. The problem may be detected for the application. A transaction profile may be retrieved for the detected problem. The transaction profile may include a sequence of transactions to be performed on the system to remedy the open problem. Transactions occurring on the system may be monitored, and an instance of the transaction profile may be updated accordingly to create a synchronized transaction profile. | 03-03-2011 |
20110055620 | Identifying and Predicting Errors and Root Causes in a Data Processing Operation - Methods and systems for automated quality management and/or monitoring of a data processing operation, including identifying root causes for errors, identifying which errors are likely to have root causes, predicting errors, and predicting increases in errors. | 03-03-2011 |
20110060937 | METHODS AND SYSTEMS FOR FAILURE ISOLATION AND DATA RECOVERY IN A CONFIGURATION OF SERIES-CONNECTED SEMICONDUCTOR DEVICES - A method of identifying at least one anomalous device in a configuration of series-connected semiconductor devices, comprising: selecting a device in the configuration; sending a command to the selected device, the command for placing the selected device into a recovery mode of operation; attempting to elicit identification data from the selected device while in the recovery mode of operation; if the attempt is successful, selecting a next device in the configuration of series-connected semiconductor devices and repeating the sending and the attempting to elicit; and if the attempt is unsuccessful, concluding that the selected device is an anomalous device. Also, a method of recovering data from a configuration of series-connected semiconductor memory devices having undergone a failure, comprising: placing an operable device of the configuration into a recovery mode of operation; while the operable device is in the recovery mode of operation, retrieving data currently stored by the operable device; and storing the retrieved data in an alternate memory facility. | 03-10-2011 |
20110072298 | Information processing device, transfer circuit and error controlling method for information processing device - An information processing device includes SBs; an XBB for executing data transfer between the SBs; and an SCF for managing and controlling the SBs and the XBB. The SB includes a transmitting/receiving unit for transmitting a notification packet indicating occurrence of an error via the XBB when detecting the occurrence of the error. The SCF includes an executing unit for executing a configuration change process corresponding to an instruction when detecting the instruction related to the SB, a suspending unit for suspending acceptance of an error report from the SB in which the error occurs during execution of the configuration change process, and an XBB controller for controlling the XBB to destroy the notification packet received from the SB whose configuration change process is being executed and controlling the XBB to inhibit transfer of the notification packet to the SB whose configuration change process is being executed. | 03-24-2011 |
20110078487 | SERVICE PLAN WEB CRAWLER - A web crawler for downloading and analyzing the contents of a merchant's website. The web crawler may analyze the products advertised and determine whether a service plan is properly associated. The crawler may also analyze the placement of the service plans on the website, and store the information in a database. A dynamic mapper is also provided which can determine what service plan should be associated with a particular product. The dynamic mapper may also suggest what type of control to use for a particular customer. A webserver containing software for updating a webpage is also disclosed. A process for updating a webpage is also disclosed. | 03-31-2011 |
20110083031 | SYSTEM AND METHOD FOR SLOW AD DETECTION - A system and method for slow ad detection is provided. An ad tool receives information including round trip times to load web pages, in which each web page is loaded with at least one ad. Additionally, the ad tool calculates, for each ad, a mean round trip time to load each web page loaded with the respective ad. The ad tool then determines a predetermined number of the ads with the highest mean round trip time to load each of the web pages with the ad. Further, the ad tool enables testing of each of the predetermined number of ads to determine the round trip load time of each of the predetermined number of ads. | 04-07-2011 |
20110083032 | RECOVERY OF TRANSMISSION ERRORS - A permeable protocol layer decoder ( | 04-07-2011 |
20110083033 | COMPUTER SYSTEM DUPLICATING WRITES BY SYNCHRONOUS REMOTE COPY WITH MULTIPLE HOST COMPUTERS USING HETEROGENEOUS OPERATING SYSTEMS - A computer system having a plurality of host computers and a storage system is provided which allows any one host computer to perform a global copy operation on any arbitrary or all storage areas in the storage system. To this end, storage areas provided by the disk devices are grouped into groups by allocating group numbers to a plurality of specified storage areas. The copy operation can be performed by specifying desired groups. Each of the groups is made up of sub-groups and the sub-groups are defined for each computer to assure a consistency of copy order of the sub-groups. | 04-07-2011 |
20110087915 | HYBRID RELIABLE STREAMING PROTOCOL FOR PEER-TO-PEER MULTICASTING - Peer-to-peer multicasting of streaming data in a node in a peer-to-peer computer environment. A transmission of packets is received at the node, wherein the packets are data packets pushed from a parent node and comprise data of a sub stream of the streaming data. A buffer map of the node is created at the node, wherein the buffer map lists the packets that have been received and an available bandwidth of the node. The node is connected with at least one neighboring node. The buffer map of the node is exchanged with a buffer map of the at least one neighboring node. Provided a determination is made that at least one packet in the sub stream of the streaming data was not received at the node, the at least one packet is pulled from the at least one neighboring node. | 04-14-2011 |
20110087916 | METHOD AND SYSTEM FOR PROVIDING ADVERTISEMENT - An advertisement providing method and system may improve the frequency of advertising exposure. In a method for providing an advertisement, a client sends a request message to a server. The request message requests an action of the server. The server determines whether an error occurs when performing the action. If the error occurs, the server extracts advertisement data based on information in the request message. The server creates a response message by combining the extracted advertisement data with error occurrence information, and sends the response message to the client. | 04-14-2011 |
20110087917 | Method and Apparatus for Implementing a Predetermined Operation in Device Management - A method for implementing a predetermined operation in device management, being based on a DM system defined by OMA, includes: sending by the device management system a second predetermined operation based on a trigger condition to a terminal device and storing by the terminal device the received second predetermined operation; and obtaining by the terminal device from itself the second predetermined operation and executing the second predetermined operation when the trigger condition is satisfied. The present invention also discloses an apparatus for implementing a predetermined operation in device management. | 04-14-2011 |
20110093737 | Error recovery within processing stages of an integrated circuit - An integrated circuit includes a plurality of processing stages each including processing logic | 04-21-2011 |
20110099412 | CLUSTER NEIGHBORHOOD EVENT ADVISORY - Database server instances in a database server cluster broadcast, to other instances in the cluster, information concerning certain problem events. Because each server instance is aware of problems that other server instances are experiencing, each server instance is enabled to make more intelligent decisions regarding the actions that it should perform in response to the problems that the server instance is experiencing. Instead of terminating itself, a server instance might opt to wait for a longer amount of time for an operation to complete. The server instance may do so due to the server instance having received information that indicates that other server instances are experiencing similar problems. Whenever the information received from other server instances makes it appear that a problem is unlikely to be solved in the cluster as a whole by terminating a server instance, that server instance may continue to wait instead of terminating itself. | 04-28-2011 |
20110107135 | INTELLIGENT ROLLING UPGRADE FOR DATA STORAGE SYSTEMS - Various method, system, and computer program product embodiments for facilitating upgrades in a computing storage environment are provided. In one such embodiment, one of an available plurality of rolling upgrade policies registering at least one selectable upgrade parameter for an upgrade window is selected. A node down tolerance factor is set for at least one node in the computing storage environment. The node down tolerance factor specifies a percentage of elements of the at least one node taken offline to apply the selected one of the available plurality of rolling upgrade policies during the upgrade window. | 05-05-2011 |
20110119523 | ADAPTIVE REMOTE DECISION MAKING UNDER QUALITY OF INFORMATION REQUIREMENTS - A system and method for adaptive remote decision making includes steps of: receiving from an application layer a target range for a level of reporting quality for processed data; setting data collection parameters to meet the target range; collecting the data from a plurality of remote data collecting devices deployed in the distributed computing system, a portion of said data being compromised during the collecting process; processing the collected data to produce the processed data; evaluating the processed data based on observable metrics of current collected data and reported data losses; forecasting an expected reporting quality while continuing to collect the data; comparing the expected reporting quality with the target range; and reporting the processed data when the expected reporting quality falls within the target range for the level of reporting quality. | 05-19-2011 |
20110119524 | Maintaining Communication Continuity - A computer program product includes a computer usable memory, storage medium or physical medium having computer usable program code embodied therewith, the computer usable program code including: | 05-19-2011 |
20110126040 | IDENTIFYING SYNTAXES OF DISPARATE COMPONENTS OF A COMPUTER-TO-COMPUTER MESSAGE - A computer-implemented method, system and computer program product for identifying syntaxes of disparately syntaxed components of a message file are presented. A computer displays a message file that comprises disparately syntaxed components. A processor detects a selection of a selected component from the disparately syntaxed components, and displays a description of a syntax used by the selected component on a user interface. | 05-26-2011 |
20110154091 | ERROR LOG CONSOLIDATION - A system for error log consolidation is disclosed herein. A server computer includes a plurality of system processors and error log consolidation logic. The system processors are configurable to form isolated execution partitions. The error log consolidation logic is configured to, based on detection of a fault in the server, retrieve error logs from the system processors, and to consolidate the retrieved logs with server computer information not available to the system processors to generate a consolidated error log. The consolidated error log includes a comprehensive set of server information relevant to identifying a cause of the detected fault. | 06-23-2011 |
20110154092 | MULTISTAGE SYSTEM RECOVERY FRAMEWORK - A method and system for multi-staged recovery of a distributed computer system. The method includes receiving a failure event notification from at least one node of the distributed computer system and executing a plurality of recovery stages upon receiving the failure event notification by using a recovery manager, wherein each of the plurality of recovery stages performs a defined recovery task. The progress of recovery is tracked by using at least one state machine executed by the recovery manager, wherein the state machine reflects progress of each of the recovery stages. The progress of recovery is monitored to a completion by using the state machine and the recovery manager. | 06-23-2011 |
20110154093 | METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR IDENTIFYING CYCLICAL BEHAVIORS - Methods, systems, and computer program products for identifying cyclical behaviors are provided. A method includes defining a time-based set of splines in an equation for a dataset, identifying a periodicity of a cycle derived from implementing the time-based set of splines on the dataset, and taking a responsive action as a result of identifying the periodicity of the cycle. | 06-23-2011 |
20110154094 | Error Handling Structure For Use in a Graphical Program - System and method for error handling in a graphical program. An error handling structure is displayed in a graphical program. The error handling structure includes a first frame configured to contain graphical program code for which error handling is to be provided. At least a portion of the graphical program is included in the first frame in response to user input specifying the at least a portion of the graphical program. During execution of the graphical program, the error handling structure aborts execution of the at least a portion of the graphical program in the first frame in response to detection of an unhandled error in the at least a portion of the graphical program in the first frame and continues execution of the graphical program. | 06-23-2011 |
20110154095 | Management of Space in Shared Libraries - Any computer process has the opportunity of attempting to load a data object into a global shared library area and, in the event that there is insufficient space in this global area resulting in a failure to load, there is then an automatic location or creation of a named shared library area for the data object that is transparent and does not need any user action. | 06-23-2011 |
20110154096 | Business Methods Retry Optimization - The present disclosure involves systems, software, and computer implemented methods for retrying business methods at an application server after thrown exceptions. One process includes operations for invoking a business method of an enterprise bean hosted in an enterprise bean container. The operations further include determining whether retry conditions are satisfied after an exception is thrown during execution of the business method. The business method is invoked again based on a predefined retry policy when the retry conditions are satisfied. | 06-23-2011 |
20110161721 | Method and system for achieving a remote control help session on a computing device - A method and system for achieving a remote control help session on a computing device. The method includes receiving, at an online service datacenter, a request from a remote service provider computer to obtain a pass code for an end user of a malfunctioning computing device. The pass code is sent to the remote service provider computer, and a service provider technician provides the pass code to the end user. The malfunctioning computing device and the remote service provider computer are each securely connected to the online service datacenter. The remote service provider computer is then linked to a PC session indicated by the pass code, enabling the service provider computer to connect through the online service datacenter to the malfunctioning computing device. The remote service provider computer, via firmware residing on the malfunctioning computing device, enables the service provider technician to diagnose, repair, and/or optimize the malfunctioning computing device. | 06-30-2011 |
20110173482 | Data processing apparatus and method for providing fault tolerance when executing a sequence of data processing operations - A data processing apparatus and method provide fault tolerance when executing a sequence of data processing operations. The data processing apparatus has processing circuitry for performing the sequence of data processing operations, and a redundant copy of that processing circuitry for operating in parallel with the processing circuitry, and for performing the same sequence of data processing operations. Error detection circuitry detects an error condition when output data generated by the processing circuitry differs from corresponding output data generated by the redundant copy. Shared prediction circuitry generates predicted data input to both the processing circuitry and the redundant copy, with the processing circuitry and redundant copy then performing speculative processing of one or more data processing operations in dependence on that predicted data. Each of the processing circuitry and the redundant copy includes checking circuitry for determining whether the speculative processing was correct, and initiating corrective action if the speculative processing was not correct. By sharing the prediction circuitry rather than replicating it within both the processing circuitry and the redundant copy, significant area and power consumption benefits can be achieved without affecting the ability of the apparatus to detect faults. | 07-14-2011 |
20110173483 | FAST RESOURCE RECOVERY AFTER THREAD CRASH - A resource recovery system may maintain a counter in memory that indicates a number of times one or more threads of execution, which use shared resources, have crashed. The system may associate a first value of the counter with a resource allocated to a thread of the one or more threads, and may set an indicator associated with the thread to indicate whether the thread has crashed. The system may determine whether to re-allocate the resource to the thread based on the first value of the counter associated with the resource and based on the indicator associated with the thread. | 07-14-2011 |
20110173484 | SOLID-STATE MASS STORAGE DEVICE AND METHOD FOR FAILURE ANTICIPATION - A solid-state mass storage device and method of operating the storage device to anticipate the failure of at least one memory device thereof before a write endurance limitation is reached. The method includes assigning at least a first memory block of the memory device as a wear indicator that is excluded from use as data storage, using pages of at least a set of memory blocks of the memory device for data storage, writing data to and erasing data from each memory block of the set in program/erase (P/E) cycles, performing wear leveling on the set of memory blocks, subjecting the wear indicator to more P/E cycles than the set of memory blocks, performing integrity checks of the wear indicator and monitoring its bit error rate, and taking corrective action if the bit error rate increases. | 07-14-2011 |
20110173485 | FEC IN COGNITIVE MULTI-USER OFDMA - A multiuser scheme allowing for a number of users, sets of users, or carriers to share one or more channels is provided. In the invention, the available channel bandwidth is subdivided into a number of equal-bandwidth subchannels according to standard OFDM practice. A transmitter transmits data on a set of OFDM subchannels that need not be contiguous in the spectrum or belong to the same OFDM channel. A receiver receives and decodes the data and detects errors on subchannels. The receiver then broadcasts the identity of those subchannels on which the error rate exceeds a specific threshold, and the transmitter may select different subchannels for transmission based on this information. | 07-14-2011 |
20110173486 | COMMUNICATION APPARATUS, NETWORK, AND ROUTE CONTROL METHOD USED THEREFOR - Provided is a communication apparatus that is capable of enhancing fault resistance of a network. A communication apparatus includes, in a network where messages are exchanged among a plurality of communication apparatuses and route control is performed, a route summary processing means that creates route summary information obtained by summarizing route information of a local apparatus from a route table holding route information used for route control and traffic information between communication apparatuses, the route summary processing means further exchanging the route summary information that is created with other communication apparatuses; and a fault influence degree information processing means that calculates fault influence degree information indicating a degree of influences on traffic and the other communication apparatuses when the local apparatus is in a fault state, the fault influence degree information processing means further exchanging the fault influence degree information that is calculated with the other communication apparatuses; and a route adjustment processing means that adjusts the route table of the local apparatus using the route summary information and the fault influence degree information that are obtained by the local apparatus. | 07-14-2011 |
20110185220 | REMOTE DIAGNOSTIC SYSTEM AND METHOD BASED ON DEVICE DATA CLASSIFICATION - A remote diagnostic system and method based on device data classification. Device diagnostic data with respect to a device can be acquired and a conditional probability look up table can be constructed for each fault code associated with the device diagnostic data by a classification module. A score function can then be created by summing the conditional probabilities and an occurrence of the fault code can be mapped to a service call category with a numerically highest score function. The fault occurrence data in association with a number of time stamps and device identifiers can be stored in a data warehouse. The occurrence of fault code can be matched with respect to a solution set which can be automatically dispatched to a customer via a communications link. | 07-28-2011 |
20110197090 | Error Reporting Through Observation Correlation - A software component is executed to carry out a task, the task including a subtask. An external function is called to perform the subtask, the external function executing in a separate thread or process. The component receives an observation recorded by the external function, the observation including an identifier of a possible error condition and instance data associated with the possible error condition. The possible error condition is a cause of the failure of the external function to carry out the subtask. If the task cannot be completed, then a new observation is recorded along with the received observation, the new observation being related to a possible error condition of the component, which is a cause of the failure of the component to carry out the task. When the task can be completed despite the failure of the external function, the observation recorded by the external function is cleared. | 08-11-2011 |
20110197091 | SWITCH DEVICE, SWITCH CONTROL METHOD AND STORAGE SYSTEM - A switch device includes a memory unit for storing therein an error response for each error event to be sent in response at the time of a failure with respect to a control signal that controls a storage device connected to the switch device, an error response output unit for receiving input of the control signal and sequentially outputting each error response stored in the memory unit, an operation information computing unit for detecting an operation of a calculating device, which is connected to the switch device, corresponding to each error response output by the error response output unit and for obtaining, as operation information, a condition defining the operation of the calculating device upon receiving each error response, and an operation setting unit for setting operation condition at the time of a failure based on the operation information. | 08-11-2011 |
20110208992 | Universal Resource Locator Watchdog - A watchdog system for identifying failures in uniform resource locators (URLs) respective of advertised content. The system comprises a database containing at least campaign information, the campaign information containing at least one URL to be monitored by the watchdog system, the URL directing to advertised content; and a server connected to the database and operative to monitor the at least one URL to identify a failure in the URL providing the advertised content, and to perform a corrective action for correcting the URL based on definitions in the campaign information. | 08-25-2011 |
20110208993 | SYSTEMS AND METHODS FOR DIAGNOSING AND FIXING ELECTRONIC DEVICES - Systems and methods for reducing the cost and time required for diagnosing and fixing electronic devices are provided. A host electronic device may be configured to generate a log of events that it experiences. A help component may access the generated log and analyze the log to detect if the host device has experienced a problem. Data may then be exchanged between the help component and the host device in order to fix the detected problem. | 08-25-2011 |
20110214006 | AUTOMATED LEARNING OF FAILURE RECOVERY POLICIES - Described is automated learning of failure recovery policies based upon existing information regarding previous policies and actions. A learning mechanism automatically constructs a new policy for controlling a recovery process, based upon collected observable interactions of an existing policy with the process. In one aspect, the learning mechanism builds a partially observable Markov decision process (POMDP) model, and computes the new policy based upon the learned model. The new policy may perform automatic fault recovery, e.g., on a machine in a datacenter corresponding to the controlled process. | 09-01-2011 |
20110219258 | Content Interruptions - Techniques that address content interruptions are described. In an implementation, an interruption is detected at the client device in receipt of a stream of content from a distribution system that is to be recorded locally in memory at the client device. A stream of content is generated at the client device and the generated stream of content is recorded to fill the interruption in the stream of content from the distribution system in the memory of the client device. | 09-08-2011 |
20110225446 | IDENTIFYING A DEFECTIVE ADAPTER - A method, system, and computer usable program product for identifying a defective adapter are provided in the illustrative embodiments. A configuration process of the adapter is initiated, the adapter being coupled with a slot in a data processing system. An indication of the configuration process is activated. A determination is made whether the configuration has completed successfully. The indication is allowed to remain activated responsive to the configuration not completing successfully. The activated indication identifies the defective adapter. | 09-15-2011 |
20110246812 | WINDOW SUPPRESSION - A method of suppressing unwanted windows created by an operating system is described. The method comprises: monitoring calls from the operating system relating to creation of a window and ascertaining if a monitored call relates to creation of a window of a type corresponding to a window type to be blocked. If the window is not of a type that is to be blocked, then the method involves displaying the window. If the window is of a type that is to be blocked, then the method involves: registering a new window procedure for that window; and returning an error message to the operating system using the new window procedure to suppress display of the window. | 10-06-2011 |
20110252268 | Hierarchical configurations in error-correcting computer systems - When errors arise in a computing system that has plural modules, this invention corrects those errors. In a first instance, the invention excludes the computing system itself but receives error messages from the plural modules of that system along plural respective receiving connections. Plural sending connections return corrective responses to the plural modules of that system, respectively. In a second instance, the invention further incorporates that system. The invention is hierarchical: plural levels or tiers of apparatus and function are present, a first (typically uppermost) tier directly serving that system as described above, and other (lower) tiers that analogously serve the first tier of the invention and then also the subsequent tiers, in a cascading or nested fashion, down to preferably a bottom-level tier supporting all the upper ones. Each level preferably controls power interruption and restoration to higher levels. Ideally the hierarchy is in the form of a “system on chip”. | 10-13-2011 |
20110252269 | SYSTEM AND METHOD FOR AUTOMATICALLY UPLOADING ANALYSIS DATA FOR CUSTOMER SUPPORT - The invention enhances automatic incident control, problem control, and problem prevention using information provided by the analysis or analysis data. The burden on the part of both users and providers to resolve problems is reduced by using a method of automatic analysis data upload and intelligent problem analysis and resolution. Problems are better identified, investigated, diagnosed, recorded, classified, and tracked until affected services return to normal operation, and error trends are used to proactively prevent future problems. | 10-13-2011 |
20110258480 | METHODS AND APPARATUS FOR MANAGING ASYNCHRONOUS DEPENDENT I/O FOR A VIRTUAL FIBRE CHANNEL TARGET - Methods and apparatus for managing exchange IDs for multiple asynchronous dependent I/O operations generated for virtual Fibre Channel (FC) target volumes. Features and aspects hereof allocate a range of exchange identifier (X_ID) values used in issuing a plurality of physical I/O operations to a plurality of physical FC target devices that comprise the virtual FC target volume. The plurality of physical I/O operations are dependent upon one another for completion of the original request to the virtual FC target volume and allow substantially parallel operation of the plurality of physical FC target devices. A primary X_ID is selected from the range of allocated X_ID values for communications with the attached host system that generated the original request to the virtual FC target volume. | 10-20-2011 |
20110271136 | PREDICTIVELY MANAGING FAILOVER IN HIGH AVAILABILITY SYSTEMS - A method, system, and computer usable program product for predictively managing failover in a high availability (HA) system are provided in the illustrative embodiments. A disruptive activity occurring on the HA data processing system is detected. The disruptive activity has a potential to cause an operation of the HA data processing system to perform outside a specified parameter. A determination is made of a desired response in the HA data processing system should the disruptive activity disrupt the operation. A precautionary action is initiated with respect to the HA data processing system. | 11-03-2011 |
20110271137 | UNIFIED FRAMEWORK FOR CONFIGURATION VALIDATION - A modular framework may be provided for configuration checks that enable a developer to classify and describe each check and then subsequently search for checks and integrate them with other checks. Each check may include a dependency on other checks to create a hierarchy. Additionally, multiple checks may be combined. The combination of checks may be used to check configuration of specific processes or systems. Each check unit and business configuration check may contain keywords, descriptions, and documentation to enable the checks to be subsequently searched and reused in different applications. Systems, methods, and articles of manufacture may be provided. | 11-03-2011 |
20110296228 | TOLERATING SOFT ERRORS BY SELECTIVE DUPLICATION - A method, system, and computer usable program product for tolerating soft errors by selective duplication are provided in the illustrative embodiments. An application executing in a data processing system selects an instruction that has to be protected from soft errors. The instruction is marked for duplication such that the instruction is duplicated during execution of the instruction. The marked instruction is sent for execution to a hardware front end. | 12-01-2011 |
20110296229 | DECIMAL FLOATING-POINTING QUANTUM EXCEPTION DETECTION - A system and method for detecting decimal floating point data processing exceptions. A processor accepts at least one decimal floating point operand and performs a decimal floating point operation on the at least one decimal floating point operand to produce a decimal floating point result. A determination is made as to whether the decimal floating point result fails to maintain a preferred quantum. The preferred quantum indicates a value represented by a least significant digit of a significand of the decimal floating point result. An output is provided, in response to the determining that the decimal floating point result fails to maintain the preferred quantum, indicating an occurrence of a quantum exception. A maskable exception can be generated that is immediately trapped or later detected to control conditional processing. | 12-01-2011 |
20110302444 | Information processing apparatus and driver execution control method - An information processing apparatus includes a process monitor configured to monitor the status of processes executed in accordance with respective monitored driver programs which are to be monitored among driver programs associated with respective devices, an error processor configured to operate when a processing error is detected by the process monitor, to register, in a nonvolatile memory, driver information indicating the driver program with respect to which the error has been detected, and an execution controller configured to call and execute the driver programs, wherein when the information processing apparatus is started, the execution controller skips execution of the driver program indicated by the driver information registered in the nonvolatile memory. | 12-08-2011 |
20110320855 | ERROR DETECTION AND RECOVERY IN A SHARED PIPELINE - A pipelined processing device includes: a processor configured to receive a request to perform an operation; a plurality of processing controllers configured to receive at least one instruction associated with the operation, each of the plurality of processing controllers including a memory to store at least one instruction therein; a pipeline processor configured to receive and process the at least one instruction, the pipeline processor including shared error detection logic configured to detect a parity error in the at least one instruction as the at least one instruction is processed in a pipeline and generate an error signal; and a pipeline bus connected to each of the plurality of processing controllers and configured to communicate the error signal from the error detection logic. | 12-29-2011 |
20110320856 | METHOD AND APPARATUS FOR SELECTIVE READING OF SYSTEM INFORMATION IN A MOBILE WIRELESS DEVICE - A method to selectively read system information messages in a mobile wireless communication device. The mobile wireless device receives a first transmission of a multiple segment message through a radio frequency receiver. The mobile wireless device detects decoding errors in at least one of the received segments of the first transmission. In response to detecting decoding errors, the mobile wireless device selectively receives a first subset of segments in a second transmission of the multiple segment message. The mobile wireless device powers down at least a portion of the radio frequency receiver during receive time intervals for a second subset of segments in the second transmission. The first subset of segments in the second transmission corresponds to segments in the first transmission received with decoding errors. The second subset of segments in the second transmission corresponds to segments in the first transmission received without decoding errors. | 12-29-2011 |
20110320857 | BOTTOM-UP MULTILAYER NETWORK RECOVERY METHOD BASED ON ROOT-CAUSE ANALYSIS - A bottom-up (or upward) multilayer network recovery method and apparatus based on a root-cause analysis are disclosed to quickly and accurately perform a recovery operation. The bottom-up multilayer network recovery method based on a root-cause analysis includes: simultaneously counting, by a fault detection layer, a root-cause analysis (RA) time and a hold-off (HO) time, upon detecting an occurrence of a fault; performing, by the fault detection layer, a root-cause analysis during the RA time to recognize a layer in which a root-cause has occurred; when the root-cause has occurred in the fault detection layer, immediately recovering the fault by the fault detection layer, and when a root-cause has occurred in a lower layer, waiting for the HO time until such time as the lower layer can recover the fault; and when the fault has not been recovered even after the HO time has elapsed, recovering the fault by the fault detection layer. | 12-29-2011 |
20120005519 | SYSTEM AND METHOD FOR PROVIDING COLLABORATIVE MASTER DATA PROCESSES - A system and method for providing collaborative master data process management. A master data store comprises data for at least one data domain. A master data management module is configured to provide access to the data to one or more applications. A master data management service module provides at least one service providing access to the data based on a service-oriented architecture. A business process management module is configured to generate, execute and manage at least one business process related to the data domain. The at least one business process uses at least one of the at least one service provided by the master data management service module. A master data process module is configured to generate at least one data process comprising a business process involving an operation on the data. | 01-05-2012 |
20120017110 | FAULT-TOLERANCE AND RESOURCE MANAGEMENT IN A NETWORK - A method including receiving network topology and resource management information; generating a mapping between the network topology of a network and resource reservation paths associated with flows using the network based on the network topology and resource management information; generating a failure recovery plan (FRP) based on the mapping, wherein the FRP instructs one or more other network devices on how to manage a failure such that one or more resource reservation paths associated with flows impacted by the failure are not deleted; and loading the FRP on the one or more other network devices. | 01-19-2012 |
20120023359 | METHOD, APPARATUS AND COMPUTER PROGRAM FOR PROCESSING INVALID DATA - A method, system and computer program for processing invalid data. Data is received at a shared component for processing. A shared component is a component that is capable of being shared by multiple entities. The shared component has a plurality of threads. An attempt is made to process the data using one of the threads from the plurality of threads. The data is invalid and therefore the attempt at processing the invalid data results in the shared component and its plurality of threads failing. In response to the failure of the shared component, at least two instances of the shared component are created. At least one thread is assigned to each component instance, where the number of threads assigned to each component instance is restricted to a maximum number that is less than the original number of the plurality of threads. | 01-26-2012 |
20120030501 | AUTOMATIC DETERMINATION OF SUCCESS OF USING A COMPUTERIZED DECISION SUPPORT SYSTEM - Methods and systems are provided for improving the repair efficacy of a repair action using inferred feedback. The method comprises downloading a repair procedure, which has a probability of success for correcting the fault code. Repair action data is input into the computing device and is tracked and correlated with the downloaded procedure. The method then adjusts a probability of success of the repair procedure in clearing the fault code generated by the complex system based at least on the correlation. The system comprises a means for receiving repair data, a means for tracking repair action data taken, a means for correlating the tracked repair action and the repair data, and a means for updating a probability of success of the repair action based at least in part on the correlation of the repair data, the repair action data and the operating status of the complex system. | 02-02-2012 |
20120042194 | Data Integrity Methods for Quantum Computational Plasmonic Information Representation and Processing Systems - Data integrity methods are disclosed for quantum computational plasmonic information representation and processing systems. Also disclosed are methods of saving energy in such applications. Also disclosed are methods of monitoring such applications. | 02-16-2012 |
20120047391 | SYSTEMS AND METHODS FOR AUTOMATED SUPPORT FOR REPAIRING INPUT MODEL ERRORS - Systems and associated methods for automated repair support for input model faults are described. Embodiments automate generation of fault repair support by producing one or more repair action suggestions for a given input model containing faults. Responsive to an indication of one or more faults within the model, embodiments utilize a fault index to ascertain the nature of faults within the model and to compile one or more repair action suggestions. Users can review the repair action suggestions, and preview the impact each of these suggestions will have on the model if implemented, and select an appropriate repair action for repairing a model containing faults. | 02-23-2012 |
20120066540 | INFORMATION CORRECTION SUPPORT SYSTEM AND METHOD - An information correction support system includes a first information providing unit to provide first information that is input by a first user to a second user, a first information accepting unit to accept an error entry position in the first information and second information that is correct information for the input error entry that are input by the second user, an error entry position providing unit to provide the accepted error entry position to the first user without providing the accepted second information, a second information accepting unit to accept third information that is correction information for the error entry input by the first user, a correctness determination unit to determine whether the accepted third information is correctly input information based on the accepted second information, and a warning unit to warn the first user according to a determination by the correctness determination unit. | 03-15-2012 |
20120072762 | METHODS AND SYSTEMS FOR DYNAMICALLY MANAGING REQUESTS FOR COMPUTING CAPACITY - Embodiments of systems and methods are described for dynamically managing requests for computing capacity from a provider of computing resources. Illustratively, the computing resources may include program execution capabilities, data storage or management capabilities, network bandwidth, etc. The systems or methods automatically allocate computing resources for execution of one or more programs associated with the user. The systems and methods may enable the user to make changes to the allocated resources after execution of the one or more programs has started. | 03-22-2012 |
20120072763 | SYSTEM AND METHOD OF FILE LOCKING IN A NETWORK FILE SYSTEM FEDERATED NAMESPACE - A method, system and apparatus of file locking within a network file system federated namespace is disclosed. In one embodiment, a method includes accessing a target file in a storage medium over a network through an intermediate proxy server using a processor. The storage medium may be any one storage medium of a group of storage mediums on the network forming a data sharing cluster. In addition, the method includes locking the target file in the storage medium through a lock protocol, via the intermediate proxy server, so that at most one user at any given time can access the target file to modify it. | 03-22-2012 |
20120072764 | SYSTEMS AND METHODS FOR NETWORK INFORMATION COLLECTION - A network device may include logic configured to receive a problem report from a second network device, store and analyze data included in the problem report, filter data in the problem report to determine when the problem report is to be transmitted to a third network device, and transmit the problem report to the third network device when the filtering determines that the problem report is to be transmitted. | 03-22-2012 |
20120084595 | OPTIMIZED RECOVERY - A method, article of manufacture, and apparatus for restoring data. In some embodiments, this includes determining an object to be recovered, determining a representation of the object, and requesting the representation of the object from a data resource system. In some embodiments, the representation of the object is a hash value of the object. In some embodiments, the representation of the object is a segment of the object. | 04-05-2012 |
20120084596 | MONITORING CIRCUIT - A monitoring circuit monitors for the occurrence of a failure event on a data bus. The monitoring circuit includes a failure detection circuit for detecting the occurrence of the failure event within a device coupled to the data bus. An isolation circuit isolates the device from the data bus in response to the occurrence of the failure event. | 04-05-2012 |
20120089860 | METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR ANALYZING AN OCCURRENCE OF AN ERROR IN A COMPUTER PROGRAM BY RESTRICTING ACCESS TO DATA IDENTIFIED AS BEING SENSITIVE INFORMATION - A method of analyzing an occurrence of an error in a computer program executing on a data processing system includes receiving data that are associated with an execution leg of the computer program at the time of the error and restricting access to at least a portion of the data associated with the execution leg of the computer program based on an identification of the portion of the data associated with the execution leg of the computer program as being sensitive information. | 04-12-2012 |
20120089861 | INTER-PROCESSOR FAILURE DETECTION AND RECOVERY - An approach to detecting processor failure in a multi-processor environment is disclosed. The approach may include having each CPU in the system responsible for monitoring another CPU in the system. A CPU | 04-12-2012 |
20120089862 | DETERMINING RECOVERY TIME FOR INTERDEPENDENT RESOURCES IN HETEROGENEOUS COMPUTING ENVIRONMENT - Provided are techniques for determining a recovery time for a resource in a heterogeneous computing environment comprising interdependent resources. A graph for the resource representing all sequence dependencies and all group relations is created. The recovery time may be a cumulative startup time or a cumulative shutdown time of the resource considering interdependencies of the resource to other resources. The recovery time for all support resources having sequence dependencies with the resource is calculated, and each node representing a support resource is removed from the graph. Then the recovery time for all member resources left in the graph that have group relations with the resource is calculated per a group type of the resource. The recovery time for the resource is the sum of the recovery time of all support resources, the recovery time of all member resources, and a unit recovery time of the resource. | 04-12-2012 |
20120110369 | Data Recovery in a Cross Domain Environment - A method for recovering data when corrupted data from a source is detected includes identifying data corrupted as a result of using the corrupted data by tracing propagation of the corrupted data to provide identified corrupted data, and repairing the identified corrupted data to provide repaired data. The propagation of the corrupted data is traced from one domain to another. Data in both domains is repaired. A wrapper is provided for the source. Calls into and out of the source are intercepted by the wrapper. Calls of a plurality of different domains are intercepted by the wrapper. A wrapper is provided for a process. External service calls of the process are intercepted by the wrapper. The wrapper recreates a process flow followed by the process in accordance with the corrupted data. A wrapper is provided for a database. Accesses of the database are intercepted by the wrapper. | 05-03-2012 |
20120110370 | HIGHLY AVAILABLE FILE SYSTEM IN A DIRECTLY ATTACHED STORAGE - A method and system to provide a highly available file system in a directly attached storage (DAS). The storage is directly attached to a computer system that has an inactive operating system. A hardware module in the computer system receives a network command to access the file system. The hardware module determines a physical location of data blocks to be accessed in the storage. According to the network command, the hardware module accesses the data blocks in the storage. | 05-03-2012 |
20120124410 | SYSTEM AND METHOD FOR SELF-HEALING - Provided are a system and a method for self-healing in a critical system. The present invention monitors a current situation of the critical system, determines whether the system has an error by analyzing the monitoring result, judges, when a system error occurs, whether to perform self-healing of the system error in the current state or to drive safety software that provides a minimum basic service, and evaluates self-healing performance after healing the system error. According to exemplary embodiments of the present invention, it is possible to continuously provide a software service and further improve the reliability of the self-healing system through the evaluation of the self-healing performance. | 05-17-2012 |
20120124411 | SYSTEM ON CHIP FAULT DETECTION - The invention relates to a method for fault identification in a System-on-Chip (SoC) consisting of a number of IP cores, wherein each IP core is a fault containment unit, and where the IP cores communicate with one another by means of messages via a Network-on-Chip, and wherein a distinguished IP core provides a TRM (Trusted Resource Monitor), wherein a faulty control message which is sent from one non-privileged IP core to another non-privileged IP core is identified and projected by an (independent) fault containment unit, as a result of which this faulty control message cannot cause any failure of the message receiver. | 05-17-2012 |
20120131375 | Executing a Kernel Device Driver as a User Space Process - A method, including receiving, by a user space driver framework (UDF) library executing from a user space of a memory over a monolithic operating system kernel, a kernel application programming interface (API) call from a device driver executing from the user space. The UDF library then performs an operation corresponding to the kernel API call. | 05-24-2012 |
20120137162 | NETWORK DEVICE AND NETWORK CONNECTING METHOD FOR BUILDING UP NETWORK CONNECTION VIA HIGH DEFINITION MULTIMEDIA INTERFACE - A network device for building up a network connection via a high-definition multimedia interface, includes a scrambler, a descrambler, a comparator and a control unit. The scrambler is utilized for generating a transmission signal according to a first seed. The descrambler is for decoding a receiving signal to generate a second seed. The comparator is for generating a comparing result according to the first seed and the second seed. The control unit is for controlling the network connection according to the comparing result. | 05-31-2012 |
20120144226 | METHOD AND APPARATUS FOR SESSION ESTABLISHMENT MANAGEMENT - A method, computer readable medium and apparatus for performing session establishment management. For example, the method detects an evolved packet system establishment success rate that is measured over a predefined period of time falling below a predefined threshold, and performs, via a rule management server, an analysis on a bearer portion. The method then associates, via the rule management server, a root cause that contributed to the evolved packet system establishment success rate falling below the predefined threshold. | 06-07-2012 |
20120144227 | AUTOMATIC CORRECTION OF PROGRAM LOGIC - An approach to detection and repair of application level semantic errors in deployed software includes inferring aspects of correct operation of a program. For instance, a suite of examples of operations that are known or assumed to be correct are used to infer correct operation. Further operation of the program can be compared to results found during correct operation and the logic of the program can be augmented to ensure that aspects of further examples of operation of the program are sufficiently similar to the examples in the correct suite. In some examples, the similarity is based on identifying invariants that are satisfied at certain points in the program execution, and augmenting (e.g., “patching”) the logic includes adding tests to confirm that the invariants are satisfied in the new examples. In some examples, the logic invokes an automatic or semi-automatic error handling procedure if the test is not satisfied. Augmenting the logic in this way may prevent malicious parties from exploiting the semantic errors, and may prevent otherwise avoidable failures in execution of the programs. | 06-07-2012 |
20120166864 | SYSTEM AND METHOD FOR DETECTING ERRORS OCCURRING IN COMPUTING DEVICE - A system and method detects errors occurring in a computing device. The computing device includes a central processing unit (CPU) and a memory. The method sets an interruption tag for the computing device, initializes the interruption tag to zero, and detects a general purpose input output (GPIO) signal output from the CPU through a GPIO interface. The method further determines whether the GPIO signal is at a first voltage level at every time interval, and adds one to the interruption tag when the GPIO signal is switched from the first voltage level to a second voltage level. In addition, the method determines that internal errors occur in the CPU if the interruption tag is equal to one, and determines that multi-bit errors occur in the memory if the interruption tag is greater than one. | 06-28-2012 |
20120210158 | Anomaly Detection to Implement Security Protection of a Control System - An anomaly detection mechanism is provided that detects an anomaly in a control network, and includes an identifying unit to receive event information on an event that occurs, and to identify a group including a resource related to the event information by referring to a configuration management database for retaining dependence relationships between processes and resources including a control system; a policy storing unit to store one or more policies each of which associates one or more actions with a condition defining a situation suspected to have an anomaly; an adding unit to acquire group-related information needed for application to the one or more policies, and to add the acquired information to the event information; and a determining unit to apply the event information to the one or more policies and to determine the one or more actions associated with the matched condition as one or more actions to be taken. | 08-16-2012 |
20120216067 | Data processing apparatus and method using monitoring circuitry to control operating parameters - A data processing apparatus and method are provided that use monitoring circuitry to control operating parameters of the data processing apparatus. The data processing apparatus has functional circuitry for performing data processing, the functional circuitry including error correction circuitry configured to detect errors in operation of the functional circuitry and to repair those errors in operation. Tuneable monitoring circuitry monitors a characteristic indicative of changes in signal propagation delay within the functional circuitry and produces a control signal dependent on the monitored characteristic. In a continuous tuning mode of operation, the tuneable monitoring circuitry modifies the dependency between the monitored characteristic and the control signal in dependence upon certain characteristics of the errors detected by the error correction circuitry. An operating parameter controller is then arranged, in the continuous mode of operation, to control one or more performance controlling operating parameters of the data processing apparatus in dependence upon the control signal. This enables efficient and robust control of those operating parameters in response to changes in environmental conditions. | 08-23-2012 |
20120216068 | APPLICATION RELIABILITY AND FAULT TOLERANT CHIP CONFIGURATIONS - An application can specify reliability values via a communication path between the application and the hardware registers. Application reliability could increase if the application itself could specify the timeout and retry values. For instance, some errors might be prevented if the timeout value is lengthened by a short amount. A longer timeout value would result in slower performance because the memory component could not be accessed during the timeout period. However, resolving errors in memory devices would prevent unrecoverable error indicators from being returned to the application, which would in turn limit application and system crashes. Creating a communication path between the application and the hardware registers would allow the application to modify the reliability of memory operations. | 08-23-2012 |
20120221884 | ERROR MANAGEMENT ACROSS HARDWARE AND SOFTWARE LAYERS - Generally, this disclosure provides error management across hardware and software layers to enable hardware and software to deliver reliable operation in the face of errors and hardware variation due to aging, manufacturing tolerances, etc. In one embodiment, an error management module is provided that gathers information from the hardware and software layers, and detects and diagnoses errors. A hardware or software recovery technique may be selected to provide efficient operation, and, in some embodiments, the hardware device may be reconfigured to prevent future errors and to permit the hardware device to operate despite a permanent error. | 08-30-2012 |
20120239964 | FAILOVER SCHEME WITH SERVICE-BASED SEGREGATION - A system provides a set of services. The system includes nodes that are in communication with each other. The system segregates the services into at least first and second groups of services, assigns the first group of services to a first set of the nodes, and assigns the second group of services to a second set of nodes. The first set of nodes provides the first group of services, and the second set of nodes provides the second group of services. | 09-20-2012 |
20120246507 | PARALLEL MEMORY ERROR DETECTION AND CORRECTION - A system implementing parallel memory error detection and correction divides data having a word length of K bits into multiple N-bit portions. The system has a separate error processing subsystem for each of the N-bit portions, and utilizes each error processing subsystem to process the associated N-bit portion of the K-bit input data. During memory write operations, each error processing subsystem generates parity information for the N-bit data, and writes the N-bit data and parity information into a separate memory array that corresponds to the error processing subsystem. During memory read operations, each error processing subsystem reads N-bits of data and the associated parity information. If, based on the parity information, an error is detected from the N-bit data, the error processing subsystem attempts to correct the error. The corrected N-bit data from each of the error processing subsystems are combined to reproduce the K-bit word. | 09-27-2012 |
20120260120 | Controller Election - A method of controller election includes, upon failure of a master controller within a team comprising a number of controllers, automatically promoting another of the number of controllers to serve as an elected master controller, and designating the elected master controller as a new master controller if it is determined that the failure of the master controller is not temporary. | 10-11-2012 |
20120260121 | SELECTING AN ALTERNATIVE PATH FOR AN INPUT/OUTPUT REQUEST - A first path for forwarding an I/O request from a host device to a disk in a disk array is identified. The first path includes two endpoints (a first initiator endpoint on the host device and a first target endpoint on the disk array) separated by a storage area network. In response to an indication that the first path is non-functional, a second path to the disk for the I/O request is identified as an alternative to the first path. The second path includes a second initiator endpoint and a second target endpoint and is identified by selecting a path from among those paths that have at least one endpoint that is different from the two endpoints of the first path. | 10-11-2012 |
20120266012 | METHOD AND SYSTEM FOR RECOVERY OF A COMPUTING ENVIRONMENT DURING PRE-BOOT AND RUNTIME PHASES - A method and system for recovery of a computing environment includes monitoring during a pre-boot phase and a runtime phase of a computing device for selection of a hot key sequence by a user and performing a recovery action in response to the selection of the hot key sequence by the user. The recovery action may be any one of a number of predetermined and/or selectable actions such as restoring system defaults, migrating memory, displaying a menu of options, setting various software flags, restarting or rebooting the computing device, and/or the like. | 10-18-2012 |
20120297234 | CONCURRENT MANAGEMENT CONSOLE OPERATIONS - A setup module organizes a single software image for a management command. A process module creates a plurality of processes independently executing the management command on each of the plurality of devices from a management console. Each process employs the software image. A termination module ends the management command after each process has completed on each of the plurality of devices. | 11-22-2012 |
20120297235 | ERROR DETERMINATION DEVICE AND ERROR DETERMINATION METHOD OF CONTROL SYSTEM - When a determination is made that a signal transmitted by a voltage sensor, a second voltage sensor, a current sensor, a temperature sensor, a second temperature sensor, a first CPU, a second CPU and a communication circuit is in error, a third CPU of a motor generator ECU determines that the control system is in error. When a determination is made that the control system is in error, the third CPU determines whether each of the voltage sensors, the current sensor, the temperature sensors, the first CPU, the second CPU and the communication circuit is in error or not. | 11-22-2012 |
20120311373 | SIDEBAND ERROR SIGNALING - Fast error reporting is provided in networks that have an architected delayed error reporting capability. Errors are detected and reported without having to wait for a timeout period to expire. Further, failures of other components caused by the delay are avoided, since the delay is bypassed. | 12-06-2012 |
20120311374 | METHOD AND SYSTEM FOR CONTROLLING A SUPPLY VOLTAGE - Method and computing system for controlling a supply voltage in the computing system. A voltage related indication for use in setting the supply voltage of the computing system is stored, and a supply voltage is set for the computing system based on the stored voltage related indication. A crash of the computing system is detected, and in dependence thereon, an adjusted indication is determined for use in the computing system. An adjusted supply voltage is set based on the adjusted indication, and the adjusted indication is stored for further use of the computing system. | 12-06-2012 |
20120324271 | FAULT PROCESSING SYSTEM - Aspects of the invention provide for a fault processing system. In one embodiment, the fault processing system includes: a first processing engine wrapper having: an inbound pipe configured to obtain a first claimcheck data packet; a processing engine component configured to: process a first context message derived from the first claimcheck data packet according to a fault rule selected from: a fault detection rule, a fault location rule, a fault isolation rule, or a fault restoration rule; and generate a second context message, the second context message including data processed according to the selected fault rule; and an outbound pipe configured to provide a second claimcheck data packet derived from the second context message. | 12-20-2012 |
20120331332 | Restarting Event And Alert Analysis After A Shutdown In A Distributed Processing System - Methods, systems, and computer program products for restarting event and alert analysis after a shutdown in a distributed processing system are provided. Embodiments include identifying, by an incident analyzer, a shutdown condition of the distributed processing system, the incident analyzer including a plurality of event analyzers and a monitor that monitors the plurality of event analyzers; and determining, by the incident analyzer, whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing, by the incident analyzer, an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting, by the incident analyzer, event and alert analysis using the next event identified in the event log; and if the shutdown was unplanned, for each event analyzer, identifying the last event included in the last event pool that the event analyzer closed; and restarting, by the incident analyzer, event and alert analysis at the event analyzer using the next event received by the event analyzer after the identified last event. | 12-27-2012 |
20120331333 | Stream Data Processing Failure Recovery Method and Device - In a duplex configuration of stream data processing, all window operations can be used without stopping the process when adding a standby system. The time when the standby system server is added is stored as the reproduction time, and data copied from the data generated at or after the reproduction time is transmitted to the standby system. While the data processing in the in-use system is continued, changes in the execution state which occur in operators holding execution state at or after the reproduction time are recorded. The execution states are copied to the standby system for each operator in parallel with the data processing. At this time, the execution states of the operators at the reproduction time are reproduced from the execution states of the operators when the copy is performed and the record of the changes of the execution states, and the reproduced execution states are copied. When the execution states of all the operators have been copied, the standby system starts processing of the copied data which are generated at or after the reproduction time. | 12-27-2012 |
20130007501 | SYSTEM AND METHOD FOR IDENTIFYING REPAIR POINTS AND PROVIDING EFFECTIVE DISPATCH - Systems and methods are disclosed for identifying repair points reported by individuals at one or more locations in a geographic area, and providing an effective process for dispatching repair crews. Event data related to a repair point is provided, where the event data is processed to determine the type(s) of repair needed, times and locations of reported events, as well as relationships between events and priority. This information is then used to prioritize dispatch to repair crews to address the events, and further monitor repair crews to determine status of repairs, and to further update priorities relating to ongoing repairs. | 01-03-2013 |
20130013953 | HEALTH MONITORING OF APPLICATIONS IN A GUEST PARTITION - A health monitoring technique monitors the health and performance of applications executing in a guest partition in a virtualized environment. In an embodiment, a guest integration component interacts with an application through an application programming interface in order for the virtualization platform to monitor the health and performance of the application. In another embodiment, the guest integration component may include a monitoring agent that accesses an event log and/or a performance monitor log to assess the health and performance of the application. The health and performance of the application may then be analyzed by the virtualization platform to determine an appropriate remedial action. | 01-10-2013 |
20130024718 | Multiple Node/Virtual Input/Output (I/O) Server (VIOS) Failure Recovery in Clustered Partition Mobility - A method utilizes cluster-awareness to effectively support a live partition mobility (LPM) event and provide recovery from node failure within a Virtual Input/Output (I/O) Server (VIOS) cluster. An LPM utility creates a monitoring thread on a first VIOS on initiation of a corresponding LPM event. The monitoring thread tracks the status of an LPM and records status information in the mobility table of a database. The LPM utility creates other monitoring threads on other VIOSes running on the (same) source server. If the first VIOS sustains one of multiple failures, the LPM utility provides notification to other functioning nodes/VIOSes. The LPM utility enables a functioning monitoring thread to update the LPM status. In particular, a last monitoring thread may perform cleanup/update operations within the database based on an indication that there are nodes on the first server that are in a failed state. | 01-24-2013 |
20130042139 | SYSTEMS AND METHODS FOR FAULT RECOVERY IN MULTI-TIER APPLICATIONS - A computer-implemented method for fault recovery in multi-tier applications may include: 1) identifying a plurality of clusters, 2) identifying a multi-tier application that includes a plurality of components, each cluster within the plurality of clusters hosting a component, 3) identifying a fault of a first component within the plurality of components on a first cluster within the plurality of clusters, the fault requiring a first recovery action, 4) identifying at least one dependency relationship involving the first component and a second component within the plurality of components on a second cluster within the plurality of clusters, 5) determining, based on the fault and the dependency relationship, that the second component requires a second recovery action to ensure that the multi-tier application operates correctly, and 6) performing the second recovery action on the second component. Various other methods, systems, and computer-readable media are also disclosed. | 02-14-2013 |
20130055008 | DOWNLOADING A DISK IMAGE FROM A SERVER WITH A REDUCED CORRUPTION WINDOW - Example embodiments relate to downloading a disk image from a server while reducing the corruption window. In example embodiments, a computing device writes a recovery image to a portion of a primary storage device. The computing device may then write the disk image to the primary storage device until a portion of the disk image corresponding to the recovery image remains. Next, the computing device may write the remaining portion of the disk image to a secondary storage location. Finally, the computing device may overwrite the recovery image using the remaining portion of the disk image from the secondary storage location. | 02-28-2013 |
20130061085 | SYSTEM AND METHOD FOR MANAGING A NETWORK INFRASTRUCTURE USING A MOBILE DEVICE - A system and method for managing an IT infrastructure using a mobile device. The method comprises identifying, using one or more processors of a network management system, an issue in one or more components in the infrastructure; retrieving a message instruction for the identified issue from an action database, wherein the message instruction includes information identifying the support personnel to contact regarding the identified issue and their mobile device; sending an alert message to the mobile device of the identified support personnel, wherein the alert message contains information about the identified issue; receiving, at the network management system, a reply message from the mobile device, wherein the reply message contains an instruction to resolve the identified issue; generating an executable command corresponding to the instruction in the reply message; and executing the executable command on the affected components in the infrastructure to resolve the identified issue. | 03-07-2013 |
20130103972 | Data processing apparatus and method for analysing transient faults occurring within storage elements of the data processing apparatus - A data processing apparatus has a plurality of storage elements residing at different physical locations within the apparatus, and fault history circuitry for detecting local transient faults occurring in each storage element, and for maintaining global transient fault history data based on the detected local transient faults. Analysis circuitry monitors the global transient fault history data to determine, based on predetermined criteria, whether the global transient fault history data is indicative of random transient faults occurring within the data processing apparatus, or is indicative of a coordinated transient fault attack. The analysis circuitry is then configured to initiate a countermeasure action on determination of a coordinated transient fault attack. This provides a simple and effective mechanism for distinguishing between random transient faults that may naturally occur, and a coordinated transient fault attack that may be initiated in an attempt to circumvent the security of the data processing apparatus. | 04-25-2013 |
20130103973 | SYSTEMS AND METHODS FOR PROVIDING HIERARCHY OF SUPPORT SERVICES VIA DESKTOP AND CENTRALIZED SERVICE - The present solution increases automation, scalability and efficiency in delivering technical support services to devices. Systems and methods of the present solution provide a hierarchy, or layers, of automated desktop services combined with remote technical support services, which may also be automated. The present solution provides an on-desktop automation support system that detects and automatically remediates problems on the user's device. If the problem is not fixed, or cannot be fixed, by local automated remediation at the desktop, a centralized service may remotely deliver technical support services to the device, either as automated support services delivered to the device or as remote technical agents connecting to the device. By combining local support automation, remote support automation, and remote and onsite technicians, the centralized service may deliver a hierarchy, or multiple layers, of services to any device. | 04-25-2013 |
20130111257 | System and Method for Provisioning and Running a Cross-Cloud Test Grid | 05-02-2013 |
20130117601 | IMPLEMENTING ULTRA HIGH AVAILABILITY PERSONALITY CARD - A method and circuit for implementing an enhanced availability personality card for a chassis computer system, and a design structure on which the subject circuit resides, are provided. The personality card includes a first erasable programmable read only memory (EPROM) and a second EPROM, each storing Vital Product Data (VPD), and a first temperature sensor and a second temperature sensor. A primary bidirectional bus and a redundant bidirectional bus connect, respectively, the first EPROM and first temperature sensor, and the second EPROM and second temperature sensor, to a pair of chassis management modules. Each chassis management module includes a switch connected to both the primary bidirectional bus and the redundant bidirectional bus, providing redundant paths and enabling continued operation despite the failure of any critical personality card component. | 05-09-2013 |
20130124908 | SYSTEMS AND METHODS FOR AUTOMATIC REPLACEMENT AND REPAIR OF COMMUNICATIONS NETWORK DEVICES - Systems and methods for automatic repair, replacement, and/or configuration of various network devices within a communications network are disclosed. The system may receive an indication of a failed network device and automatically perform diagnostics on the network device to determine any problems associated with its hardware and/or software components. Subsequently, one or more repair, replacement, and/or configuration procedures may be automatically initiated in an attempt to resolve the problems and restore the failed network device. | 05-16-2013 |
20130138993 | VOLTAGE CONTROL - An apparatus for controlling a supply voltage to an electronic processing arrangement comprising a processor or a memory element, the apparatus being configured to receive an output of the electronic processing arrangement and comprising: error detection means for detecting errors in an output of the electronic processing arrangement; and means for adaptively varying the supply voltage to the electronic processing arrangement based on an analysis of errors detected in the output of the electronic processing arrangement. The apparatus may further comprise means for correcting errors detected in the output of the electronic processing arrangement. | 05-30-2013 |
20130138994 | Preventing Disturbance Induced Failure in A Computer System - A method to prevent failure on a server computer due to internally and/or externally induced shock and/or vibration. The method includes acquiring, by at least one sensor, analog acceleration data of components in a server computer. The data is then converted to digital format and stored within a motor drive assembly processor memory unit. The processor analyzes the stored data for existence of machine degradation. In response to detecting the existence of machine degradation, the motor drive assembly processor initiates remediation procedures. The remediation procedures include controlling rotating speed of moving devices or performing a complete system shut down. | 05-30-2013 |
20130145202 | Handling Virtual-to-Physical Address Translation Failures - A method tolerates virtual-to-physical address translation failures. A translation request is sent from a graphics processing device to a translation mechanism. The translation request is associated with a first wavefront. A fault notification is received within an accelerated processing device (APD) from the translation mechanism, indicating that the request cannot be acknowledged. If the fault notification is received, the first wavefront is stored within a shader core of the APD and is replaced with a second wavefront that is ready to be executed. | 06-06-2013 |
20130185586 | SELF-HEALING OF NETWORK SERVICE MODULES - Methods, systems, and devices are described for managing virtual network services provided to a network. A number of processors in a self-contained network services module may execute a number of separate network service application instances associated with providing network services to the network. State information for each network service application instance may be stored within a shared memory, and a fault in one of the network service application instances may be identified based on the stored state information. The identified fault may then be dynamically remedied in the affected network service application instance. | 07-18-2013 |
20130185587 | Controlling a Solid State Disk (SSD) Device - A mechanism is provided for controlling a solid state disk. A failure detector detects a failure in the solid state disk. Responsive to the failure detector detecting a failure, a status degrader sets a degraded status indicator for the solid state disk. Responsive to the degraded status indicator, a degraded status controller keeps the solid state disk in operation in a degraded operation mode. | 07-18-2013 |
20130191680 | HANDLING OF MESSAGES IN A MESSAGE SYSTEM - A messaging system comprises a message source, a message receiver and a message service. The message service is intermediate between the message source and the message receiver, and a compensation component is established at the message source. A one-way message is transmitted from the message source, where the one-way message is part of a plurality of one-way messages of an overall business transaction. The message is received at the message service and is transmitted to the message receiver, which processes the received message. The message receiver transmits a communication indicating success or failure of the processing of the message. The system causes compensation logic defined by the compensation component to execute in response to receiving an indication of a failure of part of the overall business transaction, even though the communication from the message receiver indicated that processing of the particular one-way message succeeded. | 07-25-2013 |
20130191681 | SYSTEMS, METHODS, AND APPARATUS FOR SIGNAL PROCESSING-BASED FAULT DETECTION, ISOLATION AND REMEDIATION - Certain embodiments of the invention may include systems, methods, and apparatus for signal processing-based fault detection, isolation and remediation. According to an example embodiment of the invention, a method is provided for detecting and remediating sensor signal faults. The method may include monitoring data received from one or more sensors; determining confidence values for one or more parameters associated with the one or more sensors based at least in part on the monitored data; determining a combined confidence for each of the one or more sensors; and outputting a remediated value and status based at least in part on the monitored data and the combined confidences. | 07-25-2013 |
20130198556 | SYSTEMS AND METHODS FOR CREATING A NEAR OPTIMAL MAINTENANCE PLAN - Methods and apparatus are provided for determining a lowest total cost maintenance plan. The method comprises receiving a sequence of maintenance actions in an order of a waiting time for each maintenance action, wherein one of the maintenance actions is likely to repair the failure mode. Each maintenance action has an associated cost equal to a waiting time cost, an execution time cost and a material cost, wherein the waiting time of each maintenance action is the time required to requisition and receive material required to perform the maintenance action. The method also constructs a maintenance plan comprising a primary requisition and a secondary requisition by assigning each of the sequence of maintenance actions to one of the primary and secondary requisition. | 08-01-2013 |
20130227332 | DATA SUMMARIZATION RECOVERY - Embodiments of the invention provide systems and methods for recovering a failed data summarization. According to one embodiment, recovering a failed instance can comprise processing existing summarization instances identified as instances for which a new data summarization instance needs to wait. Upon a completion or a timeout of each of the instances identified as instances for which the new data summarization instance needs to wait, an exclusive lock can be acquired on a table storing scope information for the plurality of data summarization instances. One or more existing data summarization instances that match the new data summarization instance or that have an overlapping scope with the new data summarization instance can be processed, remaining tasks to be performed by the new data summarization instance can be defined, the exclusive lock can be released, and the remaining tasks to be performed by the new data summarization instance can be performed. | 08-29-2013 |
20130227333 | FAULT MONITORING DEVICE, FAULT MONITORING METHOD, AND NON-TRANSITORY COMPUTER-READABLE RECORDING MEDIUM - A fault monitoring device includes: a controller that is implemented in a computer and controls the computer; a monitored object operated by the computer; a monitor that monitors a fault of the controller and a fault of the monitored object; and a switcher that alternately switches a monitored target by the monitor. | 08-29-2013 |
20130246837 | SYSTEM AND METHOD FOR MITIGATING REPEATED CRASHES OF AN APPLICATION RESULTING FROM SUPPLEMENTAL CODE - Provided is a method for mitigating the effects of an application which crashes as the result of supplemental code (e.g., plug-in), particularly a plug-in from a source other than the source of the operating system of the device or the source of the application that crashes. The method includes executing the application. As the application is running, it may be monitored to determine if normal execution of instructions ceases. When that occurs, the system will make a determination if code from a supplemental code module was the cause of the crash, and will make an evaluation if that supplemental code module is from a source other than the source(s) of the operating system and application in question. In some implementations, remedial steps may be provided, such as providing information on subsequent executions of the application. | 09-19-2013 |
20130275800 | Systems and Methods for Providing Fault Detection and Management - Methods and systems for providing fault detection and management are disclosed. A system includes a web-based interface that allows a user to access all elements of a customer service network, which spans multiple networks, departments, and external partners. The system, and thereby the user, is able to manage almost all aspects of the network, giving the user end-to-end management of customer experience issues. Real-time and archived events are used, in some embodiments, for root cause analysis and/or process and/or performance improvement. Events from differing transport, platform, technology and OSI model levels are correlated for optimal customer experience monitoring, alarming, and analysis. | 10-17-2013 |
20130283086 | MONITORING AND RESOLVING DEADLOCKS, CONTENTION, RUNAWAY CPU AND OTHER VIRTUAL MACHINE PRODUCTION ISSUES - Resolving virtual machine (VM) issues, by executing VM and operating system (OS) diagnostic monitors, including, monitoring a set of VM and OS health status metrics of a system at a first level, analyzing data of the monitored health status metrics to determine that an instability has occurred when the data exceeds defined bounds for the health status metrics, responding to the instability by monitoring additional VM and OS health status metrics, whereby a level of monitoring of the system is increased from the first level to a second level, greater than the first level, identifying the instability, repairing the system by taking corrective action based on the identified instability; and removing at least one of the set of monitoring and profiling tools to reduce the level of monitoring to a third level once the instability has been resolved, wherein the third level is less than the second level. | 10-24-2013 |
20130283087 | Automated Fault and Recovery System - A mechanism is provided for handling incidents occurring in a managed environment. An incident is detected in a resource in the managed environment. A set of incident handling actions is identified based on incident handling rules for the incident type of the incident. From the set of incident handling actions, one incident handling action is selected for execution based on a set of impact indicators associated with the set of incident handling rules. The selected incident handling action is then executed to address the failure of the resource. | 10-24-2013 |
20130283088 | Automated Fault and Recovery System - A mechanism is provided for handling incidents occurring in a managed environment. An incident is detected in a resource in the managed environment. A set of incident handling actions is identified based on incident handling rules for the incident type of the incident. From the set of incident handling actions, one incident handling action is selected for execution based on a set of impact indicators associated with the set of incident handling rules. The selected incident handling action is then executed to address the failure of the resource. | 10-24-2013 |
20130283089 | METHOD FOR FAULT HANDLING IN A DISTRIBUTED IT ENVIRONMENT - An improved method provides fault handling in a distributed IT environment. The distributed IT environment executes at least one workflow application interacting with at least one application by using interface information about the at least one application. The method comprises: storing at least one fault handling description, in an implementation-independent meta language, associated with the at least one application; associating the interface information with the at least one fault handling description based on at least one defined fault handling policy created from at least one service definition; and, if the workflow application receives a fault response from the at least one application, the workflow application retrieving at least one associated fault handling description based on the at least one fault handling policy, and interpreting and executing the meta language code of the at least one associated fault handling description in order to continue the defined workflow application. | 10-24-2013 |
20130283090 | MONITORING AND RESOLVING DEADLOCKS, CONTENTION, RUNAWAY CPU AND OTHER VIRTUAL MACHINE PRODUCTION ISSUES - Resolving virtual machine (VM) issues, by executing VM and operating system (OS) diagnostic monitors, including, monitoring a set of VM and OS health status metrics of a system at a first level, analyzing data of the monitored health status metrics to determine that an instability has occurred when the data exceeds defined bounds for the health status metrics, responding to the instability by monitoring additional VM and OS health status metrics, whereby a level of monitoring of the system is increased from the first level to a second level, greater than the first level, identifying the instability, repairing the system by taking corrective action based on the identified instability; and removing at least one of the set of monitoring and profiling tools to reduce the level of monitoring to a third level once the instability has been resolved, wherein the third level is less than the second level. | 10-24-2013 |
20130297964 | Virtual Machine Placement With Automatic Deployment Error Recovery - Embodiments perform automatic selection of hosts and/or datastores for deployment of a plurality of virtual machines (VMs) while monitoring and recovering from errors during deployment. Resource constraints associated with the VMs are compared against resources or characteristics of available hosts and datastores. A VM placement engine selects an optimal set of hosts/datastores and initiates VM creation automatically or in response to administrator authorization. During deployment, available resources are monitored enabling dynamic improvement of the set of recommended hosts/datastores and automatic recovery from errors occurring during deployment. | 11-07-2013 |
20130305080 | Real-Time Event Storm Detection in a Cloud Environment - A method, an apparatus and an article of manufacture for detecting an event storm in a networked environment. The method includes receiving a plurality of events via a plurality of probes in a networked environment, each of the plurality of probes monitoring a monitored information technology (IT) element, aggregating the plurality of events received into an event set, and correlating the plurality of events in the event set to determine whether the plurality of events are part of an event storm by determining if the plurality of events in the event set meet one or more event storm criteria. | 11-14-2013 |
20130305081 | METHOD AND SYSTEM FOR DETECTING SYMPTOMS AND DETERMINING AN OPTIMAL REMEDY PATTERN FOR A FAULTY DEVICE - Computer-implemented systems, methods, and electronic computer-readable media for detecting symptoms and determining an optimal remedy pattern for one or more faulty components of a device are disclosed. First, the symptoms of the faulty device are detected and the associated faulty components of the device are identified. Different tests are performed to confirm the status of the faulty components. Based on historical data, cost information, and a remedy cost function, an optimal remedy pattern is determined. | 11-14-2013 |
20130311820 | FORECASTING WORKLOAD TRANSACTION RESPONSE TIME - Reliability testing can include determining a transaction time for each of a plurality of transactions to a system under test during the reliability test, wherein the plurality of transactions are of a same type. Forecasts of transaction times can be calculated for the transaction type. The forecasts can be compared with a threshold time using a processor. A remedial action can be implemented responsive to at least one of the forecasts exceeding the threshold time. | 11-21-2013 |
20130339779 | SYSTEMATIC FAILURE REMEDIATION - Aspects of the present invention provide a tool for analyzing and remediating an update-related failure. In an embodiment, a failure state of a computer system that has been arrived at as a result of an update is captured. A semantic diff that includes the difference between the failure state and at least one of an original state or a completion state is then computed. This semantic diff is transformed into a feature vector format. Then the transformed semantic diff is analyzed to determine a remediation for the update. Failure and/or resolution signatures can be constructed using the semantic diff and contextual data, and these signatures can be used in comparison and analysis of failures and resolutions. | 12-19-2013 |
20130339780 | COMPUTING DEVICE AND METHOD FOR PROCESSING SYSTEM EVENTS OF COMPUTING DEVICE - In a method for processing system events of a computing device, the computing device includes a basic input and output system (BIOS) and a baseboard management controller (BMC). The method allocates revised storage blocks in the BMC for storing normal system events of the computing device, and a backup storage block in the BMC for storing error system events of the computing device. The method detects an error system event via the BMC and records the error system event in the backup storage block of the BMC. When the computing device is rebooted, the method obtains the error system event from the backup storage block of the BMC via the BIOS, and processes the error system event to reboot the computing device using a normal system event stored in the revised storage blocks of the BMC. | 12-19-2013 |
20130346786 | DYNAMIC ESCALATION OF SERVICE CONDITIONS - Systems, methods, and software are provided for dynamically escalating service conditions associated with data center failures. In one implementation, a monitoring system detects a service condition. The service condition may be indicative of a failure of at least one service element within a data center monitored by the monitoring system. The monitoring system determines whether or not the service condition qualifies for escalation based at least in part on an access condition associated with the data center. The access condition may be identified by at least another monitoring system that is located in a geographic region distinct from that of the first monitoring system. Upon determining that the service condition qualifies for escalation, the monitoring system escalates the service condition to an escalated condition and initiates an escalated response. | 12-26-2013 |
20140019795 | COMPUTER PRODUCT, COUNTERMEASURE SUPPORT APPARATUS, AND COUNTERMEASURE SUPPORT METHOD - A computer-readable recording medium stores a countermeasure support program that causes a computer to execute a process that includes calculating a time period elapsing from an occurrence timing of a message that is of a predetermined type and related to an operation of an apparatus in a monitored system, until an occurrence timing of a fault; and outputting the calculated elapsed time period. | 01-16-2014 |
20140025983 | INFORMATION PROCESSING APPARATUS AND METHOD FOR GENERATING PSEUDO FAILURE - A controller that obtains data from an object device in response to an obtaining request from the processor includes an error setter that, when a pseudo failure mode that spuriously generates a failure is active, sets an error associated with the failure type of the pseudo failure to be generated in the data obtained from the object device in response to the obtaining request; and an error processor that, when it detects an error in the data while the pseudo failure mode is active, notifies the processor of the failure response corresponding to the failure type associated with the detected error. | 01-23-2014 |
20140032957 | SYNCHRONOUS MODE REPLICATION TO MULTIPLE CLUSTERS - Provided are a computer program product, system, and method for synchronous mode replication to multiple clusters receiving a write to a volume from a host. A received write is cached in a memory. A determination is made of a replication rule indicating one of a plurality of replication modes for a first cluster and a second cluster used for replication for the write, wherein one of the replication modes comprises a synchronous mode. A determination is made that the replication rule indicates a synchronous mode for the first and the second clusters. The write is transmitted from the memory to the first cluster to store in a first non-volatile storage of the first cluster and to the second cluster to store in a second non-volatile storage in response to determining that the replication rule indicates the synchronous mode. | 01-30-2014 |
20140040655 | TAPE DRIVE RETRY - The present disclosure provides techniques for operating a tape drive. A method of operating a tape drive includes monitoring a parameter of the tape drive during a data access operation. The method also includes detecting an access failure. The method further includes selecting a treatment based on the parameter, applying the treatment, and performing a retry. | 02-06-2014 |
20140068317 | PERIPHERAL DEVICE SHARING IN MULTI HOST COMPUTING SYSTEMS - The present subject matter discloses methods and systems of sharing of peripheral devices in multi host computing systems ( | 03-06-2014 |
20140082406 | DATA PROTECTION THROUGH POWER LOSS PREDICTION - A memory system may enact emergency activities, such as preventing a write abort, by identifying an impending power loss as early as possible. Predicting a power loss while a page is being programmed, but before all power is lost, may allow the memory to initiate emergency activities. A power loss prediction mechanism may use a data link lost signal to trigger data protection. The data link lost signal may indicate that the data connection between the memory and a host has been lost. Because the signal indicating a data link loss may precede the actual detection of a power loss, data protection can be implemented sooner. | 03-20-2014 |
20140082407 | REMEDIATING EVENTS USING BEHAVIORS - Remediating events of components using behaviors via an administrator system and an administrator client. The administrator system receives an event from a component of an information technology (IT) environment. A behavior is determined at least partly from the event. The behavior is determined to be an anomalous behavior at least partly from a group of previously received events. A coefficient is calculated, via a calculation, for the anomalous behavior at least partly from a weight. The administrator system sends a description of the anomalous behavior and a group of options to the administrator client. The description is at least partly based on the calculation. The administrator system receives a severity indication from the administrator client. The weight, the calculation, and the description are updated based on the severity indication. | 03-20-2014 |
20140082408 | FAULT TOLERANT SYSTEM AND METHOD FOR PERFORMING FAULT TOLERANT - A primary virtual machine is formed on a primary machine on which a primary hypervisor runs, and inputs a virtual interrupt, based on an external interrupt from the primary hypervisor, to a primary guest OS. A secondary virtual machine is formed on a secondary machine on which a secondary hypervisor runs, and inputs the virtual interrupt to a secondary guest OS on the basis of timing information on the virtual interrupt transmitted from the primary virtual machine. When inputting the virtual interrupt to the primary guest OS, the primary virtual machine suspends the primary guest OS and determines whether the suspended position is in a critical section. If the suspended position is not in the critical section, the primary virtual machine inputs the virtual interrupt at the suspended position. If the suspended position is in the critical section, the primary virtual machine changes the suspended position and performs the determination again. | 03-20-2014 |
20140095921 | INFORMATION PROCESSING APPARATUS, STARTUP PROGRAM, AND STARTUP METHOD - An information processing apparatus that performs startup control of redundantly configured modules includes a memory that retains abnormality information regarding an abnormality occurring at the time of startup control of the modules, and a startup controller section. The startup controller section executes a startup process by sequentially executing process blocks, generates the abnormality information, and, when a module in which an abnormality occurs at the time of startup control is detected, determines whether a reduced operation is possible. When it is determined that the reduced operation is possible, the startup controller section completes execution of the process block in progress. When it is determined that the reduced operation is not possible, it executes a restart process on a module selected, based on the abnormality information, from all the modules in which abnormalities occurred at the time of startup control, and completes execution of the process block in progress after completing the restart process. | 04-03-2014 |
20140108851 | Online Protection Coordination for Distribution Management System - A method for automatic protection coordination in a power system network comprises identifying radial source-to-load paths and fault protection devices in the source-to-load paths, for a portion of the power system network to be coordinated. Device settings data for fault protection devices are retrieved, including multiple preconfigured settings for some devices. Fault currents for each of multiple possible electrical faults in said portion of the power system network are predicted, and a selectivity check for each pair of fault protection devices that are adjacent to one another in an identified radial source-to-load path is performed, for each of one or more of the predicted fault currents, taking into account multiple preconfigured settings for remotely controllable fault protection devices. A combination of settings for remotely controllable fault protection devices that minimizes selectivity violations among the pairs is selected, and necessary change-setting commands are sent to remotely controllable fault protection devices. | 04-17-2014 |
20140115376 | INTEGRATED CIRCUIT WITH ERROR REPAIR AND FAULT TOLERANCE - An integrated circuit is provided with error detection circuitry and error repair circuitry. Error tolerance circuitry is responsive to a control parameter to selectively disable the error repair circuitry. The control parameter is dependent on the processing performed within the circuit. For example, the control parameter may be generated in dependence upon the program instruction being executed, the output signal value which is in error, the previous behavior of the circuit or in other ways. | 04-24-2014 |
20140115377 | INTEGRATED CIRCUIT WITH ERROR REPAIR AND FAULT TOLERANCE - An integrated circuit is provided with error detection circuitry and error repair circuitry. Error tolerance circuitry is responsive to a control parameter to selectively disable the error repair circuitry. The control parameter is dependent on the processing performed within the circuit. For example, the control parameter may be generated in dependence upon the program instruction being executed, the output signal value which is in error, the previous behavior of the circuit or in other ways. | 04-24-2014 |
20140129871 | FAIL SAFE CODE FUNCTIONALITY - Some aspects of the present disclosure provide a system and method for fault mitigation of a non-volatile memory (NVM) store subject to error correction code (ECC) checking. A simple and robust means of testing the integrity of failsafe code stored within the non-volatile memory prior to execution is disclosed. In some embodiments, the failsafe code comprises program elements to communicate the memory failure to other parts of the system, or to execute an orderly shutdown. In the event that an ECC error occurs, the failsafe code can be verified and, upon successful verification, executed. | 05-08-2014 |
20140129872 | ERROR CONTROL IN MEMORY STORAGE SYSTEMS - A method includes calculating a first syndrome of a codeword read from a memory location under a first set of conditions and calculating a second syndrome of the codeword read from the memory location under a second set of conditions. The method also includes analyzing the first and second syndromes and applying one of the first and second syndromes to the codeword to find the codeword having a minimum number of errors. | 05-08-2014 |
20140143588 | Instant Communication Error Indication From Slave - An apparatus comprises at least one processor and at least one memory including computer program code, the at least one memory and the computer program code configured, with the at least one processor, to cause the apparatus at least to perform at least commanding a slave node to activate an immediate error response mode; and receiving an instant response from the slave node in response to a communication error. | 05-22-2014 |
20140149781 | Method for Batch Execution of System Calls in an Operating System - A system and a method are disclosed for batch execution of system calls in an operating system. In one implementation, a processing device configures a system call batching buffer table in a user space of an operating system, the system call batching buffer table including a plurality of system call units, associates a system call number with the system call batching buffer table, and issues a trap instruction to a kernel of the operating system to execute at least one of the plurality of system call units, the trap instruction including the system call number. | 05-29-2014 |
20140157036 | ADVANCED AND AUTOMATIC ANALYSIS OF RECURRENT TEST FAILURES - In one embodiment, a test case run analyzer may filter out failure events with known causes from a test report. The test case run analyzer may receive a test report of a test case run of an application process. The test case run analyzer may automatically identify a failure event in the test case run. The test case run analyzer may automatically compare the failure event to a failure pattern set. The test case run analyzer may filter the test report based on the failure pattern set. | 06-05-2014 |
20140157037 | IDENTIFYING SOFTWARE RESPONSIBLE FOR CHANGES IN SYSTEM STABILITY - A computer program product includes computer usable program code for: detecting a stability change in a computer system; identifying a first set of at least one capability of the computer system that is affected by the stability change; identifying, in response to detecting the stability change, a software application that was installed or updated prior to the stability change; identifying a second set of at least one capability that is utilized by the identified software application; comparing the first set to the second set to determine a degree of similarity; comparing a first time that the stability change was detected to a second time that the identified software application was installed or updated to determine a temporal proximity; and identifying the likelihood that the identified software application is the cause of the stability change, wherein the identified likelihood is a function of the degree of similarity and the temporal proximity. | 06-05-2014 |
20140157038 | USING SEPARATE PROCESSES TO HANDLE SHORT-LIVED AND LONG-LIVED JOBS TO REDUCE FAILURE OF PROCESSES - A method, system and computer program product for reducing the failure of processes. After a job is received, a determination is made as to whether the received job is a “short-lived job” or a “long-lived job.” A short-lived job is a job that accomplishes a given task in less than a threshold period of time. A long-lived job is a job that accomplishes a given task in more than a threshold period of time. An identified long-lived job is executed on a single process apart from other processes, whereas a short-lived job is executed on at least one process separate from the processes executing long-lived jobs. Because long-lived jobs execute on processes separate from short-lived jobs, the likelihood of a process failing is lessened, since the duration of time that each process runs is shortened. | 06-05-2014 |
20140157039 | USING DATA WATCHPOINTS TO DETECT UNITIALIZED MEMORY READS - A method of detecting uninitialized memory reads is shown in which either all or a subset of a random access memory system is initialized to a known value. One or more watchpoints are implemented such that, after a memory read is detected, the value read is compared to the value written during initialization. If the values match, debug information is captured and appropriate corrective action is taken. | 06-05-2014 |
20140157040 | IDENTIFYING SOFTWARE RESPONSIBLE FOR CHANGES IN SYSTEM STABILITY - A computer-implemented method detects a stability change in a computer system, and identifies a first set of at least one capability of the computer system that is affected by the stability change. In response to detecting the stability change, the method identifies a software application that was installed prior to the stability change, and identifies a second set of at least one capability of the computer system that is utilized by the identified software application. The method compares the first and second capability sets to determine a degree of similarity, and compares the time that the stability change was detected to the time that the identified software application was installed to determine a temporal proximity. The method then identifies the likelihood that the identified software application is the cause of the stability change, wherein the identified likelihood is a function of the degree of similarity and the temporal proximity. | 06-05-2014 |
20140173326 | Write Performance in Fault-Tolerant Clustered Storage Systems - Embodiments of the invention relate to supporting transaction data committed to stable storage. Committed data in the cluster is stored in the persistent cache layer, and is replicated and stored in the cache layer of one or more secondary nodes. One copy is designated as the master copy and all other copies are designated as replicas, with an exclusive write lock assigned to the master and a shared write lock extended to the replicas. An acknowledgement of receiving the data is communicated after confirmation that the data has been replicated to each node designated to receive a replica. Managers and a director are provided to support management of the master copy and the replicas within the file system, including invalidation of replicas, fault tolerance associated with failure of a node holding a master copy, recovery from a failed node, recovery of the file system from a power failure, and transfer of master and replica copies within the file system. | 06-19-2014 |
20140173327 | CORRECTING A FAILURE ASSOCIATED WITH A CURRENT FIRMWARE IMAGE - Methods, apparatuses, and computer program products for correcting a failure associated with a current firmware image are provided. Embodiments include a firmware selection module detecting the failure associated with the current firmware image stored in firmware memory corresponding to a component of a system. Embodiments also include the firmware selection module selecting from a plurality of backup firmware images, a replacement firmware image based on a status of at least one backup firmware image in response to detecting the failure. Embodiments also include the firmware selection module storing the selected replacement firmware image in the firmware memory. | 06-19-2014 |
20140173328 | CORRECTING A FAILURE ASSOCIATED WITH A CURRENT FIRMWARE IMAGE - Methods, apparatuses, and computer program products for correcting a failure associated with a current firmware image are provided. Embodiments include a firmware selection module detecting the failure associated with the current firmware image stored in firmware memory corresponding to a component of a system. Embodiments also include the firmware selection module selecting from a plurality of backup firmware images, a replacement firmware image based on a status of at least one backup firmware image in response to detecting the failure. Embodiments also include the firmware selection module storing the selected replacement firmware image in the firmware memory. | 06-19-2014 |
20140189417 | APPARATUS AND METHOD FOR PARTIAL MEMORY MIRRORING - An apparatus and method are described for performing partial memory mirroring operations. For example, one embodiment of a processor comprises: a processor core for generating a read or write transaction having a system memory address; a home agent identified to service the read or write transaction based on the system memory address; and one or more target address decoders (TADs) associated with the home agent to determine whether the system memory address is within a mirrored memory region or a non-mirrored memory region, wherein: if the system memory address is within a mirrored memory region, the one or more TADs identify multiple mirrored memory channels for the read or write transaction; and if the system memory address is not within a mirrored memory region, the one or more TADs identify a single memory channel for the read or write transaction. | 07-03-2014 |
20140195845 | FAULT ISOLATION WITH ABSTRACTED OBJECTS - In response to a notification of a fault captured in a system, a fault isolator serially analyzes each clock object to determine captured faults associated with the clock object. For each of the clock objects determined to have a captured fault, the fault isolator initiates a repair action for the chip represented by the clock object. The fault isolator concurrently analyzes the non-clock objects to determine captured faults associated with the non-clock objects after analysis of the clock objects. For each of the non-clock objects determined to have a captured fault, the fault isolator initiates a repair action for the chip represented by the non-clock object. | 07-10-2014 |
20140208150 | CROSS COMMUNICATION OF COMMON PROBLEM DETERMINATION AND RESOLUTION - Approaches for problem determination and resolution process cross communication are provided. Embodiments provide cross communication of a problem determination and resolution among similar data center devices. Specifically, symptoms of an error condition encountered for one data center device are captured by a first enterprise group, along with an associated resolution, and made available to another enterprise group managing a commonly configured data center device, which may face a similar error condition. The error signature and resolution steps captured by the first enterprise group are subsequently made available within and across multiple management domains operating within a common model (e.g., a publication-subscription system). Within this model, both the originator of the error determination and resolution (i.e., the publisher) and one or more commonly configured data center devices susceptible to the same error condition (i.e., subscribers) can filter, access, and control the flow of error resolutions. | 07-24-2014 |
20140208151 | Method And Apparatus To Recover From An Erroneous Logic State In An Electronic System - An electronic system includes circuitry to detect errors in logic state in the system and to initiate corrective action when one or more errors are detected. In some embodiments, redundant information is stored within a system that is associated with an operational state of the system. If the operational state of the system is subsequently corrupted as a result of an electrical or mechanical overstress condition, resulting errors may be detected by comparing or otherwise processing the stored operational state information and the redundant information. | 07-24-2014 |
20140215256 | FEATURE CENTRIC DIAGNOSTICS FOR DISTRIBUTED COMPUTER SYSTEMS - A distributed computer system includes components. The components include embedded computer processors that make up an application within the distributed computer system. The computer processors are accessible by an end user of the system. The computer processors are operable to communicate with a plurality of system analyzers, to generate an operational status of the application in the system based on the communication with the plurality of system analyzers, to generate one or more recommendations to address or troubleshoot a non-desired operational status of the application within the system, and to provide a unified interface to the end user that provides to the end user the one or more recommendations to address or troubleshoot the non-desired operational status of the application within the system. | 07-31-2014 |
20140215257 | DAISY CHAIN DISTRIBUTION IN DATA CENTERS - A method and a system to provide daisy chain distribution in data centers are provided. A node identification module identifies three or more data nodes of a plurality of data nodes. The identification of the three or more data nodes indicates that the respective data nodes are to receive a copy of a data file. A connection creation module, using one or more processors, creates communication connections between the three or more data nodes. The communication connections form a daisy chain beginning at a seeder data node of the three or more data nodes and ending at a terminal data node of the three or more data nodes. | 07-31-2014 |
20140223222 | INTELLIGENTLY RESPONDING TO HARDWARE FAILURES SO AS TO OPTIMIZE SYSTEM PERFORMANCE - A method, system and computer program product for intelligently responding to hardware failures so as to optimize system performance. An administrative server monitors the utilization of the hardware as well as the software components running on the hardware to assess a context of the software components running on the hardware. Upon detecting a hardware failure, the administrative server analyzes the hardware failure to determine the type of hardware failure and analyzes the properties of the workload running on the failed hardware. The administrative server then responds to the detected hardware failure based on various factors, including the type of the hardware failure, the properties of the workload running on the failed hardware and the context of the software running on the failed hardware. In this manner, by taking into consideration such factors in responding to the detected hardware failure, a more intelligent response is provided that optimizes system performance. | 08-07-2014 |
20140258769 | PARTIAL R-BLOCK RECYCLING - An apparatus includes a non-volatile memory and a controller. The non-volatile memory includes a plurality of R-blocks. The controller is coupled to the non-volatile memory. The controller is configured to (i) write data using the R-blocks as a unit of allocation and (ii) perform recycling operations selectively on either an entire one of the R-blocks or a portion less than all of one of the R-blocks. | 09-11-2014 |
20140281661 | Hybrid Memory System With Configurable Error Thresholds And Failure Analysis Capability - A system and method for configuring fault tolerance in nonvolatile memory (NVM) are operative to set a first threshold value, declare one or more portions of NVM invalid based on an error criterion, track the number of declared invalid NVM portions, determine if the tracked number exceeds the first threshold value, and if the tracked number exceeds the first threshold value, perform one or more remediation actions, such as issue a warning or prevent backup of volatile memory data in a hybrid memory system. In the event of backup failure, an extent of the backup can still be assessed by determining the amount of erased NVM that has remained erased after the backup, or by comparing a predicted backup end point with an actual endpoint. | 09-18-2014 |
20140289551 | FAULT MANAGEMENT IN AN IT INFRASTRUCTURE - Provided is a method of fault management in an IT infrastructure. An IT resource is monitored to identify a likelihood of occurrence of a fault related to the IT resource. Upon said identification, a determination is made whether a solution is available to prevent the occurrence of the fault related to the IT resource. If a solution is available, the solution is applied to the IT resource prior to the occurrence of the fault related to the IT resource. | 09-25-2014 |
20140298076 | PROCESSING APPARATUS, RECORDING MEDIUM STORING PROCESSING PROGRAM, AND PROCESSING METHOD - A processing apparatus that constitutes an information processing system includes: a device that constitutes the processing apparatus; and a processing unit that detects an abnormality in the device, that counts the number of the abnormalities detected in the device, and that logically separates the device from the information processing system when the counted number of the abnormalities detected in the device is equal to or greater than a threshold. | 10-02-2014 |
20140325254 | AUTOMATIC GENERATION OF ACTIONABLE RECOMMENDATIONS FROM PROBLEM REPORTS - Methods and arrangements for handling information technology tickets. A plurality of information technology tickets are received. The tickets are clustered into categories, and a problem area is identified with respect to at least one of the categories. At least one recommendation is automatically generated for addressing the problem area. Other variants and embodiments are broadly contemplated herein. | 10-30-2014 |
20150039929 | Method and Apparatus for Forming Software Fault Containment Units (SWFCUS) in a Distributed Real-Time System - The invention relates to a method for limiting the effects of software errors in a distributed real-time system in which a plurality of distributed application systems are executed simultaneously, wherein each application system forms an encapsulated software fault containment unit (SWFCU), wherein an SWFCU comprises the software of a distributed application system, said software being executed on one or more virtual computer nodes and one or more dedicated computer nodes, and exchanging messages via one or more encapsulated virtual communication systems, wherein a communication system consists of communication controllers, switching units and physical connections, and wherein the direct effects of a software error of an SWFCU remain limited to the SWFCU. | 02-05-2015 |
20150149810 | APPARATUS, SYSTEM AND METHOD FOR AUTONOMOUS RECOVERY FROM FAILURES DURING SYSTEM CHARACTERIZATION ON AN ENVIRONMENT WITH RESTRICTED RESOURCES - A power management mechanism maintains power to a processor and an integrated memory. Read-only logic and a cache are also provided. At power on, the read-only logic configures the cache as an internal memory and loads executable instructions in the cache. A copy of the executable instructions is stored in the internal memory. A branch instruction is also stored. Thereafter, the processor uses the copy of the executable instructions and present status information. The processor is programmed to issue a reset signal when a failure is detected. The read-only logic responds to the reset signal by going to the branch instruction in the internal memory, which directs the processor to use the copy of the executable instructions and status information in the internal memory circuit. The operating state is restored and the processor is instructed to execute the next instruction in the copy of executable instructions. | 05-28-2015 |
20150149811 | METHOD FOR MAINTAINING THE FUNCTIONAL ABILITY OF A FIELD DEVICE - A method for maintaining the functional ability of a field device of automation technology, wherein the method comprises the following steps: monitoring the field device for at least one achieved parameter change (Δn | 05-28-2015 |
20150355852 | SYSTEM, METHOD AND APPARATUS FOR PREVENTING DATA LOSS DUE TO MEMORY DEFECTS USING LATCHES - A system and method for operating a memory system includes receiving a first user data, writing the first user data to a first buffer, writing the first user data from the first buffer to a first selected memory location, writing the first user data from the first buffer into a second buffer when the first user data was successfully written to the first selected memory location. Data is retrieved from the first selected memory location and written into the first buffer. Data in the first buffer can be matched to the user data in the second buffer to confirm a successful storage of the first user data in the memory system. A previously stored user data can be retrieved from a third selected memory location and written into a third buffer when the previously stored user data was stored in the memory system before the first user data. | 12-10-2015 |
20150370627 | MANAGEMENT SYSTEM, PLAN GENERATION METHOD, PLAN GENERATION PROGRAM - A management system that generates a plan, which is a countermeasure against an event occurring in a computer system, includes: a plan generating unit configured to generate a plan according to the event; and an indicator generating unit configured to generate, as a performance change evaluation indicator of the plan, information on a change in performance of a resource of the computer system that can occur, when the plan generated by the plan generating unit is executed, due to a process executed by a subject other than the subject of the plan. | 12-24-2015 |
20160042200 | ASICS HAVING PROGRAMMABLE BYPASS OF DESIGN FAULTS - A relatively small amount of programmable logic may be included in a mostly ASIC device such that the programmable logic can be used as a substitute for a fault-infected ASIC block. This substitution may occur permanently or temporarily. When an ASIC block is temporarily substituted, faulty outputs of the ASIC block are disabled just at the time they would otherwise propagate an error. The operations of the temporarily deactivated ASIC block(s) may be substituted for by appropriately programmed programmable logic. Thus, a fault-infected ASIC block that operates improperly 1% of the time can continue to be gainfully used for the 99% of the time when its operations are fault free. This substitution can be activated in various stages of the ASIC block's life including after: initial design; pilot production; and mass production. This provides for cost saving and faster time-to-market, repair, and maintenance even years after installation and use. | 02-11-2016 |
20160049203 | SYSTEM AND METHOD OF USING MULTIPLE READ OPERATIONS - Systems and methods are described for reading a storage element of a memory. In a particular embodiment, a method, in a data storage device including a controller and a non-volatile memory, where the non-volatile memory includes a plurality of storage elements, includes performing multiple read operations at a storage element of the non-volatile memory. Each read operation of the multiple read operations is performed using the same reading voltage. The method further includes determining a read value of the storage element based on the multiple read operations. | 02-18-2016 |
20160055046 | System fault detection and processing method, device, and computer readable storage medium - Disclosed are a method, a device, and a computer readable storage medium for detecting and processing a system fault. The method includes: an interrupt service routine sending a first-stage watchdog kick signal, and receiving a second-stage watchdog kick signal for a system detection task … | 02-25-2016 |
20160065185 | MULTI-BIT FLIP-FLOP WITH ENHANCED FAULT DETECTION - A processing system includes a processor core, a peripheral component, and a flip-flop unit in at least one of the processor core and the peripheral component. The flip-flop unit can include a master latch, and two slave latches coupled to an output of the master latch. The first slave latch is formed over a first doped well region of a semiconductor substrate. The second slave latch is formed over a second doped well region of the semiconductor substrate. A comparator is coupled to an output of the first slave latch and to an output of the second slave latch. An output of the comparator indicates whether a state stored in the first slave latch is the same as a state stored in the second slave latch. | 03-03-2016 |
20160154692 | SYSTEMS AND/OR METHODS FOR HANDLING ERRONEOUS EVENTS IN COMPLEX EVENT PROCESSING (CEP) APPLICATIONS | 06-02-2016 |
20160378602 | PRE-BOOT SELF-HEALING AND ADAPTIVE FAULT ISOLATION - Systems and methods for providing pre-boot self-healing and adaptive fault isolation. In some embodiments, an Information Handling System (IHS) includes a processor and a Basic I/O System (BIOS) coupled to the processor, the BIOS firmware having program instructions that, upon execution by the processor, cause the IHS to: initiate the booting of devices within the IHS following a predetermined boot order, wherein the predetermined boot order includes a first device followed by a second device; determine that the first device has been marked for bypass; bypass the booting of the first device; and boot the second device. | 12-29-2016 |
20190146862 | SYSTEM FOR TECHNOLOGY ANOMALY DETECTION, TRIAGE AND RESPONSE USING SOLUTION DATA MODELING | 05-16-2019 |