Patent application number | Description | Published |
20100036940 | Data Processing In A Hybrid Computing Environment - Data processing in a hybrid computing environment that includes a host computer and an accelerator, the host and the accelerator adapted to one another for data communications by a system level message passing module and a plurality of data communications fabrics of at least two different fabric types, the data processing including: monitoring data communications performance for a plurality of data communications modes; receiving, from an application program on the host computer, a request to transmit data according to a data communications mode from the host computer to the accelerator; determining, in dependence upon the monitored performance, whether to transmit the data according to the requested data communications mode; and if the data is not to be transmitted according to the requested data communications mode: selecting, in dependence upon the monitored performance, another data communications mode for transmitting the data and transmitting the data according to the selected data communications mode. | 02-11-2010 |
20100095100 | Checkpointing A Hybrid Architecture Computing System - A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element and at least one accelerator element and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element. | 04-15-2010 |
20100095152 | Checkpointing A Hybrid Architecture Computing System - A method, apparatus, and program product checkpoint an application in a parallel computing system of the type that includes a plurality of hybrid nodes. Each hybrid node includes a host element and a plurality of accelerator elements. Each host element may include at least one multithreaded processor, and each accelerator element may include at least one multi-element processor. In a first hybrid node from among the plurality of hybrid nodes, checkpointing the application includes executing at least a portion of the application in the host element, configuring and executing at least one computation kernel in at least one accelerator element, and, in response to receiving a command to checkpoint the application, checkpointing the host element separately from the at least one accelerator element upon which the at least one computation kernel is executing. | 04-15-2010 |
20100122199 | Using Accelerators in a Hybrid Architecture for System Checkpointing - A hybrid node of a High Performance Computing (HPC) cluster uses accelerator nodes for checkpointing to increase overall efficiency of the multi-node computing system. The host node or processor node reads/writes checkpoint data to the accelerators. After offloading the checkpoint data to the accelerators, the host processor can continue processing while the accelerators communicate the checkpoint data with the host or wait for the next checkpoint. The accelerators may also perform dynamic compression and decompression of the checkpoint data to reduce the checkpoint size and reduce network loading. The accelerators may also communicate with other node accelerators to compare checkpoint data to reduce the amount of checkpoint data stored to the host. | 05-13-2010 |
20100122256 | Scheduling Work in a Multi-Node Computer System Based on Checkpoint Characteristics - Efficient application checkpointing uses checkpointing characteristics of a job to determine how to schedule jobs for execution on a multi-node computer system. A checkpoint profile in the job description includes information on the expected frequency and duration of a check point cycle for the application. The checkpoint profile may be based on a user/administrator input as well as historical information. The job scheduler will attempt to group applications (jobs) that have the same checkpoint profile, on the same nodes or group of nodes. Additionally, the job scheduler may control when new jobs start based on when the next checkpoint cycle(s) are expected. The checkpoint monitor will monitor the checkpoint cycles, updating the checkpoint profiles of running jobs. The checkpoint monitor will also keep track of an overall system checkpoint profile to determine the available checkpointing capacity before scheduling jobs on the cluster. | 05-13-2010 |
20100191822 | Broadcasting Data In A Hybrid Computing Environment - Methods, apparatus, and products for broadcasting data in a hybrid computing environment that includes a host computer, a number of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the accelerators shared remotely with the host computer, where broadcasting data according to embodiments of the present invention includes: writing, by the host computer remotely to the shared local memory for the accelerators, the data to be broadcast; reading, by each of the accelerators from the shared local memory for the accelerators, the data; and notifying the host computer, by the accelerators, that the accelerators have read the data. | 07-29-2010 |
20100191823 | Data Processing In A Hybrid Computing Environment - Data processing in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, where data processing according to embodiments of the present invention includes performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators; writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation. | 07-29-2010 |
20100192123 | Software Development For A Hybrid Computing Environment - Software development for a hybrid computing environment that includes a host computer and an accelerator, the host computer and the accelerator adapted to one another for data communications by a system level message passing module and by two or more data communications fabrics of at least two different fabric types where software development includes creating, by a programmer, a computer program for execution in the hybrid computing environment, the computer program including directives for generation of computer program code that moves contents of memory among host computers and accelerators in the hybrid computing environment; generating, by a code generator application, source code in accordance with the directives; analyzing, by the code generator application, operation of the generated code for data movement and utilization of moved data; and regenerating, by the code generator application, the source code in accordance with the directives and further in accordance with results of the analysis. | 07-29-2010 |
20110035556 | Reducing Remote Reads Of Memory In A Hybrid Computing Environment By Maintaining Remote Memory Values Locally - Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally, the hybrid computing environment including a host computer and a plurality of accelerators, the host computer and the accelerators each having local memory shared remotely with the other, including writing to the shared memory of the host computer packets of data representing changes in accelerator memory values, incrementing, in local memory and in remote shared memory on the host computer, a counter value representing the total number of packets written to the host computer, reading by the host computer from the shared memory in the host computer the written data packets, moving the read data to application memory, and incrementing, in both local memory and in remote shared memory on the accelerator, a counter value representing the total number of packets read by the host computer. | 02-10-2011 |
20110191785 | Terminating An Accelerator Application Program In A Hybrid Computing Environment - Terminating an accelerator application program in a hybrid computing environment that includes a host computer having a host computer architecture and an accelerator having an accelerator architecture, where the host computer and the accelerator are adapted to one another for data communications by a system level message passing module (‘SLMPM’), and terminating an accelerator application program in a hybrid computing environment includes receiving, by the SLMPM from a host application executing on the host computer, a request to terminate an accelerator application program executing on the accelerator; terminating, by the SLMPM, execution of the accelerator application program; returning, by the SLMPM to the host application, a signal indicating that execution of the accelerator application program was terminated; and performing, by the SLMPM, a cleanup of the execution environment associated with the terminated accelerator application program. | 08-04-2011 |
20110225226 | Assigning A Unique Identifier To A Communicator - Creating, by a parent master process of a parent communicator, a child communicator, including configuring the child communicator with a child master process, wherein a communicator includes a collection of one or more processes executing on compute nodes of a distributed computing system; determining, by the parent master process, whether a unique identifier is available to assign to the child communicator; if a unique identifier is available to assign to the child communicator, assigning, by the parent master process, the available unique identifier to the child communicator; and if a unique identifier is not available to assign to the child communicator: retrieving, by the parent master process, an available unique identifier from a master process of another communicator in a tree of communicators and assigning the retrieved unique identifier to the child communicator. | 09-15-2011 |
20110225255 | Discovering A Resource In A Distributed Computing System - Sending, by a node requesting information regarding a resource to one or more nodes in a distributed computing system, an active message to perform a collective operation; contributing, by each node not having the resource, a value of zero to the collective operation; contributing, by a node having the resource, the node's rank; storing the result of the collective operation in a buffer of the requesting node; and identifying, in dependence upon the result of the collective operation, the rank of the node having the resource. | 09-15-2011 |
20110225297 | Controlling Access To A Resource In A Distributed Computing System With A Distributed Access Request Queue - Controlling access to a resource in a distributed computing system that includes nodes having a status field, a next field, a source data buffer, and that are characterized by a unique node identifier, where controlling access includes receiving a request for access to the resource implemented as an active message that includes the requesting node's unique node identifier, the value stored in the requesting node's source data buffer, and an instruction to perform a reduction operation with the value stored in the requesting node's source data buffer and the value stored in the receiving node's source data buffer; returning the requesting node's unique node identifier as a result of the reduction operation; and updating the status and next fields to identify the requesting node as a next node to have sole access to the resource. | 09-15-2011 |
20110267197 | Monitoring Operating Parameters In A Distributed Computing System With Active Messages - In a distributed computing system including nodes organized for collective operations: initiating, by a root node through an active message to all other nodes, a collective operation, the active message including an instruction to each node to store operating parameter data in each node's send buffer; and, responsive to the active message: storing, by each node, the node's operating parameter data in the node's send buffer and returning, by the node, the operating parameter data as a result of the collective operation. | 11-03-2011 |
20110270986 | Optimizing Collective Operations - Optimizing collective operations including receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by one or more nodes to perform the collective operation is not available; if a resource needed by one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation. | 11-03-2011 |
20110271059 | REDUCING REMOTE READS OF MEMORY IN A HYBRID COMPUTING ENVIRONMENT - A hybrid computing environment in which the host computer allocates, in the shadow memory area of the host computer, a memory region for a packet to be written to the shared memory of an accelerator; writes packet data to the accelerator's shared memory in a memory region corresponding to the allocated memory region; inserts, in a next available element of the accelerator's descriptor array, a descriptor identifying the written packet data; increments the copy of the head pointer of the accelerator's descriptor array maintained on the host computer; and updates a copy of the head pointer of the accelerator's descriptor array maintained on the accelerator with the incremented copy. | 11-03-2011 |
20120110153 | Administering Incident Pools For Event And Alert Analysis - Administering incident pools including creating a pool of incidents, the pool having a predetermined initial period of time; assigning each received incident to the pool; assigning, by the incident analyzer, to each incident a predetermined minimum time for inclusion in a pool; extending for one or more of the incidents the predetermined initial period of time of the pool by a particular period of time assigned to the incident; determining whether conditions have been met to close the pool; and if conditions have been met to close the pool: determining for each incident in the pool whether the incident has been in the pool for its predetermined minimum time for inclusion in a pool; and if the incident has not been in the pool for its predetermined minimum time, evicting the incident from the closed pool and including the incident in a next pool. | 05-03-2012 |
20120110161 | Relevant Alert Delivery In A Distributed Processing System - Methods, systems and products are provided for relevant alert delivery, including assigning by an event analyzer each received event to an events pool; determining by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool whether to suppress one or more of the events; identifying by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool one or more alerts; sending by the event analyzer to an alert analyzer all the alerts identified by the event analyzer; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alert pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 05-03-2012 |
20120110600 | Relevant Alert Delivery With Event And Alert Suppression In A Distributed Processing System - Methods, systems and products are provided for relevant alert delivery with event and alert suppression including identifying by the event analyzer in dependence upon the event arrival rules one or more alerts; closing, by the event analyzer in dependence upon the events pool operation rules, the events pool; determining, by the events analyzer in dependence upon the event suppression rules, whether to suppress one or more events in the closed events pool; identifying by the event analyzer in dependence upon the events pool closure rules and any unsuppressed events assigned to the events pool, one or more additional alerts; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alert pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 05-03-2012 |
20120144020 | Dynamic Administration Of Event Pools For Relevant Event And Alert Analysis During Event Storms - Dynamic administration of event pools for relevant event and alert analysis during event storms including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system, each event including an occurred time and a logged time; creating, by the event analyzer, an events pool; determining whether an arrival rate of the events from the components of the distributed processing system is greater than a predetermined threshold; if the arrival rate is greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their occurred time; and if the arrival rate is not greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their logged time. | 06-07-2012 |
20120144021 | Administering Event Reporting Rules In A Distributed Processing System - Methods, systems and products are provided for administering event reporting rules in a distributed processing system that includes identifying that one or more nodes of the distributed processing system is idle; for each identified idle node, collecting by the idle node any suppressed events and logged data from the node; sending the suppressed events and logged data to a database of events; and changing the event reporting rules for one or more components on the identified idle node in dependence upon the suppressed events and the logged data. | 06-07-2012 |
20120144243 | Dynamic Administration Of Component Event Reporting In A Distributed Processing System - Methods, systems and products are provided for dynamic administration of component event reporting in a distributed processing system including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system; determining, by the events analyzer in dependence upon the received events and one or more event analysis rules, to change the event reporting rules of one or more components; and instructing, by the events analyzer, the one or more components to change the event reporting rules. | 06-07-2012 |
20120144251 | Relevant Alert Delivery In A Distributed Processing System With Event Listeners and Alert Listeners - Relevant alert delivery including determining, by an events listener associated with an event queue, whether one or more events in an events queue have not been assigned to any events pool by any event analyzer; and if one or more events in the events queue have not been assigned to any events pool, identifying by the events listener in dependence upon the event analysis rules one or more alerts; sending by the event listener to an alerts queue all the alerts identified by the event listener, the alerts queue having an associated alerts listener; determining whether one or more alerts in the alerts queue have not been assigned to any alerts pool; if one or more alerts in the alerts queue have not been assigned to any alerts pool, determining in dependence upon alert analysis rules whether to suppress the alerts; and transmitting the unsuppressed alerts. | 06-07-2012 |
20120191920 | Reducing Remote Reads Of Memory In A Hybrid Computing Environment By Maintaining Remote Memory Values Locally - Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally, the hybrid computing environment including a host computer and a plurality of accelerators, the host computer and the accelerators each having local memory shared remotely with the other, including writing to the shared memory of the host computer packets of data representing changes in accelerator memory values, incrementing, in local memory and in remote shared memory on the host computer, a counter value representing the total number of packets written to the host computer, reading by the host computer from the shared memory in the host computer the written data packets, moving the read data to application memory, and incrementing, in both local memory and in remote shared memory on the accelerator, a counter value representing the total number of packets read by the host computer. | 07-26-2012 |
20120330918 | FLEXIBLE EVENT DATA CONTENT MANAGEMENT FOR RELEVANT EVENT AND ALERT ANALYSIS WITHIN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include receiving, by an interface connector, a raw event from a component of the distributed processing system; analyzing, by the interface connector, custom data within the raw event to determine a location to store the custom data, the custom data in a first data format; storing, by the interface connector, extended data within the raw event in a common event data format, the extended data indicating the location of the custom data; receiving, by an event analyzer, the event; and determining whether there are custom customer rules that need the custom data; and if there are such custom customer rules, retrieving the custom data based on the extended data from the event; and applying the custom customer rules to the extended data; if there are no such custom customer rules, applying the base rules to a base portion of the event. | 12-27-2012 |
20120331270 | Compressing Result Data For A Compute Node In A Parallel Computer - Compressing result data for a compute node in a parallel computer, the parallel computer including a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID. | 12-27-2012 |
20120331332 | Restarting Event And Alert Analysis After A Shutdown In A Distributed Processing System - Methods, systems, and computer program products for restarting event and alert analysis after a shutdown in a distributed processing system are provided. Embodiments include identifying, by an incident analyzer, a shutdown condition of the distributed processing system, the incident analyzer including a plurality of event analyzers and a monitor that monitors the plurality of event analyzers; and determining, by the incident analyzer, whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing, by the incident analyzer, an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting, by the incident analyzer, event and alert analysis using the next event identified in the event log; and if the shutdown was unplanned, for each event analyzer, identifying the last event included in the last event pool that the event analyzer closed; and restarting, by the incident analyzer, event and alert analysis at the event analyzer using the next event received by the event analyzer after the identified last event. | 12-27-2012 |
20120331347 | Restarting Event And Alert Analysis After A Shutdown In A Distributed Processing System - Methods, systems, and computer program products for restarting event and alert analysis after a shutdown in a distributed processing system are provided. Embodiments include identifying, by an incident analyzer, a shutdown condition of the distributed processing system; and determining, by the incident analyzer, whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing, by the incident analyzer, an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting, by the incident analyzer, event and alert analysis using the next event identified in the event log; if the shutdown was unplanned, identifying, by the incident analyzer, a previously configured restart mode; selecting, by the incident analyzer, an identification of a restart event in the event log according to the previously configured restart mode; and restarting, by the incident analyzer, event and alert analysis using the restart event identified in the event log. | 12-27-2012 |
20120331485 | Flexible Event Data Content Management For Relevant Event And Alert Analysis Within A Distributed Processing System - Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include capturing, by an interface connector, an event from a resource of the distributed processing system; inserting, by the interface connector, the event into an event database; receiving from the interface connector, by a notifier, a notification of insertion of the event into the event database; based on the received notification, tracking, by the notifier, the number of events indicated as inserted into the event database; receiving from the notifier, by a monitor, a cumulative notification indicating the number of events that have been inserted into the event database; in response to receiving the cumulative notification, retrieving, by the monitor, from the event database, events inserted into the event database; and processing, by the monitor, the retrieved events. | 12-27-2012 |
20130018935 | Performing Collective Operations In A Distributed Processing System - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system are provided. The hybrid distributed processing system includes a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, each compute node coupled for data communications by at least one data communications network implementing at least two different networking topologies. A first networking topology includes a tiered tree topology having a root task, and at least two child tasks, where the two child tasks are peers of one another in the same tier. Embodiments include determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology; and if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology. | 01-17-2013 |
20130018947 | Performing Collective Operations In A Distributed Processing System - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system are provided. The hybrid distributed processing system includes a plurality of compute nodes where each compute node has a plurality of tasks, each task is assigned a unique rank, and each compute node is coupled for data communications by at least one data communications network implementing at least two different networking topologies. At least one of the two networking topologies is a tiered tree topology having a root task and at least two child tasks and the at least two child tasks are peers of one another in the same tier. Embodiments include for each task, sending at least a portion of data corresponding to the task to all child tasks of the task through the tree topology; and sending at least a portion of the data corresponding to the task to all peers of the task at the same tier in the tree topology through the second topology. | 01-17-2013 |
20130024866 | Topology Mapping In A Distributed Processing System - Topology mapping in a distributed processing system, the distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, including: assigning each task to a geometry defining the resources available to the task; selecting, from a list of possible data communications algorithms, one or more algorithms configured for the assigned geometry; and identifying, by each task to all other tasks, the selected data communications algorithms of each task in a single collective operation. | 01-24-2013 |
20130060833 | TOPOLOGY MAPPING IN A DISTRIBUTED PROCESSING SYSTEM - Topology mapping in a distributed processing system, the distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, including: assigning each task to a geometry defining the resources available to the task; selecting, from a list of possible data communications algorithms, one or more algorithms configured for the assigned geometry; and identifying, by each task to all other tasks, the selected data communications algorithms of each task in a single collective operation. | 03-07-2013 |
20130060944 | CONTROLLING ACCESS TO A RESOURCE IN A DISTRIBUTED COMPUTING SYSTEM WITH A DISTRIBUTED ACCESS REQUEST QUEUE - Controlling access to a resource in a distributed computing system that includes nodes having a status field, a next field, a source data buffer, and that are characterized by a unique node identifier, where controlling access includes receiving a request for access to the resource implemented as an active message that includes the requesting node's unique node identifier, the value stored in the requesting node's source data buffer, and an instruction to perform a reduction operation with the value stored in the requesting node's source data buffer and the value stored in the receiving node's source data buffer; returning the requesting node's unique node identifier as a result of the reduction operation; and updating the status and next fields to identify the requesting node as a next node to have sole access to the resource. | 03-07-2013 |
20130066938 | PERFORMING COLLECTIVE OPERATIONS IN A DISTRIBUTED PROCESSING SYSTEM - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system that includes a plurality of compute nodes and a plurality of tasks, where each task is assigned a unique rank and each compute node is coupled for data communications by at least two different networking topologies. At least one of the two networking topologies is a tiered tree topology having a root task and at least two child tasks, and the at least two child tasks are peers of one another in the same tier. Embodiments include, for each task, sending at least a portion of data corresponding to the task to all child tasks of the task through the tree topology; and sending at least a portion of the data corresponding to the task to all peers of the task at the same tier in the tree topology through the second topology. | 03-14-2013 |
20130067198 | COMPRESSING RESULT DATA FOR A COMPUTE NODE IN A PARALLEL COMPUTER - A parallel computer is provided that includes a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID. | 03-14-2013 |
20130067483 | LOCALITY MAPPING IN A DISTRIBUTED PROCESSING SYSTEM - Topology mapping in a distributed processing system that includes a plurality of compute nodes, including: initiating a message passing operation; including in a message generated by the message passing operation, topological information for the sending task; mapping the topological information for the sending task; determining whether the sending task and the receiving task reside on the same topological unit; if the sending task and the receiving task reside on the same topological unit, using an optimal local network pattern for subsequent message passing operations between the sending task and the receiving task; otherwise, using a data communications network between the topological unit of the sending task and the topological unit of the receiving task for subsequent message passing operations between the sending task and the receiving task. | 03-14-2013 |
20130073726 | RESTARTING EVENT AND ALERT ANALYSIS AFTER A SHUTDOWN IN A DISTRIBUTED PROCESSING SYSTEM - Restarting event and alert analysis after a shutdown in a distributed processing system includes identifying a shutdown condition of the distributed processing system; determining whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting event and alert analysis using the next event identified in the event log; and if the shutdown was unplanned, for each event analyzer, identifying the last event included in the last event pool that the event analyzer closed; and restarting event and alert analysis at the event analyzer using the next event received by the event analyzer after the identified last event. | 03-21-2013 |
20130074102 | FLEXIBLE EVENT DATA CONTENT MANAGEMENT FOR RELEVANT EVENT AND ALERT ANALYSIS WITHIN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include capturing, by an interface connector, an event from a resource of the distributed processing system; inserting, by the interface connector, the event into an event database; receiving from the interface connector, by a notifier, a notification of insertion of the event into the event database; based on the received notification, tracking, by the notifier, the number of events indicated as inserted into the event database; receiving from the notifier, by a monitor, a cumulative notification indicating the number of events that have been inserted into the event database; in response to receiving the cumulative notification, retrieving, by the monitor, from the event database, events inserted into the event database; and processing, by the monitor, the retrieved events. | 03-21-2013 |
20130080630 | FLEXIBLE EVENT DATA CONTENT MANAGEMENT FOR RELEVANT EVENT AND ALERT ANALYSIS WITHIN A DISTRIBUTED PROCESSING SYSTEM - Flexible event data content management for relevant event and alert analysis within a distributed processing system includes receiving, by an interface connector, a raw event from a component of the distributed processing system; analyzing custom data within the raw event to determine a location to store the custom data, the custom data in a first data format; storing extended data within the raw event in a common event data format, the extended data indicating the location of the custom data; receiving, by an event analyzer, the event; and determining whether there are custom customer rules that need the custom data; and if there are such custom customer rules, retrieving the custom data based on the extended data from the event; and applying the custom customer rules to the extended data; if there are no such custom customer rules, applying the base rules to a base portion of the event. | 03-28-2013 |
20130081037 | PERFORMING COLLECTIVE OPERATIONS IN A DISTRIBUTED PROCESSING SYSTEM - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system including: determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology; and if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology. | 03-28-2013 |
20130097300 | ADMINISTERING INCIDENT POOLS FOR EVENT AND ALERT ANALYSIS - Administering incident pools including creating a pool of incidents, the pool having a predetermined initial period of time; assigning each received incident to the pool; assigning, by the incident analyzer, to each incident a predetermined minimum time for inclusion in a pool; extending for one or more of the incidents the predetermined initial period of time of the pool by a particular period of time assigned to the incident; determining whether conditions have been met to close the pool; and if conditions have been met to close the pool determining for each incident in the pool whether the incident has been in the pool for its predetermined minimum time for inclusion in a pool; and if the incident has not been in the pool for its predetermined minimum time, evicting the incident from the closed pool and including the incident in a next pool. | 04-18-2013 |
20130138809 | RELEVANT ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems, and products are provided for relevant alert delivery, including assigning by an event analyzer each received event to an events pool; determining by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool whether to suppress one or more of the events; identifying by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool one or more alerts; sending by the event analyzer to an alert analyzer all the alerts identified by the event analyzer; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alerts pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 05-30-2013 |
20130166743 | Relevant Alert Delivery In A Distributed Processing System - Methods, systems, and products are provided for relevant alert delivery, including assigning by an event analyzer each received event to an events pool; determining by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool whether to suppress one or more of the events; identifying by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool one or more alerts; sending by the event analyzer to an alert analyzer all the alerts identified by the event analyzer; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alerts pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 06-27-2013 |
20130179905 | Administering Incident Pools For Event And Alert Analysis - Administering incident pools including creating a pool of incidents, the pool having a predetermined initial period of time; assigning each received incident to the pool; assigning, by the incident analyzer, to each incident a predetermined minimum time for inclusion in a pool; extending for one or more of the incidents the predetermined initial period of time of the pool by a particular period of time assigned to the incident; determining whether conditions have been met to close the pool; and if conditions have been met to close the pool determining for each incident in the pool whether the incident has been in the pool for its predetermined minimum time for inclusion in a pool; and if the incident has not been in the pool for its predetermined minimum time, evicting the incident from the closed pool and including the incident in a next pool. | 07-11-2013 |
20130191851 | Monitoring Operating Parameters In A Distributed Computing System With Active Messages - In a distributed computing system including nodes organized for collective operations: initiating, by a root node through an active message to all other nodes, a collective operation, the active message including an instruction to each node to store operating parameter data in each node's send buffer; and, responsive to the active message: storing, by each node, the node's operating parameter data in the node's send buffer and returning, by the node, the operating parameter data as a result of the collective operation. | 07-25-2013 |
20130305103 | RELEVANT ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM WITH EVENT LISTENERS AND ALERT LISTENERS - Relevant alert delivery including determining, by an events listener associated with an events queue, whether one or more events in the events queue have not been assigned to any events pool by any event analyzer; if one or more events in the events queue have not been assigned to any events pool, identifying by the events listener in dependence upon the event analysis rules one or more alerts; sending by the events listener to an alerts queue all the alerts identified by the events listener, the alerts queue having an associated alerts listener; determining whether one or more alerts in the alerts queue have not been assigned to any alerts pool; if one or more alerts in the alerts queue have not been assigned to any alerts pool, determining in dependence upon alert analysis rules whether to suppress the alerts; and transmitting the unsuppressed alerts. | 11-14-2013 |
20130318404 | DYNAMIC ADMINISTRATION OF COMPONENT EVENT REPORTING IN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems and products are provided for dynamic administration of component event reporting in a distributed processing system including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system; determining, by the events analyzer in dependence upon the received events and one or more event analysis rules, to change the event reporting rules of one or more components; and instructing, by the events analyzer, the one or more components to change the event reporting rules. | 11-28-2013 |
20140068347 | RESTARTING EVENT AND ALERT ANALYSIS AFTER A SHUTDOWN IN A DISTRIBUTED PROCESSING SYSTEM - Restarting event and alert analysis after a shutdown in a distributed processing system includes identifying a shutdown condition of the distributed processing system; and determining whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting event and alert analysis using the next event identified in the event log; if the shutdown was unplanned, identifying a previously configured restart mode; selecting an identification of a restart event in the event log according to the previously configured restart mode; and restarting event and alert analysis using the restart event identified in the event log. | 03-06-2014 |
20140101307 | DYNAMIC ADMINISTRATION OF EVENT POOLS FOR RELEVANT EVENT AND ALERT ANALYSIS DURING EVENT STORMS - Dynamic administration of event pools for relevant event and alert analysis during event storms including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system, each event including an occurred time and a logged time; creating, by the event analyzer, an events pool; determining whether an arrival rate of the events from the components of the distributed processing system is greater than a predetermined threshold; if the arrival rate is greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their occurred time; and if the arrival rate is not greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their logged time. | 04-10-2014 |
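Several of the abstracts above describe concrete algorithmic techniques. The result-compression gather of 20130067198, for instance, stores each distinct result only once in the gather buffer, with a counter and the IDs of the contributing nodes. The following is a minimal illustrative sketch of that dedup-and-count idea under assumed data shapes; the function name and buffer layout are hypothetical, not taken from the patent or any actual implementation.

```python
def compressed_gather(contributions):
    """Sketch of a deduplicating gather.

    contributions: iterable of (node_id, result_data) pairs, as if
    collected from compute nodes in a tree-organized collective.

    Returns a gather buffer mapping each distinct result to a
    (count, [node_ids]) pair, so identical results occupy one entry
    instead of one entry per node.
    """
    gather_buffer = {}
    for node_id, result in contributions:
        if result in gather_buffer:
            # Result already written: increment its counter and record
            # the additional contributing node's ID.
            count, node_ids = gather_buffer[result]
            gather_buffer[result] = (count + 1, node_ids + [node_id])
        else:
            # New result: write it once with a counter of 1 and the
            # first contributing node's ID.
            gather_buffer[result] = (1, [node_id])
    return gather_buffer
```

When many nodes produce identical results (a common case for status or validation collectives), this keeps the gathered payload proportional to the number of distinct results rather than the number of nodes.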