Patent application number | Description | Published |
20100191711 | Synchronizing Access To Resources In A Hybrid Computing Environment - Synchronizing access to resources in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, where synchronizing access to resources includes providing in a registry, to processes executing on the accelerators and the host computer, a key associated with a resource, the key having a value; attempting, by a process, to access the resource including determining whether a current value of the key represents an unlocked state for the resource; if the current value represents an unlocked state, attempting to lock access to the resource including setting the value to a unique identification of the process; determining whether the current value is the unique identification of the process; and, if the current value is the unique identification, accessing the resource by the process. | 07-29-2010 |
20100191822 | Broadcasting Data In A Hybrid Computing Environment - Methods, apparatus, and products for broadcasting data in a hybrid computing environment that includes a host computer, a number of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the accelerators shared remotely with the host computer, where broadcasting data according to embodiments of the present invention includes: writing, by the host computer remotely to the shared local memory for the accelerators, the data to be broadcast; reading, by each of the accelerators from the shared local memory for the accelerators, the data; and notifying the host computer, by the accelerators, that the accelerators have read the data. | 07-29-2010 |
20100191823 | Data Processing In A Hybrid Computing Environment - Data processing in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, the host computer having local memory shared remotely with the accelerators, the accelerators having local memory for the plurality of accelerators shared remotely with the host computer, where data processing according to embodiments of the present invention includes performing, by the plurality of accelerators, a local reduction operation with the local shared memory for the accelerators; writing remotely, by one of the plurality of accelerators to the shared memory local to the host computer, a result of the local reduction operation; and reading, by the host computer from shared memory local to the host computer, the result of the local reduction operation. | 07-29-2010 |
20100191923 | Data Processing In A Computing Environment - Methods, apparatus, and products for data processing in a computing environment including allocating, by an operating system for an application, virtual address spaces with each virtual address space mapped to a same physical address space and each virtual address space associated with an operation; receiving, from the application, an instruction to store a value in a specific virtual address, the specific virtual address contained within one of the allocated virtual address spaces; identifying a physical address associated with the specific virtual address; performing, with the value and the contents of the identified physical address, the operation associated with the virtual address space containing the specific virtual address; and storing a result of the operation in the identified physical address. | 07-29-2010 |
20100198997 | Direct Memory Access In A Hybrid Computing Environment - Direct memory access (‘DMA’) in a hybrid computing environment that includes a host computer, an accelerator, the host computer and the accelerator adapted to one another for data communications by a system level message passing module, where DMA includes identifying, by the system level message passing module, a buffer of data to be transferred from the host computer to the accelerator according to a DMA protocol; segmenting, by the system level message passing module, the buffer of data into a predefined number of memory segments; pinning, by the system level message passing module, the memory segments against paging; and asynchronously with respect to pinning the memory segments, effecting, by the system level message passing module, DMA transfers of the pinned memory segments from the host computer to the accelerator. | 08-05-2010 |
20110035556 | Reducing Remote Reads Of Memory In A Hybrid Computing Environment By Maintaining Remote Memory Values Locally - Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally, the hybrid computing environment including a host computer and a plurality of accelerators, the host computer and the accelerators each having local memory shared remotely with the other, including writing to the shared memory of the host computer packets of data representing changes in accelerator memory values, incrementing, in local memory and in remote shared memory on the host computer, a counter value representing the total number of packets written to the host computer, reading by the host computer from the shared memory in the host computer the written data packets, moving the read data to application memory, and incrementing, in both local memory and in remote shared memory on the accelerator, a counter value representing the total number of packets read by the host computer. | 02-10-2011 |
20110191785 | Terminating An Accelerator Application Program In A Hybrid Computing Environment - Terminating an accelerator application program in a hybrid computing environment that includes a host computer having a host computer architecture and an accelerator having an accelerator architecture, where the host computer and the accelerator are adapted to one another for data communications by a system level message passing module (‘SLMPM’), and terminating an accelerator application program in a hybrid computing environment includes receiving, by the SLMPM from a host application executing on the host computer, a request to terminate an accelerator application program executing on the accelerator; terminating, by the SLMPM, execution of the accelerator application program; returning, by the SLMPM to the host application, a signal indicating that execution of the accelerator application program was terminated; and performing, by the SLMPM, a cleanup of the execution environment associated with the terminated accelerator application program. | 08-04-2011 |
20110225226 | Assigning A Unique Identifier To A Communicator - Creating, by a parent master process of a parent communicator, a child communicator, including configuring the child communicator with a child master process, wherein a communicator includes a collection of one or more processes executing on compute nodes of a distributed computing system; determining, by the parent master process, whether a unique identifier is available to assign to the child communicator; if a unique identifier is available to assign to the child communicator, assigning, by the parent master process, the available unique identifier to the child communicator; and if a unique identifier is not available to assign to the child communicator: retrieving, by the parent master process, an available unique identifier from a master process of another communicator in a tree of communicators and assigning the retrieved unique identifier to the child communicator. | 09-15-2011 |
20110225255 | Discovering A Resource In A Distributed Computing System - Sending, by a node requesting information regarding a resource to one or more nodes in a distributed computing system, an active message to perform a collective operation; contributing, by each node not having a resource, a value of zero to the collective operation; contributing, by a node having the resource, the node's rank; storing the result of the collective operation in a buffer of the requesting node; and identifying, in dependence upon the result of the collective operation, the rank of the node having the resource. | 09-15-2011 |
20110225297 | Controlling Access To A Resource In A Distributed Computing System With A Distributed Access Request Queue - Controlling access to a resource in a distributed computing system that includes nodes having a status field, a next field, a source data buffer, and that are characterized by a unique node identifier, where controlling access includes receiving a request for access to the resource implemented as an active message that includes the requesting node's unique node identifier, the value stored in the requesting node's source data buffer, and an instruction to perform a reduction operation with the value stored in the requesting node's source data buffer and the value stored in the receiving node's source data buffer; returning the requesting node's unique node identifier as a result of the reduction operation; and updating the status and next fields to identify the requesting node as a next node to have sole access to the resource. | 09-15-2011 |
20110267197 | Monitoring Operating Parameters In A Distributed Computing System With Active Messages - In a distributed computing system including nodes organized for collective operations: initiating, by a root node through an active message to all other nodes, a collective operation, the active message including an instruction to each node to store operating parameter data in each node's send buffer; and, responsive to the active message: storing, by each node, the node's operating parameter data in the node's send buffer and returning, by the node, the operating parameter data as a result of the collective operation. | 11-03-2011 |
20110270986 | Optimizing Collective Operations - Optimizing collective operations including receiving an instruction to perform a collective operation type; selecting an optimized collective operation for the collective operation type; performing the selected optimized collective operation; determining whether a resource needed by the one or more nodes to perform the collective operation is not available; if a resource needed by the one or more nodes to perform the collective operation is not available: notifying the other nodes that the resource is not available; selecting a next optimized collective operation; and performing the next optimized collective operation. | 11-03-2011 |
20110271059 | Reducing Remote Reads Of Memory In A Hybrid Computing Environment - A hybrid computing environment in which the host computer allocates, in the shadow memory area of the host computer, a memory region for a packet to be written to the shared memory of an accelerator; writes packet data to the accelerator's shared memory in a memory region corresponding to the allocated memory region; inserts, in a next available element of the accelerator's descriptor array, a descriptor identifying the written packet data; increments the copy of the head pointer of the accelerator's descriptor array maintained on the host computer; and updates a copy of the head pointer of the accelerator's descriptor array maintained on the accelerator with the incremented copy. | 11-03-2011 |
20120110153 | Administering Incident Pools For Event And Alert Analysis - Administering incident pools including creating a pool of incidents, the pool having a predetermined initial period of time; assigning each received incident to the pool; assigning, by the incident analyzer, to each incident a predetermined minimum time for inclusion in a pool; extending for one or more of the incidents the predetermined initial period of time of the pool by a particular period of time assigned to the incident; determining whether conditions have been met to close the pool; and if conditions have been met to close the pool determining for each incident in the pool whether the incident has been in the pool for its predetermined minimum time for inclusion in a pool; and if the incident has not been in the pool for its predetermined minimum time, evicting the incident from the closed pool and including the incident in a next pool. | 05-03-2012 |
20120110161 | Relevant Alert Delivery In A Distributed Processing System - Methods, systems and products are provided for relevant alert delivery including assigning by an event analyzer each received event to an events pool; determining by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool whether to suppress one or more of the events; identifying by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool one or more alerts; sending by the event analyzer to an alert analyzer all the alerts identified by the event analyzer; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alert pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 05-03-2012 |
20120110600 | Relevant Alert Delivery With Event And Alert Suppression In A Distributed Processing System - Methods, systems and products are provided for relevant alert delivery with event and alert suppression including identifying by the event analyzer in dependence upon the event arrival rules one or more alerts; closing, by the event analyzer in dependence upon the events pool operation rules, the events pool; determining, by the events analyzer in dependence upon the event suppression rules, whether to suppress one or more events in the closed events pool; identifying by the event analyzer in dependence upon the events pool closure rules and any unsuppressed events assigned to the events pool, one or more additional alerts; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alert pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 05-03-2012 |
20120144020 | Dynamic Administration Of Event Pools For Relevant Event And Alert Analysis During Event Storms - Dynamic administration of event pools for relevant event and alert analysis during event storms including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system, each event including an occurred time and a logged time; creating, by the event analyzer, an events pool; determining whether an arrival rate of the events from the components of the distributed processing system is greater than a predetermined threshold; if the arrival rate is greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their occurred time; and if the arrival rate is not greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their logged time. | 06-07-2012 |
20120144021 | Administering Event Reporting Rules In A Distributed Processing System - Methods, systems and products are provided for administering event reporting rules in a distributed processing system that includes identifying that one or more nodes of the distributed processing system is idle; for each identified idle node, collecting by the idle node any suppressed events and logged data from the node; sending the suppressed events and logged data to a database of events; and changing the event reporting rules for one or more components on the identified idle node in dependence upon the suppressed events and the logged data. | 06-07-2012 |
20120144243 | Dynamic Administration Of Component Event Reporting In A Distributed Processing System - Methods, systems and products are provided for dynamic administration of component event reporting in a distributed processing system including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system; determining, by the events analyzer in dependence upon the received events and one or more event analysis rules, to change the event reporting rules of one or more components; and instructing, by the events analyzer, the one or more components to change the event reporting rules. | 06-07-2012 |
20120144251 | Relevant Alert Delivery In A Distributed Processing System With Event Listeners and Alert Listeners - Relevant alert delivery including determining, by an events listener associated with an event queue, whether one or more events in an events queue have not been assigned to any events pool by any event analyzer; and if one or more events in the events queue have not been assigned to any events pool, identifying by the events listener in dependence upon the event analysis rules one or more alerts; sending by the event listener to an alerts queue all the alerts identified by the event listener, the alerts queue having an associated alerts listener; determining whether one or more alerts in the alerts queue have not been assigned to any alert pool; if one or more alerts in the alerts queue have not been assigned to any alerts pool, determining in dependence upon alert analysis rules whether to suppress the alerts; and transmitting the unsuppressed alerts. | 06-07-2012 |
20120174105 | Locality Mapping In A Distributed Processing System - Topology mapping in a distributed processing system that includes a plurality of compute nodes, including: initiating a message passing operation; including in a message generated by the message passing operation, topological information for the sending task; mapping the topological information for the sending task; determining whether the sending task and the receiving task reside on the same topological unit; if the sending task and the receiving task reside on the same topological unit, using an optimal local network pattern for subsequent message passing operations between the sending task and the receiving task; otherwise, using a data communications network between the topological unit of the sending task and the topological unit of the receiving task for subsequent message passing operations between the sending task and the receiving task. | 07-05-2012 |
20120191920 | Reducing Remote Reads Of Memory In A Hybrid Computing Environment By Maintaining Remote Memory Values Locally - Reducing remote reads of memory in a hybrid computing environment by maintaining remote memory values locally, the hybrid computing environment including a host computer and a plurality of accelerators, the host computer and the accelerators each having local memory shared remotely with the other, including writing to the shared memory of the host computer packets of data representing changes in accelerator memory values, incrementing, in local memory and in remote shared memory on the host computer, a counter value representing the total number of packets written to the host computer, reading by the host computer from the shared memory in the host computer the written data packets, moving the read data to application memory, and incrementing, in both local memory and in remote shared memory on the accelerator, a counter value representing the total number of packets read by the host computer. | 07-26-2012 |
20120246649 | Synchronizing Access To Resources In A Hybrid Computing Environment - Synchronizing access to resources in a hybrid computing environment that includes a host computer, a plurality of accelerators, the host computer and the accelerators adapted to one another for data communications by a system level message passing module, where synchronizing access to resources includes providing in a registry, to processes executing on the accelerators and the host computer, a key associated with a resource, the key having a value; attempting, by a process, to access the resource including determining whether a current value of the key represents an unlocked state for the resource; if the current value represents an unlocked state, attempting to lock access to the resource including setting the value to a unique identification of the process; determining whether the current value is the unique identification of the process; and, if the current value is the unique identification, accessing the resource by the process. | 09-27-2012 |
20120303815 | Event Management In A Distributed Processing System - Methods, systems, and computer program products for event management in a distributed processing system are provided. Embodiments include receiving, by the incident analyzer, one or more events from one or more resources, each event identifying a location of the resource producing the event; identifying, by the incident analyzer, an action in dependence upon the one or more events and the location of the one or more resources producing the one or more events; identifying, by the incident analyzer, a location scope for the action in dependence upon the one or more events; and executing, by the incident analyzer, the identified action. | 11-29-2012 |
20120304012 | Administering Incident Pools For Event And Alert Analysis - Administering incident pools including receiving, by an incident analyzer from an incident queue, a plurality of incidents from one or more components of the distributed processing system; assigning, by the incident analyzer, each received incident to a pool of incidents; assigning, by the incident analyzer, to each incident a particular combined minimum time for inclusion in one or more pools, each particular combined minimum time corresponding to a particular incident; in response to the pool closing, determining, by the incident analyzer, for each incident in the pool whether the incident has met its combined minimum time for inclusion in one or more pools; and if the incident has been in the pool for its combined minimum time, including, by the incident analyzer, the incident in the closed pool; and if the incident has not been in the pool for its combined minimum time, including the incident in a next pool. | 11-29-2012 |
20120304013 | Administering Event Pools For Relevant Event Analysis In A Distributed Processing System - Methods, systems, and computer program products for administering event pools for relevant event analysis are provided. Embodiments include assigning, by an incident analyzer, a plurality of events to an events pool; determining, by the incident analyzer, an event suppression duration; determining, by the incident analyzer in dependence upon event analysis rules, to suppress events having particular attributes indicating the events occurred during the event suppression duration; and suppressing, by the incident analyzer, each event assigned to the events pool having the particular attributes indicating the events occurred during the event suppression duration. | 11-29-2012 |
20120304022 | Configurable Alert Delivery In A Distributed Processing System - Methods, systems, and computer program products for configurable alert delivery in a distributed processing system are provided. Embodiments include for each alert generated by an incident analyzer, applying, by the incident analyzer, active alert filters to the alert; wherein applying the active alert filters to the alert includes: creating, by the incident analyzer, a list of all active alert filters and a set of all active listeners; and for each active alert filter, running, by the incident analyzer, the active alert filter; if the active alert filter indicates that the alert should not go to one or more of the active listeners, removing, by the incident analyzer, the one or more active listeners from the set of all active listeners; if the active listeners set is empty, stopping, by the incident analyzer, processing of the alert; and if the active listeners set is not empty, selecting, by the incident analyzer, the next active alert filter from the active alert filter list. | 11-29-2012 |
20120330918 | Flexible Event Data Content Management For Relevant Event And Alert Analysis Within A Distributed Processing System - Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include receiving, by an interface connector, a raw event from a component of the distributed processing system; analyzing, by the interface connector, custom data within the raw event to determine a location to store the custom data, the custom data in a first data format; storing, by the interface connector, extended data within the raw event in a common event data format, the extended data indicating the location of the custom data; receiving, by an event analyzer, the event; and determining whether there are custom customer rules that need the custom data; and if there are such custom customer rules, retrieving the custom data based on the extended data from the event; and applying the custom customer rules to the extended data; if there are no such custom customer rules, applying the base rules to a base portion of the event. | 12-27-2012 |
20120331270 | Compressing Result Data For A Compute Node In A Parallel Computer - Compressing result data for a compute node in a parallel computer, the parallel computer including a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID. | 12-27-2012 |
20120331332 | Restarting Event And Alert Analysis After A Shutdown In A Distributed Processing System - Methods, systems, and computer program products for restarting event and alert analysis after a shutdown in a distributed processing system are provided. Embodiments include identifying, by an incident analyzer, a shutdown condition of the distributed processing system, the incident analyzer including a plurality of event analyzers and a monitor that monitors the plurality of event analyzers; and determining, by the incident analyzer, whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing, by the incident analyzer, an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting, by the incident analyzer, event and alert analysis using the next event identified in the event log; and if the shutdown was unplanned, for each event analyzer, identifying the last event included in the last event pool that the event analyzer closed; and restarting, by the incident analyzer, event and alert analysis at the event analyzer using the next event received by the event analyzer after the identified last event. | 12-27-2012 |
20120331347 | Restarting Event And Alert Analysis After A Shutdown In A Distributed Processing System - Methods, systems, and computer program products for restarting event and alert analysis after a shutdown in a distributed processing system are provided. Embodiments include identifying, by an incident analyzer, a shutdown condition of the distributed processing system; and determining, by the incident analyzer, whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing, by the incident analyzer, an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting, by the incident analyzer, event and alert analysis using the next event identified in the event log; if the shutdown was unplanned, identifying, by the incident analyzer, a previously configured restart mode; selecting, by the incident analyzer, an identification of a restart event in the event log according to the previously configured restart mode; and restarting, by the incident analyzer, event and alert analysis using the restart event identified in the event log. | 12-27-2012 |
20120331485 | Flexible Event Data Content Management For Relevant Event And Alert Analysis Within A Distributed Processing System - Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include capturing, by an interface connector, an event from a resource of the distributed processing system; inserting, by the interface connector, the event into an event database; receiving from the interface connector, by a notifier, a notification of insertion of the event into the event database; based on the received notification, tracking, by the notifier, the number of events indicated as inserted into the event database; receiving from the notifier, by a monitor, a cumulative notification indicating the number of events that have been inserted into the event database; in response to receiving the cumulative notification, retrieving, by the monitor, from the event database, events inserted into the event database; and processing, by the monitor, the retrieved events. | 12-27-2012 |
20130018935 | Performing Collective Operations In A Distributed Processing System (Charles J. Archer, James E. Carey, Matthew W. Markland, Philip J. Sanders; all Rochester, MN, US) - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system are provided. The hybrid distributed processing system includes a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, each compute node coupled for data communications by at least one data communications network implementing at least two different networking topologies. A first networking topology includes a tiered tree topology having a root task, and at least two child tasks, where the two child tasks are peers of one another in the same tier. Embodiments include determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology; and if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology. | 01-17-2013 |
20130018947 | Performing Collective Operations In A Distributed Processing System - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system are provided. The hybrid distributed processing system includes a plurality of compute nodes where each compute node has a plurality of tasks, each task is assigned a unique rank, and each compute node is coupled for data communications by at least one data communications network implementing at least two different networking topologies. At least one of the two networking topologies is a tiered tree topology having a root task and at least two child tasks and the at least two child tasks are peers of one another in the same tier. Embodiments include for each task, sending at least a portion of data corresponding to the task to all child tasks of the task through the tree topology; and sending at least a portion of the data corresponding to the task to all peers of the task at the same tier in the tree topology through the second topology. | 01-17-2013 |
20130024866 | Topology Mapping In A Distributed Processing System - Topology mapping in a distributed processing system, the distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, including: assigning each task to a geometry defining the resources available to the task; selecting, from a list of possible data communications algorithms, one or more algorithms configured for the assigned geometry; and identifying, by each task to all other tasks, the selected data communications algorithms of each task in a single collective operation. | 01-24-2013 |
20130060833 | TOPOLOGY MAPPING IN A DISTRIBUTED PROCESSING SYSTEM - Topology mapping in a distributed processing system, the distributed processing system including a plurality of compute nodes, each compute node having a plurality of tasks, each task assigned a unique rank, including: assigning each task to a geometry defining the resources available to the task; selecting, from a list of possible data communications algorithms, one or more algorithms configured for the assigned geometry; and identifying, by each task to all other tasks, the selected data communications algorithms of each task in a single collective operation. | 03-07-2013 |
20130060944 | CONTROLLING ACCESS TO A RESOURCE IN A DISTRIBUTED COMPUTING SYSTEM WITH A DISTRIBUTED ACCESS REQUEST QUEUE - Controlling access to a resource in a distributed computing system that includes nodes having a status field, a next field, a source data buffer, and that are characterized by a unique node identifier, where controlling access includes receiving a request for access to the resource implemented as an active message that includes the requesting node's unique node identifier, the value stored in the requesting node's source data buffer, and an instruction to perform a reduction operation with the value stored in the requesting node's source data buffer and the value stored in the receiving node's source data buffer; returning the requesting node's unique node identifier as a result of the reduction operation; and updating the status and next fields to identify the requesting node as a next node to have sole access to the resource. | 03-07-2013 |
20130066938 | PERFORMING COLLECTIVE OPERATIONS IN A DISTRIBUTED PROCESSING SYSTEM - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system that includes a plurality of compute nodes and a plurality of tasks, each task is assigned a unique rank, and each compute node is coupled for data communications by at least two different networking topologies. At least one of the two networking topologies is a tiered tree topology having a root task and at least two child tasks and the at least two child tasks are peers of one another in the same tier. Embodiments include for each task, sending at least a portion of data corresponding to the task to all child tasks of the task through the tree topology; and sending at least a portion of the data corresponding to the task to all peers of the task at the same tier in the tree topology through the second topology. | 03-14-2013 |
20130067198 | COMPRESSING RESULT DATA FOR A COMPUTE NODE IN A PARALLEL COMPUTER - A parallel computer is provided that includes a collection of compute nodes organized as a tree, including: initiating a collective gather operation by a logical root of the collection of compute nodes, including adding result data of the logical root to a gather buffer; for each compute node in the collection of compute nodes, determining whether result data of the compute node is already written in the gather buffer; and if the result data of the compute node is already written in the gather buffer, incrementing a counter assigned to that result data already written in the gather buffer; and if the result data of the compute node is not already written in the gather buffer, writing the result data of the compute node as new result data in the gather buffer, incrementing a counter assigned to that new result data, and writing in the gather buffer a node ID. | 03-14-2013 |
20130067483 | LOCALITY MAPPING IN A DISTRIBUTED PROCESSING SYSTEM - Topology mapping in a distributed processing system that includes a plurality of compute nodes, including: initiating a message passing operation; including in a message generated by the message passing operation, topological information for the sending task; mapping the topological information for the sending task; determining whether the sending task and the receiving task reside on the same topological unit; if the sending task and the receiving task reside on the same topological unit, using an optimal local network pattern for subsequent message passing operations between the sending task and the receiving task; otherwise, using a data communications network between the topological unit of the sending task and the topological unit of the receiving task for subsequent message passing operations between the sending task and the receiving task. | 03-14-2013 |
20130073726 | RESTARTING EVENT AND ALERT ANALYSIS AFTER A SHUTDOWN IN A DISTRIBUTED PROCESSING SYSTEM - Restarting event and alert analysis after a shutdown in a distributed processing system includes identifying a shutdown condition of the distributed processing system; determining whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting event and alert analysis using the next event identified in the event log; and if the shutdown was unplanned, for each event analyzer, identifying the last event included in the last event pool that the event analyzer closed; and restarting event and alert analysis at the event analyzer using the next event received by the event analyzer after the identified last event. | 03-21-2013 |
20130074102 | FLEXIBLE EVENT DATA CONTENT MANAGEMENT FOR RELEVANT EVENT AND ALERT ANALYSIS WITHIN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems, and computer program products for flexible event data content management for relevant event and alert analysis within a distributed processing system are provided. Embodiments include capturing, by an interface connector, an event from a resource of the distributed processing system; inserting, by the interface connector, the event into an event database; receiving from the interface connector, by a notifier, a notification of insertion of the event into the event database; based on the received notification, tracking, by the notifier, the number of events indicated as inserted into the event database; receiving from the notifier, by a monitor, a cumulative notification indicating the number of events that have been inserted into the event database; in response to receiving the cumulative notification, retrieving, by the monitor, from the event database, events inserted into the event database; and processing, by the monitor, the retrieved events. | 03-21-2013 |
20130080630 | FLEXIBLE EVENT DATA CONTENT MANAGEMENT FOR RELEVANT EVENT AND ALERT ANALYSIS WITHIN A DISTRIBUTED PROCESSING SYSTEM - Flexible event data content management for relevant event and alert analysis within a distributed processing system includes receiving, by an interface connector, a raw event from a component of the distributed processing system; analyzing custom data within the raw event to determine a location to store the custom data, the custom data in a first data format; storing extended data within the raw event in a common event data format, the extended data indicating the location of the custom data; receiving, by an event analyzer, the event; and determining whether there are custom customer rules that need the custom data; and if there are such custom customer rules, retrieving the custom data based on the extended data from the event; and applying the custom customer rules to the extended data; if there are no such custom customer rules, applying the base rules to a base portion of the event. | 03-28-2013 |
20130081037 | PERFORMING COLLECTIVE OPERATIONS IN A DISTRIBUTED PROCESSING SYSTEM - Methods, apparatuses, and computer program products for performing collective operations on a hybrid distributed processing system including: determining by at least one task that a parent of the task has failed to send the task data through the tree topology; and determining whether to request the data from a grandparent of the task or a peer of the task in the same tier in the tree topology; and if the task requests the data from the grandparent, requesting the data and receiving the data from the grandparent of the task through the second networking topology; and if the task requests the data from a peer of the task in the same tier in the tree, requesting the data and receiving the data from a peer of the task through the second networking topology. | 03-28-2013 |
20130091386 | ADMINISTERING EVENT POOLS FOR RELEVANT EVENT ANALYSIS IN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems, and computer program products for administering event pools for relevant event analysis are provided. Embodiments include assigning, by an incident analyzer, a plurality of events to an events pool; determining, by the incident analyzer, an event suppression duration; determining, by the incident analyzer in dependence upon event analysis rules, to suppress events having particular attributes indicating the events occurred during the event suppression duration; and suppressing, by the incident analyzer, each event assigned to the events pool having the particular attributes indicating the events occurred during the event suppression duration. | 04-11-2013 |
20130097215 | Selected Alert Delivery In A Distributed Processing System - Methods, apparatuses, and computer program products for selected alert delivery in a distributed processing system are provided. Embodiments include receiving a plurality of events from one or more event producing components of the distributed processing system; creating, by an incident analyzer, in dependence upon the events a truth space representing events that make one or more conditional event processing rules true, the truth space including a set of truth points, each truth point including a set of events and a set of event locations; creating, by the incident analyzer, in dependence upon the truth space one or more alerts including assigning one of the locations of the truth space to one or more of the alerts; and sending, by the incident analyzer, the alerts to at least one component of the distributed processing system. | 04-18-2013 |
20130097216 | Selected Alert Delivery In A Distributed Processing System - Methods, apparatuses, and computer program products for selected alert delivery in a distributed processing system are provided. Embodiments include receiving a plurality of events from one or more event producing components of the distributed processing system; creating, by an incident analyzer, in dependence upon the events a truth space representing events that make one or more conditional event processing rules true, the truth space including a set of truth points, each truth point including a set of events and a set of event locations; creating, by the incident analyzer, in dependence upon the truth space one or more alerts; and sending, by the incident analyzer, the alerts to at least one component of the distributed processing system. | 04-18-2013 |
20130097272 | Prioritized Alert Delivery In A Distributed Processing System - Methods, apparatuses, and computer program products for prioritized alert delivery in a distributed processing system are provided. Embodiments include receiving a plurality of events from a plurality of tiered event producing components in a distributed computing system; identifying a plurality of potential alerts in dependence upon the events; wherein each potential alert includes a priority and one or more condition events describing the event producing component creating the event; comparing the potential alerts and their condition events and priorities; identifying the highest priority potential alert having condition events whose event producing component is in a tier that is higher than the tier of condition events of one or more lower priority potential alerts; and creating an alert in dependence upon the identified highest priority potential alert and the condition events corresponding to the identified highest priority potential alert. | 04-18-2013 |
20130097300 | ADMINISTERING INCIDENT POOLS FOR EVENT AND ALERT ANALYSIS - Administering incident pools including creating a pool of incidents, the pool having a predetermined initial period of time; assigning each received incident to the pool; assigning, by the incident analyzer, to each incident a predetermined minimum time for inclusion in a pool; extending for one or more of the incidents the predetermined initial period of time of the pool by a particular period of time assigned to the incident; determining whether conditions have been met to close the pool; and if conditions have been met to close the pool determining for each incident in the pool whether the incident has been in the pool for its predetermined minimum time for inclusion in a pool; and if the incident has not been in the pool for its predetermined minimum time, evicting the incident from the closed pool and including the incident in a next pool. | 04-18-2013 |
20130097310 | CONFIGURABLE ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM - Configurable alert delivery in a distributed processing system includes, for each alert generated by an incident analyzer, applying active alert filters to the alert; wherein applying the active alert filters to the alert includes: creating a list of all active alert filters and a set of all active listeners; and for each active alert filter, running the active alert filter; if the active alert filter indicates that the alert should not go to one or more of the active listeners, removing the one or more active listeners from the set of all active listeners; if the active listeners set is empty, stopping processing of the alert; and if the active listeners set is not empty, selecting, by the incident analyzer, the next active alert filter from the active alert filter list. | 04-18-2013 |
20130097619 | Administering Incident Pools For Event And Alert Analysis - Administering incident pools including assigning an incident received from one or more components of the distributed processing system to a pool of incidents; assigning to each incident a particular combined minimum time for inclusion of the incident in the pool; in response to the pool closing, determining for each incident in the pool whether the incident has met its combined minimum time for inclusion in the pool; if the incident has been in the pool for its combined minimum time, including the incident in the closed pool; if the incident has not been in the pool for its combined minimum time, moving the incident from the closed pool to a next pool; applying incident suppression rules using the incidents assigned to the next pool; and applying incident creation rules to the incidents that were assigned to the next pool, while omitting any duplicate incidents caused by the assignment. | 04-18-2013 |
20130097620 | ADMINISTERING INCIDENT POOLS FOR EVENT AND ALERT ANALYSIS - Administering incident pools including assigning an incident received from one or more components of the distributed processing system to a pool of incidents; assigning to each incident a particular combined minimum time for inclusion of the incident in the pool; in response to the pool closing, determining for each incident in the pool whether the incident has met its combined minimum time for inclusion in the pool; if the incident has been in the pool for its combined minimum time, including the incident in the closed pool; if the incident has not been in the pool for its combined minimum time, moving the incident from the closed pool to a next pool; applying incident suppression rules using the incidents assigned to the next pool; and applying incident creation rules to the incidents that were assigned to the next pool, while omitting any duplicate incidents caused by the assignment. | 04-18-2013 |
20130111502 | Selected Alert Delivery In A Distributed Processing System | 05-02-2013 |
20130132460 | ADMINISTERING INCIDENT POOLS FOR EVENT AND ALERT ANALYSIS - Administering incident pools including receiving, by an incident analyzer from an incident queue, a plurality of incidents from one or more components of the distributed processing system; assigning, by the incident analyzer, each received incident to a pool of incidents; assigning, by the incident analyzer, to each incident a particular combined minimum time for inclusion in one or more pools, each particular combined minimum time corresponding to a particular incident; in response to the pool closing, determining, by the incident analyzer, for each incident in the pool whether the incident has met its combined minimum time for inclusion in one or more pools; and if the incident has been in the pool for its combined minimum time, including, by the incident analyzer, the incident in the closed pool; and if the incident has not been in the pool for its combined minimum time, including the incident in a next pool. | 05-23-2013 |
20130138809 | RELEVANT ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems and products are provided for relevant alert delivery including assigning by an event analyzer each received event to an events pool; determining by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool whether to suppress one or more of the events; identifying by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool one or more alerts; sending by the event analyzer to an alert analyzer all the alerts identified by the event analyzer; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alert pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 05-30-2013 |
20130144932 | SELECTED ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM - Methods, apparatuses, and computer program products for selected alert delivery in a distributed processing system are provided. Embodiments include receiving a plurality of events from one or more event producing components of the distributed processing system; creating, by an incident analyzer, in dependence upon the events a truth space representing events that make one or more conditional event processing rules true, the truth space including a set of truth points, each truth point including a set of events and a set of event locations; creating, by the incident analyzer, in dependence upon the truth space one or more alerts including assigning one of the locations of the truth space to one or more of the alerts; and sending, by the incident analyzer, the alerts to at least one component of the distributed processing system. | 06-06-2013 |
20130166743 | Relevant Alert Delivery In A Distributed Processing System - Methods, systems and products are provided for relevant alert delivery including assigning by an event analyzer each received event to an events pool; determining by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool whether to suppress one or more of the events; identifying by the event analyzer in dependence upon event analysis rules and the events assigned to the events pool one or more alerts; sending by the event analyzer to an alert analyzer all the alerts identified by the event analyzer; assigning by the alert analyzer the identified alerts to an alerts pool; determining by the alert analyzer in dependence upon alert analysis rules and the alerts in the alert pool whether to suppress any alerts; and transmitting the unsuppressed alerts to one or more components of the distributed processing system. | 06-27-2013 |
20130179905 | Administering Incident Pools For Event And Alert Analysis - Administering incident pools including creating a pool of incidents, the pool having a predetermined initial period of time; assigning each received incident to the pool; assigning, by the incident analyzer, to each incident a predetermined minimum time for inclusion in a pool; extending for one or more of the incidents the predetermined initial period of time of the pool by a particular period of time assigned to the incident; determining whether conditions have been met to close the pool; and if conditions have been met to close the pool determining for each incident in the pool whether the incident has been in the pool for its predetermined minimum time for inclusion in a pool; and if the incident has not been in the pool for its predetermined minimum time, evicting the incident from the closed pool and including the incident in a next pool. | 07-11-2013 |
20130191851 | Monitoring Operating Parameters In A Distributed Computing System With Active Messages - In a distributed computing system including nodes organized for collective operations: initiating, by a root node through an active message to all other nodes, a collective operation, the active message including an instruction to each node to store operating parameter data in each node's send buffer; and, responsive to the active message: storing, by each node, the node's operating parameter data in the node's send buffer and returning, by the node, the operating parameter data as a result of the collective operation. | 07-25-2013 |
20130212145 | Initiating A Collective Operation In A Parallel Computer - Initiating a collective operation in a parallel computer that includes compute nodes coupled for data communications and organized in an operational group for collective operations with one compute node assigned as a root node, including: identifying, by a non-root compute node, a collective operation to execute in the operational group of compute nodes; initiating, by the non-root compute node, execution of the collective operation amongst the compute nodes of the operational group including: sending, by the non-root compute node to one or more of the other compute nodes in the operational group, an active message, the active message including information configured to initiate execution of the collective operation amongst the compute nodes of the operational group; and executing, by the compute nodes of the operational group, the collective operation. | 08-15-2013 |
20130212555 | Developing A Collective Operation For Execution In A Parallel Computer - Developing a collective operation for execution in a parallel computer that includes compute nodes coupled for data communications, including: receiving, by a collective development tool, a specification of a target collective operation to develop; receiving, by the collective development tool, a specification of computer hardware characteristics of the parallel computer within which the target collective operation will be executed; selecting, by the collective development tool automatically without user interaction, iteratively for each stage of the target collective operation, a collective primitive in dependence upon the specification of computer hardware characteristics and a predefined set of rules specifying selection criteria of collective primitives based on computer hardware characteristics; and generating, by the collective development tool, the target collective operation in dependence upon the selected collective primitives. | 08-15-2013 |
20130212558 | Developing Collective Operations For A Parallel Computer - Developing collective operations for a parallel computer that includes compute nodes includes: presenting, by a collective development tool, a graphical user interface (‘GUI’) to a collective developer; receiving, by the collective development tool from the collective developer through the GUI, a selection of one or more collective primitives; receiving, by the collective development tool from the collective developer through the GUI, a specification of a serial order of the collective primitives and a specification of input and output buffers for each collective primitive; and generating, by the collective development tool in dependence upon the selection of collective primitives, the serial order of the collective primitives, and the input and output buffers for each collective primitive, executable code that carries out the collective operation specified by the collective primitives. | 08-15-2013 |
20130212561 | DEVELOPING COLLECTIVE OPERATIONS FOR A PARALLEL COMPUTER - Developing collective operations for a parallel computer that includes compute nodes includes: presenting, by a collective development tool, a graphical user interface (‘GUI’) to a collective developer; receiving, by the collective development tool from the collective developer through the GUI, a selection of one or more collective primitives; receiving, by the collective development tool from the collective developer through the GUI, a specification of a serial order of the collective primitives and a specification of input and output buffers for each collective primitive; and generating, by the collective development tool in dependence upon the selection of collective primitives, the serial order of the collective primitives, and the input and output buffers for each collective primitive, executable code that carries out the collective operation specified by the collective primitives. | 08-15-2013 |
20130219410 | Processing Unexpected Messages At A Compute Node Of A Parallel Computer - Methods, apparatuses, and computer program products for processing unexpected messages at a compute node of a parallel computer are provided. Embodiments include receiving, by the compute node, a portion of a message from another compute node of the parallel computer, the message comprising a plurality of separate portions; in response to receiving the portion of the message, determining, by the compute node, whether one of the applications executing on the compute node, has indicated that the message is expected; if one of the applications executing on the compute node has not indicated that the message is expected, storing, by the compute node, the portion of the message in an unexpected message buffer within the compute node; and if one of the applications executing on the compute node has indicated that the message is expected, storing the portion of the message at a storage destination indicated by the message. | 08-22-2013 |
20130305103 | RELEVANT ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM WITH EVENT LISTENERS AND ALERT LISTENERS - Relevant alert delivery including determining, by an events listener associated with an event queue, whether one or more events in an events queue have not been assigned to any events pool by any event analyzer; and if one or more events in the events queue have not been assigned to any events pool, identifying by the events listener in dependence upon the event analysis rules one or more alerts; sending by the event listener to an alerts queue all the alerts identified by the event listener; the alerts queue having an associated alerts listener; determining whether one or more alerts in the alerts queue have not been assigned to any alert pool; if one or more alerts in the alerts queue have not been assigned to any alerts pool, determining in dependence upon alert analysis rules whether to suppress the alerts; and transmitting the unsuppressed alerts. | 11-14-2013 |
20130318404 | DYNAMIC ADMINISTRATION OF COMPONENT EVENT REPORTING IN A DISTRIBUTED PROCESSING SYSTEM - Methods, systems and products are provided for dynamic administration of component event reporting in a distributed processing system including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system; determining, by the events analyzer in dependence upon the received events and one or more event analysis rules, to change the event reporting rules of one or more components; and instructing, by the events analyzer, the one or more components to change the event reporting rules. | 11-28-2013 |
20140040673 | Administering Incident Pools For Incident Analysis - Methods, apparatuses, and computer program products for administering incident pools for incident analysis in a distributed processing system are provided. Embodiments include an incident analyzer receiving a plurality of incidents from an incident queue. The incident analyzer also assigns each received incident to an incident pool having a predetermined initial period of time. The predetermined initial period of time is the time within which the incident pool is open to the assignment of incidents. The incident analyzer calculates an arrival rate that incidents are assigned to the incident pool. The incident analyzer also extends based on the arrival rate, for each incident assigned to the incident pool, the predetermined initial period of time by a particular period of time. | 02-06-2014 |
20140047273 | Administering Checkpoints For Incident Analysis - Methods, apparatuses, and computer program products for administering checkpoints for incident analysis are provided. Embodiments include a checkpoint manager receiving from each incident analyzer of a plurality of incident analyzers, a checkpoint indicating an incident having the oldest identification number still in analysis by the incident analyzer at the time associated with the checkpoint. The checkpoint manager examines each received checkpoint to identify, as a restore incident, an incident having the oldest identification number indicated in any of the received checkpoints. A monitor sends to the incident analyzers, a stream of incidents beginning with the identified restore incident and continuing with any incidents having a newer identification number than the identified restore incident. Each incident analyzer processes from the stream of incidents only the incident indicated in the last checkpoint of the incident analyzer and any subsequent incidents having a newer identification number than the indicated incident. | 02-13-2014 |
20140068347 | RESTARTING EVENT AND ALERT ANALYSIS AFTER A SHUTDOWN IN A DISTRIBUTED PROCESSING SYSTEM - Restarting event and alert analysis after a shutdown in a distributed processing system includes identifying a shutdown condition of the distributed processing system; and determining whether the shutdown was a planned shutdown or an unplanned shutdown; if the shutdown was planned, storing an identification of the last event in an event log that was injected in an event queue at the time of the planned shutdown and restarting event and alert analysis using the next event identified in the event log; if the shutdown was unplanned, identifying a previously configured restart mode; selecting an identification of a restart event in the event log according to the previously configured restart mode; and restarting event and alert analysis using the restart event identified in the event log. | 03-06-2014 |
20140101307 | DYNAMIC ADMINISTRATION OF EVENT POOLS FOR RELEVANT EVENT AND ALERT ANALYSIS DURING EVENT STORMS - Dynamic administration of event pools for relevant event and alert analysis during event storms including receiving, by an events analyzer from an events queue, a plurality of events from one or more components of the distributed processing system, each event including an occurred time and a logged time; creating, by the event analyzer, an events pool; determining whether an arrival rate of the events from the components of the distributed processing system is greater than a predetermined threshold; if the arrival rate is greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their occurred time; and if the arrival rate is not greater than the predetermined threshold, assigning, by the events analyzer, a plurality of events to the events pool in dependence upon their logged time. | 04-10-2014 |
20140164592 | DETERMINING A SYSTEM CONFIGURATION FOR PERFORMING A COLLECTIVE OPERATION ON A PARALLEL COMPUTER - Determining a system configuration for performing a collective operation on a parallel computer that includes a plurality of compute nodes, the compute nodes coupled for data communications over a data communications network, including: selecting a system configuration on the parallel computer for executing the collective operation; executing the collective operation on the selected system configuration on the parallel computer; determining performance metrics associated with executing the collective operation on the selected system configuration on the parallel computer; selecting, using a simulated annealing algorithm, a plurality of test system configurations on the parallel computer for executing the collective operation, wherein the simulated annealing algorithm specifies a similarity threshold between a plurality of system configurations; executing the collective operation on each of the test system configurations; and determining performance metrics associated with executing the collective operation on each of the test system configurations. | 06-12-2014 |
20140164600 | DETERMINING A SYSTEM CONFIGURATION FOR PERFORMING A COLLECTIVE OPERATION ON A PARALLEL COMPUTER - Determining a system configuration for performing a collective operation on a parallel computer that includes a plurality of compute nodes, the compute nodes coupled for data communications over a data communications network, including: selecting a system configuration on the parallel computer for executing the collective operation; executing the collective operation on the selected system configuration on the parallel computer; determining performance metrics associated with executing the collective operation on the selected system configuration on the parallel computer; selecting, using a simulated annealing algorithm, a plurality of test system configurations on the parallel computer for executing the collective operation, wherein the simulated annealing algorithm specifies a similarity threshold between a plurality of system configurations; executing the collective operation on each of the test system configurations; and determining performance metrics associated with executing the collective operation on each of the test system configurations. | 06-12-2014 |
20140165075 | EXECUTING A COLLECTIVE OPERATION ALGORITHM IN A PARALLEL COMPUTER - Executing a collective operation algorithm in a parallel computer includes a compute node of an operational group determining a required number of participants for execution of a collective operation algorithm and determining a number of contributing nodes having data to participate in the algorithm. Embodiments also include the compute node calculating a number of ghost nodes to participate in the algorithm. According to embodiments of the present invention, the number of ghost nodes is the required number of participants minus the number of contributing nodes having data to participate. Embodiments also include the compute node selecting from a plurality of ghost nodes, the calculated number of ghost nodes for participation in the execution of the algorithm and executing the algorithm with both the selected ghost nodes and the contributing nodes. | 06-12-2014 |
20140165076 | EXECUTING A COLLECTIVE OPERATION ALGORITHM IN A PARALLEL COMPUTER - Executing a collective operation algorithm in a parallel computer includes a compute node of an operational group determining a required number of participants for execution of a collective operation algorithm and determining a number of contributing nodes having data to participate in the algorithm. Embodiments also include the compute node calculating a number of ghost nodes to participate in the algorithm. According to embodiments of the present invention, the number of ghost nodes is the required number of participants minus the number of contributing nodes having data to participate. Embodiments also include the compute node selecting from a plurality of ghost nodes, the calculated number of ghost nodes for participation in the execution of the algorithm and executing the algorithm with both the selected ghost nodes and the contributing nodes. | 06-12-2014 |
20140172938 | SELECTED ALERT DELIVERY IN A DISTRIBUTED PROCESSING SYSTEM - Methods, apparatuses, and computer program products for selected alert delivery in a distributed processing system are provided. Embodiments include receiving a plurality of events from one or more event producing components of the distributed processing system; creating, by an incident analyzer, in dependence upon the events a truth space representing events that make one or more conditional event processing rules true, the truth space including a set of truth points, each truth point including a set of events and a set of event locations; creating, by the incident analyzer, in dependence upon the truth space one or more alerts; and sending, by the incident analyzer, the alerts to at least one component of the distributed processing system. | 06-19-2014 |
20140173201 | ACQUIRING REMOTE SHARED VARIABLE DIRECTORY INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for acquiring remote shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer determining that a first thread of a first task requires shared resource data stored in a memory partition corresponding to a second thread of a second task. Embodiments also include the runtime optimizer requesting from the second thread, in response to determining that the first thread of the first task requires the shared resource data, SVD information associated with the shared resource data. Embodiments also include the runtime optimizer receiving from the second thread, the SVD information associated with the shared resource data. | 06-19-2014 |
20140173204 | ANALYZING UPDATE CONDITIONS FOR SHARED VARIABLE DIRECTORY INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for analyzing update conditions for shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer receiving a compare-and-swap operation header. The compare-and-swap operation header includes an SVD key, a first SVD address, and an updated first SVD address. The first SVD address is associated with the SVD key in a first SVD associated with a first task. Embodiments also include the runtime optimizer retrieving from a remote address cache associated with the second task, a second SVD address indicating a location within a memory partition associated with the first SVD in response to receiving the compare-and-swap operation header. Embodiments also include the runtime optimizer determining whether the second SVD address matches the first SVD address and transmitting a result indicating whether the second SVD address matches the first SVD address. | 06-19-2014 |
20140173205 | ANALYZING UPDATE CONDITIONS FOR SHARED VARIABLE DIRECTORY INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for analyzing update conditions for shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer receiving a compare-and-swap operation header. The compare-and-swap operation header includes an SVD key, a first SVD address, and an updated first SVD address. The first SVD address is associated with the SVD key in a first SVD associated with a first task. Embodiments also include the runtime optimizer retrieving from a remote address cache associated with the second task, a second SVD address indicating a location within a memory partition associated with the first SVD in response to receiving the compare-and-swap operation header. Embodiments also include the runtime optimizer determining whether the second SVD address matches the first SVD address and transmitting a result indicating whether the second SVD address matches the first SVD address. | 06-19-2014 |
20140173212 | ACQUIRING REMOTE SHARED VARIABLE DIRECTORY INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for acquiring remote shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer determining that a first thread of a first task requires shared resource data stored in a memory partition corresponding to a second thread of a second task. Embodiments also include the runtime optimizer requesting from the second thread, in response to determining that the first thread of the first task requires the shared resource data, SVD information associated with the shared resource data. Embodiments also include the runtime optimizer receiving from the second thread, the SVD information associated with the shared resource data. | 06-19-2014 |
20140173257 | REQUESTING SHARED VARIABLE DIRECTORY (SVD) INFORMATION FROM A PLURALITY OF THREADS IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for requesting shared variable directory (SVD) information from a plurality of threads in a parallel computer are provided. Embodiments include a runtime optimizer detecting that a first thread requires a plurality of updated SVD information associated with shared resource data stored in a plurality of memory partitions. Embodiments also include a runtime optimizer broadcasting, in response to detecting that the first thread requires the updated SVD information, a gather operation message header to the plurality of threads. The gather operation message header indicates an SVD key corresponding to the required updated SVD information and a local address associated with the first thread to receive a plurality of updated SVD information associated with the SVD key. Embodiments also include the runtime optimizer receiving at the local address, the plurality of updated SVD information from the plurality of threads. | 06-19-2014 |
20140173604 | CONDITIONALLY UPDATING SHARED VARIABLE DIRECTORY (SVD) INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for conditionally updating shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer receiving a broadcast reduction operation header. The broadcast reduction operation header includes an SVD key and a first SVD address. The first SVD address is associated with the SVD key in a first SVD associated with a first task. Embodiments also include the runtime optimizer retrieving from a remote address cache associated with the second task, a second SVD address indicating a location within a memory partition associated with the first SVD, in response to receiving the broadcast reduction operation header. Embodiments also include the runtime optimizer determining that the first SVD address does not match the second SVD address and updating the remote address cache with the first SVD address. | 06-19-2014 |
20140173615 | CONDITIONALLY UPDATING SHARED VARIABLE DIRECTORY (SVD) INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for conditionally updating shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer receiving a broadcast reduction operation header. The broadcast reduction operation header includes an SVD key and a first SVD address. The first SVD address is associated with the SVD key in a first SVD associated with a first task. Embodiments also include the runtime optimizer retrieving from a remote address cache associated with the second task, a second SVD address indicating a location within a memory partition associated with the first SVD, in response to receiving the broadcast reduction operation header. Embodiments also include the runtime optimizer determining that the first SVD address does not match the second SVD address and updating the remote address cache with the first SVD address. | 06-19-2014 |
20140173626 | BROADCASTING SHARED VARIABLE DIRECTORY (SVD) INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for broadcasting shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer of the parallel computer detecting a change in SVD information within an SVD associated with a first thread. Embodiments also include the runtime optimizer identifying a plurality of threads requiring notification of the change in the SVD information. Embodiments also include the runtime optimizer, in response to detecting the change in the SVD information, broadcasting to each thread of the identified plurality of threads a broadcast message header and update data indicating the change in the SVD information. | 06-19-2014 |
20140173627 | REQUESTING SHARED VARIABLE DIRECTORY (SVD) INFORMATION FROM A PLURALITY OF THREADS IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for requesting shared variable directory (SVD) information from a plurality of threads in a parallel computer are provided. Embodiments include a runtime optimizer detecting that a first thread requires a plurality of updated SVD information associated with shared resource data stored in a plurality of memory partitions. Embodiments also include a runtime optimizer broadcasting, in response to detecting that the first thread requires the updated SVD information, a gather operation message header to the plurality of threads. The gather operation message header indicates an SVD key corresponding to the required updated SVD information and a local address associated with the first thread to receive a plurality of updated SVD information associated with the SVD key. Embodiments also include the runtime optimizer receiving at the local address, the plurality of updated SVD information from the plurality of threads. | 06-19-2014 |
20140173629 | BROADCASTING SHARED VARIABLE DIRECTORY (SVD) INFORMATION IN A PARALLEL COMPUTER - Methods, parallel computers, and computer program products for broadcasting shared variable directory (SVD) information in a parallel computer are provided. Embodiments include a runtime optimizer of the parallel computer detecting a change in SVD information within an SVD associated with a first thread. Embodiments also include the runtime optimizer identifying a plurality of threads requiring notification of the change in the SVD information. Embodiments also include the runtime optimizer, in response to detecting the change in the SVD information, broadcasting to each thread of the identified plurality of threads a broadcast message header and update data indicating the change in the SVD information. | 06-19-2014 |
20140192652 | TOKEN-BASED FLOW CONTROL OF MESSAGES IN A PARALLEL COMPUTER - Token-based flow control of messages in a parallel computer, the parallel computer including a plurality of compute nodes, each compute node including one or more computer processors, including: allocating, by a token administration module to a plurality of the computer processors in the parallel computer, a number of data communications tokens; identifying all communicators executing on each computer processor, where each communicator is participating in a distinct parallel operation executing on the parallel computer; allocating, to the communicators, the data communications tokens; determining, by a communicator attempting to send data to the destination, whether the communicator has enough available data communications tokens to send the data to the destination; and responsive to determining that the communicator has enough available data communications tokens to send the data, sending, by the communicator, the data to the destination. | 07-10-2014 |
20140195688 | TOKEN-BASED FLOW CONTROL OF MESSAGES IN A PARALLEL COMPUTER - Token-based flow control of messages in a parallel computer, the parallel computer including a plurality of compute nodes, each compute node including one or more computer processors, including: allocating, by a token administration module to a plurality of the computer processors in the parallel computer, a number of data communications tokens; identifying all communicators executing on each computer processor, where each communicator is participating in a distinct parallel operation executing on the parallel computer; allocating, to the communicators, the data communications tokens; determining, by a communicator attempting to send data to the destination, whether the communicator has enough available data communications tokens to send the data to the destination; and responsive to determining that the communicator has enough available data communications tokens to send the data, sending, by the communicator, the data to the destination. | 07-10-2014 |
20140244974 | Background Collective Operation Management In A Parallel Computer - Background collective operation management in a parallel computer, the parallel computer including one or more compute nodes operatively coupled for data communications over one or more data communications networks, including: determining, by a management availability module, whether a compute node in the parallel computer is available to perform a background collective operation management task; responsive to determining that the compute node is available to perform the background collective operation management task, determining, by the management availability module, whether the compute node has access to sufficient resources to perform the background collective operation management task; and responsive to determining that the compute node has access to sufficient resources to perform the background collective operation management task, initiating, by the management availability module, execution of the background collective operation management task. | 08-28-2014 |
20140245316 | Background Collective Operation Management In A Parallel Computer - Background collective operation management in a parallel computer, the parallel computer including one or more compute nodes operatively coupled for data communications over one or more data communications networks, including: determining, by a management availability module, whether a compute node in the parallel computer is available to perform a background collective operation management task; responsive to determining that the compute node is available to perform the background collective operation management task, determining, by the management availability module, whether the compute node has access to sufficient resources to perform the background collective operation management task; and responsive to determining that the compute node has access to sufficient resources to perform the background collective operation management task, initiating, by the management availability module, execution of the background collective operation management task. | 08-28-2014 |
20140258417 | Collective Operation Management In A Parallel Computer - Methods, apparatuses, and computer program products for collective operation management in a parallel computer are provided. Embodiments include a parallel computer having a first compute node operatively coupled for data communications over a tree data communications network with a plurality of child compute nodes. Embodiments also include each child compute node performing a first collective operation. The first compute node, for each child compute node, receives from the child compute node, a result of the first collective operation performed by the child compute node. For each result received from a child compute node, the first compute node stores a timestamp indicating a time that the child compute node completed the first collective operation. The first compute node also manages, based on the stored timestamps, execution of a second collective operation over the tree data communications network. | 09-11-2014 |
20140258538 | Collective Operation Management In A Parallel Computer - Methods, apparatuses, and computer program products for collective operation management in a parallel computer are provided. Embodiments include a parallel computer having a first compute node operatively coupled for data communications over a tree data communications network with a plurality of child compute nodes. Embodiments also include each child compute node performing a first collective operation. The first compute node, for each child compute node, receives from the child compute node, a result of the first collective operation performed by the child compute node. For each result received from a child compute node, the first compute node stores a timestamp indicating a time that the child compute node completed the first collective operation. The first compute node also manages, based on the stored timestamps, execution of a second collective operation over the tree data communications network. | 09-11-2014 |
20140258746 | Collective Operation Management In A Parallel Computer - Methods, apparatuses, and computer program products for collective operation management in a parallel computer are provided. Embodiments include a parallel computer having a first compute node operatively coupled for data communications over a tree data communications network with a plurality of child compute nodes. Embodiments also include each child compute node performing a first collective operation. The first compute node, for each child compute node, receives from the child compute node, a result of the first collective operation performed by the child compute node. In response to receiving at least one result, the first compute node reduces a power consumption level of the child compute node. | 09-11-2014 |
20140258748 | Collective Operation Management In A Parallel Computer - Methods, apparatuses, and computer program products for collective operation management in a parallel computer are provided. Embodiments include a parallel computer having a first compute node operatively coupled for data communications over a tree data communications network with a plurality of child compute nodes. Embodiments also include each child compute node performing a first collective operation. The first compute node, for each child compute node, receives from the child compute node, a result of the first collective operation performed by the child compute node. In response to receiving at least one result, the first compute node reduces a power consumption level of the child compute node. | 09-11-2014 |
20140280601 | Collective Operation Management In A Parallel Computer - Methods, apparatuses, and computer program products for collective operation management in a parallel computer are provided. Embodiments include a parallel computer having a plurality of compute nodes coupled for data communications over a data communications network. Embodiments include a first compute node entering a collective operation. Each compute node of the plurality of compute nodes is associated with the collective operation. In response to entering the collective operation, the first compute node decreases power consumption of the first compute node. | 09-18-2014 |
20140280820 | Collective Operation Management In A Parallel Computer - Methods, apparatuses, and computer program products for collective operation management in a parallel computer are provided. Embodiments include a parallel computer having a plurality of compute nodes coupled for data communications over a data communications network. Embodiments include a first compute node entering a collective operation. Each compute node of the plurality of compute nodes is associated with the collective operation. In response to entering the collective operation, the first compute node decreases power consumption of the first compute node. | 09-18-2014 |
20140281723 | Algorithm Selection For Collective Operations In A Parallel Computer - Algorithm selection for collective operations in a parallel computer that includes a plurality of compute nodes may include: profiling a plurality of algorithms for each of a set of collective operations, including for each collective operation: executing the operation a plurality of times with each execution varying one or more of: geometry, message size, data type, and algorithm to effect the collective operation, thereby generating performance metrics for each execution; storing the performance metrics in a performance profile; at load time of a parallel application including a plurality of parallel processes configured in a particular geometry, filtering the performance profile in dependence upon the particular geometry; during run-time of the parallel application, selecting, for at least one collective operation, an algorithm to effect the operation in dependence upon characteristics of the parallel application and the performance profile; and executing the operation using the selected algorithm. | 09-18-2014 |
20140282429 | Algorithm Selection For Collective Operations In A Parallel Computer - Algorithm selection for collective operations in a parallel computer that includes a plurality of compute nodes may include: profiling a plurality of algorithms for each of a set of collective operations, including for each collective operation: executing the operation a plurality of times with each execution varying one or more of: geometry, message size, data type, and algorithm to effect the collective operation, thereby generating performance metrics for each execution; storing the performance metrics in a performance profile; at load time of a parallel application including a plurality of parallel processes configured in a particular geometry, filtering the performance profile in dependence upon the particular geometry; during run-time of the parallel application, selecting, for at least one collective operation, an algorithm to effect the operation in dependence upon characteristics of the parallel application and the performance profile; and executing the operation using the selected algorithm. | 09-18-2014 |
20150033243 | PARALLEL INCIDENT PROCESSING - Methods, apparatuses, and computer program products for parallel incident processing are provided. Embodiments include an incident analyzer identifying a pool of incidents and distributing the incidents across a plurality of threads of the incident analyzer. One or more threads of the plurality of threads of the incident analyzer generate a tuple indicating a rule identification and a rule state. The incident analyzer also identifies from the generated tuples, tuples that have the same rule identification and generates a merged tuple by merging the rule state of each of the identified tuples that have the same rule identification. | 01-29-2015 |
20150058657 | ADAPTIVE CLOCK THROTTLING FOR EVENT PROCESSING - Methods, apparatuses, and computer program products for adaptive clock throttling for event processing are provided. Embodiments include an event processing system receiving a plurality of events from one or more components of the distributed processing system. Embodiments also include the event processing system determining that an arrival attribute of the plurality of events exceeds an arrival threshold. Embodiments also include the event processing system adjusting, in response to determining that the arrival attribute of the plurality of events exceeds the arrival threshold, a clock speed of at least one of the event processing system and a component of the distributed processing system. | 02-26-2015 |
20150058676 | Determining Whether To Send An Alert In A Distributed Processing System - Methods, apparatuses, and computer program products for determining whether to send an alert are provided. Embodiments include a voting manager receiving from a plurality of alert analyzers, one or more delivery codes associated with an alert. In dependence upon the one or more delivery codes, the voting manager determines whether to suppress the alert, to close the alert, or to report the alert. | 02-26-2015 |
20150063100 | Data Communications In A Distributed Computing Environment - Data communications may be carried out in a distributed computing environment that includes computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). Such data communications may be carried out by: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing data location at the sender and data size; transmitting, by the sender to the receiver, the SEND data as eager data packets; discarding, by the receiver in dependence upon data flow conditions, eager data packets as they are received from the sender; and transferring, in dependence upon the data flow conditions, by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”), the SEND data. | 03-05-2015 |
20150067067 | Data Communications In A Distributed Computing Environment - Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such an environment, data communications may include: issuing, by a sender to a receiver, an eager SEND data communications instruction to transfer SEND data, the instruction including information describing a location and size of a send buffer in which the SEND data is stored; transmitting, by the sender to the receiver, the SEND data as eager data packets; issuing, by the receiver to the sender in dependence upon data flow conditions, a STOP instruction, the STOP instruction including an order to stop transmitting the eager data packets; and transferring the SEND data by the receiver from the sender's data location to a receive buffer by remote direct memory access (“RDMA”). | 03-05-2015 |
20150067068 | Data Communications In A Distributed Computing Environment - Data communications may be carried out in a distributed computing environment that includes a plurality of computers coupled for data communications through communications adapters and an active messaging interface (‘AMI’). In such a distributed computing environment, data communications may include: receiving in the AMI from an application an eager SEND instruction that describes the location and size of send data in an application SEND buffer; copying by the AMI the send data from the application SEND buffer to a temporary AMI buffer; advising the application of completion of the SEND instruction before sending the SEND data to the receiver; and after advising the application of completion of the SEND instruction, sending the SEND data by the sender to the receiver. | 03-05-2015 |
20150074164 | Event And Alert Analysis In A Distributed Processing System - Methods, apparatuses, and computer program products for event and alert analysis are provided. Embodiments include a local event analyzer embedded in an alert analyzer receiving events from an event queue. Embodiments also include the local event analyzer creating, based on the received events and local event analysis rules specific to the alert analyzer, a temporary alert for the alert analyzer. Embodiments also include the alert analyzer analyzing the temporary alert based on alert analysis rules. | 03-12-2015 |
20150074472 | Checkpointing For Delayed Alert Creation - Methods, apparatuses, and computer program products for checkpointing for delayed alert creation are provided. Embodiments include applying a checkpoint to an events pool having events with corresponding alerts that have been generated and not delivered; and, following a crash and loss of the corresponding alerts not recorded in an alert database, generating new alerts based on the events in the events pool having the checkpoint. In response to completing processing of a new alert, embodiments include determining whether the alert database has an entry corresponding to the processed new alert. If the alert database has an entry corresponding to the processed new alert, embodiments include delivering the processed new alert without reporting the processed new alert to the alert database. If the alert database does not have an entry corresponding to the processed new alert, embodiments include reporting the processed new alert to the alert database. | 03-12-2015 |