Patent application number | Description | Published |
20080270640 | METHOD AND APPARATUS FOR ADAPTIVE IN-OPERATOR LOAD SHEDDING - One embodiment of the present method and apparatus adaptive in-operator load shedding includes receiving at least two data streams (each comprising a plurality of tuples, or data items) into respective sliding windows of memory. A throttling fraction is then calculated based on input rates associated with the data streams and on currently available processing resources. Tuples are then selected for processing from the data streams in accordance with the throttling fraction, where the selected tuples represent a subset of all tuples contained within the sliding window. | 10-30-2008 |
20080281626 | Enabling Interoperability Between Participants in a Network - Interoperability is enabled between participants in a network by determining values associated with a value metric defined for at least a portion of the network. Information flow is directed between two or more of the participants based at least in part on semantic models corresponding to the participants and on the values associated with the value metric. The semantic models may define interactions between the participants and define at least a portion of information produced or consumed by the participants. The determination of the values and the direction of the information flow may be performed multiple times in order to modify the one or more value metrics. The direction of information flow may allow participants to be deleted from the network, may allow participants to be added to the network, or may allow behavior of the participants to be modified. | 11-13-2008 |
20090049187 | METHOD AND APPARATUS FOR ADAPTIVE LOAD SHEDDING - One embodiment of the present method and apparatus adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output. | 02-19-2009 |
20090086755 | SYSTEMS AND METHODS FOR CORRELATION OF BURST EVENTS AMONG DATA STREAMS - Systems and methods for the identification of correlated burst events among two or more data streams, given one or more specific query time spans are disclosed. Also broadly contemplated is the act of finding, from one or more data streams, those streams that have correlated burst events with another given data stream within a time span. | 04-02-2009 |
20090100014 | Methods and Apparatus for Adaptive Source Filtering and Load Shedding for Data Stream Processing - Techniques are disclosed for adaptive source filtering and load shedding in such data stream processing systems. For example, in one aspect of the invention, a method for use in filtering data in a distributed data stream processing system, wherein a server receives and processes one or more data streams from one or more data sources, comprises the steps of the server periodically re-configuring one or more filters and sending the one or more periodically re-configured filters to the one or more data sources, and the one or more data sources performing data filtering based on the one or more periodically re-configured filters received from the server. | 04-16-2009 |
20090300615 | METHOD FOR GENERATING A DISTRIBUTED STREAM PROCESSING APPLICATION - Techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks, wherein the declarative description expresses at least one stream processing task, and generating one or more execution units from the declarative description of one or more data stream processing tasks, wherein the one or more execution units are deployable across one or more distributed computing nodes, and comprise a distributed data stream processing application. | 12-03-2009 |
20090313614 | METHOD FOR HIGH-PERFORMANCE DATA STREAM PROCESSING - Techniques for optimizing data stream processing are provided. The techniques include employing a pattern, wherein the pattern facilitates splitting of one or more incoming streams and distributing processing across one or more operators, obtaining one or mote operators, wherein the one or more operators support at least one group-independent aggregation and join operation on one or more streams, generating code, wherein the code facilitates mapping of the application onto a computational infrastructure to enable workload partitioning, using the one or more operators to decompose each of the application into one or more granular components, and using the code to reassemble the one or more granular components into one or more deployable blocks to map the application to a computational infrastructure, wherein reassembling the one or more granular components to map the application to the computational infrastructure optimizes data stream processing of the application. | 12-17-2009 |
20100242042 | Method and apparatus for scheduling work in a stream-oriented computer system - An apparatus and method for scheduling stream-based applications in a distributed computer system includes a scheduler configured to schedule work using three temporal levels. Each temporal level includes a method. A macro method is configured to schedule jobs that will run, in a highest temporal level, in accordance with a plurality of operation constraints to optimize importance of work. A micro method is configured to fractionally allocate, at a medium temporal level, processing elements to processing nodes in the system to react to changing importance of the work. A nano method is configured to revise, at a lowest temporal level, fractional allocations on a continual basis. | 09-23-2010 |
20100281028 | SYSTEM AND METHOD FOR INDEXING A DATA STREAM - There are provided methods, computer program products, and systems for indexing a data stream. A method for indexing a data stream having attribute values includes the steps of parsing the data stream, and forming an index of tuples for a subset of attribute values of the data stream. The index is configured for retrieving the top-K tuples that optimize linearly weighted sums of at least some of the attribute values in the subset. | 11-04-2010 |
20100292980 | APPLICATION RESOURCE MODEL COMPOSITION FROM CONSTITUENT COMPONENTS - Techniques for composing an application resource model in a data stream processing system are disclosed. The application resource model may be used to understand what resources will be consumed by an application when executed by the data stream processing system. For example, a method for composing an application resource model for a data stream processing system comprises the following steps. One or more operator-level metrics are obtained from an execution of a data stream processing application in accordance with a first configuration. The application is executed by one or more nodes of the data stream processing system, and the application is comprised of one or more processing elements that are comprised of one or more operators. One or more operator-level resource functions are generated based on the obtained one or more operator-level metrics. A processing element-level resource function is generated based on the one or more generated operator-level resource functions. The processing element-level resource function represents an application resource model usable for predicting one or more characteristics of the application executed in accordance with a second configuration. | 11-18-2010 |
20100293283 | ON-DEMAND MARSHALLING AND DE-MARSHALLING OF NETWORK MESSAGES - One embodiment of a method for on-demand de-marshalling of a network message includes receiving a first network message at a receiver device, wherein the first network message was sent from a sender device, and wherein the first network message comprises one or more attributes that characterize an object at the sender device, and further wherein the first network message is in an encoded format according to a network protocol, storing the first network message in a buffer of the receiver device in the encoded format, identifying, at the receiver device and prior to de-marshalling of any of the attributes, that at least one of the attributes is to be manipulated by the receiver device, de-marshalling, in response to the identifying, the identified attribute(s), and manipulating the identified attribute(s) at the receiver device. | 11-18-2010 |
20100293532 | FAILURE RECOVERY FOR STREAM PROCESSING APPLICATIONS - In one embodiment, the invention is a method and apparatus for failure recovery for stream processing applications. One embodiment of a method for providing a failure recovery mechanism for a stream processing application includes receiving source code for the stream processing application, wherein the source code defines a fault tolerance policy for each of the components of the stream processing application, and wherein respective fault tolerance policies defined for at least two of the plurality of components are different, generating a sequence of instructions for converting the state(s) of the component(s) into a checkpoint file comprising a sequence of storable bits on a periodic basis, according to a frequency defined in the fault tolerance policy, initiating execution of the stream processing application, and storing the checkpoint file, during execution of the stream processing application, at a location that is accessible after failure recovery. | 11-18-2010 |
20100293533 | INCREMENTALLY CONSTRUCTING EXECUTABLE CODE FOR COMPONENT-BASED APPLICATIONS - One embodiment of a method for constructing executable code for a component-based application includes receiving a request to compile source code for the component-based application, wherein the request identifies the source code, and wherein the source code comprises a plurality of source code components, each of the source code components implementing a different component of the application, and performing a series of steps for each source code component where the series of steps includes: deriving a signature for the source code component, retrieving a stored signature corresponding to a currently available instance of executable code for the source code component, comparing the derived signature with the stored signature, compiling the source code component into the executable code when the derived signature does not match the stored signature, and obtaining the executable code for the source code component from a repository when the derived signature matches the stored signature. | 11-18-2010 |
20100293534 | USE OF VECTORIZATION INSTRUCTION SETS - In one embodiment, the invention is a method and apparatus for use of vectorization instruction sets. One embodiment of a method for generating vector instructions includes receiving source code written in a high-level programming language, wherein the source code includes at least one high-level instruction that performs multiple operations on a plurality of vector operands, and compiling the high-level instruction(s) into one or more low-level instructions, wherein the low-level instructions are in an instruction set of a specific computer architecture. | 11-18-2010 |
20100293535 | Profile-Driven Data Stream Processing - Techniques for compiling a data stream processing application are provided. The techniques include receiving, by a compiler executing on a computer system, source code for a data stream processing application, wherein the source code comprises source code for a plurality of operators, each of which performs a data processing function, determining, by the compiler, one or more characteristics of operators within the data stream processing application, grouping, by the compiler, the operators into one or more execution containers based on the one or more characteristics, and compiling, by the compiler, the source code for the data stream processing application into executable code, wherein the executable code comprises a plurality of execution units, wherein each execution unit contains one or more of the operators, wherein each operator is assigned to an execution unit based on the grouping, and wherein each execution unit is to be executed in a partition. | 11-18-2010 |
20100325621 | PARTITIONING OPERATOR FLOW GRAPHS - Techniques for partitioning an operator flow graph are provided. The techniques include receiving source code for a steam processing application, wherein the source code comprises an operator flow graph, wherein the operator flow graph comprises a plurality of operators, receiving profiling data associated with the plurality of operators and one or more processing requirements of the operators, defining a candidate partition as a coalescing of one or more of the operators into one or more sets of processing elements (PEs), using the profiling data to create one or more candidate partitions of the processing elements, using the one or more candidate partitions to choose a desired partitioning of the operator flow graph, and compiling the source code into an executable code based on the desired partitioning. | 12-23-2010 |
20110061060 | Determining Operator Partitioning Constraint Feasibility - Techniques for determining feasibility of a set of one or more operator partitioning constraints are provided. The techniques include receiving one or more sets of operator partitioning constraints, wherein each set of one or more constraints define one or more desired conditions for grouping together of operators into partitions and placing partitions on hosts, wherein each operator is embodied as software that performs a particular function, processing each set of one or more operator partitioning constraints to determine feasibility of each set of one or more operator partitioning constraints, creating and outputting one or more candidate partitions and one or more host placements for each set of feasible partitioning constraints, and creating and outputting a certificate of infeasibility for each set of infeasible partitioning constraints, wherein the certificate of infeasibility outlines one or more reasons for infeasibility. | 03-10-2011 |
20110191759 | Interactive Capacity Planning - Techniques for performing capacity planning for applications running on a computational infrastructure are provided. The techniques include instrumenting an application under development to receive one or more performance metrics under a physical deployment plan, receiving the one or more performance metrics from the computational infrastructure hosting one or more applications that are currently running, using a predictive inference engine to determine how the application under development can be deployed, and using the determination to perform capacity planning for the applications on the computational infrastructure. | 08-04-2011 |
20110219362 | Virtual Execution Environment for Streaming Languages - A virtual execution environment (VEE) for a streaming Intermediate Language (IL), wherein the streaming IL represents a streaming program, communicates streaming data in queues, stores data-at-rest in variables, and determines data by functions, where inputs are read from the queues and the variables, and outputs are written to the queues and the variables. | 09-08-2011 |
20110225584 | MANAGING MODEL BUILDING COMPONENTS OF DATA ANALYSIS APPLICATIONS - Data analysis applications include model building components and stream processing components. To increase utility of the data analysis application, in one embodiment, the model building component of the data analysis application is managed. Management includes resource allocation and/or configuration adaptation of the model building component, as examples. | 09-15-2011 |
20110239048 | PARTIAL FAULT TOLERANT STREAM PROCESSING APPLICATIONS - In one embodiment, the invention comprises partial fault tolerant stream processing applications. One embodiment of a method for implementing partial fault tolerance in a stream processing application comprising a plurality of stream operators includes: defining a quality score function that expresses how well the application is performing quantitatively, injecting a fault into at least one of the plurality of operators, assessing an impact of the fault on the quality score function, and selecting at least one partial fault-tolerant technique for implementation in the application based on the quantitative metric-driven assessment. | 09-29-2011 |
20110246972 | METHOD OF SELECTING AN EXPRESSION EVALUATION TECHNIQUE FOR DOMAIN-SPECIFIC LANGUAGE COMPILATION - A method and computer program product for selecting an expression evaluation technique for domain-specific language (DSL) compilation. An application written in DSL for a programming task is provided, the application including a plurality of components configured by expressions. A technique that most quickly implements the programming task is selected from a plurality of techniques for evaluating the expressions. The DSL application is compiled in accordance with the selected expression evaluation technique to generate general-purpose programming language (GPL) code. | 10-06-2011 |
20110247007 | OPERATORS WITH REQUEST-RESPONSE INTERFACES FOR DATA STREAM PROCESSING APPLICATIONS - Processing streaming data in a data processing system is facilitated by: declaring and defining, by a processor, a request-response interface as part of a stream processing operator defined using a stream processing language; processing a stream of data using the stream processing operator with the request-response interface defined as a part thereof; and communicating with the stream processing operator through the request-response interface via a communication path separate from the stream of data, the communicating accessing or controlling a state of the stream processing operator while the stream processing operator is processing the stream of data. | 10-06-2011 |
20110258246 | DISTRIBUTED SOLUTIONS FOR LARGE-SCALE RESOURCE ASSIGNMENT TASKS - Distributed data processing of problems representing resource assignment tasks. The problems are modeled as programs, and the programs are partitioned into sub-instances. Those sub-instances are executed in a distributed computing environment. The partitioning reduces communication costs between sub-instances and convergence time for the optimization program. | 10-20-2011 |
20110295939 | STATE SHARING IN A DISTRIBUTED DATA STREAM PROCESSING SYSTEM - State sharing is facilitated in stream processing environments, including distributed stream processing environments. A customized shared state implementation representing the state to be shared is automatically created based on at least one of user preferences, hints of usage, and system performance. | 12-01-2011 |
20120059839 | PROXYING OPEN DATABASE CONNECTIVITY (ODBC) CALLS - An Open Database Connectivity (ODBC) proxy infrastructure to transparently route incoming queries to one or more selected query engines. The ODBC proxy receives a query from an application, and determines based on the characteristics of the query and the capabilities of the query engines which one or more query engines are to perform the query. The proxy then routes the query to the one or more query engines, which perform the query. The results are then returned to the proxy, which provides the results to the application. | 03-08-2012 |
20120124233 | METHOD AND APPARATUS FOR ADAPTIVE LOAD SHEDDING - One embodiment of the present method and apparatus adaptive load shedding includes receiving at least one data stream (comprising a plurality of tuples, or data items) into a first sliding window of memory. A subset of tuples from the received data stream is then selected for processing in accordance with at least one data stream operation, such as a data stream join operation. Tuples that are not selected for processing are ignored. The number of tuples selected and the specific tuples selected depend at least in part on a variety of dynamic parameters, including the rate at which the data stream (and any other processed data streams) is received, time delays associated with the received data stream, a direction of a join operation performed on the data stream and the values of the individual tuples with respect to an expected output. | 05-17-2012 |
20120174110 | AMORTIZING COSTS OF SHARED SCANS - Techniques for scheduling a plurality of jobs sharing input are provided. The techniques include partitioning one or more input datasets into multiple subcomponents, analyzing a plurality of jobs to determine which of the plurality of jobs require scanning of one or more common subcomponents of the one or more input datasets, and scheduling a plurality of jobs that require scanning of one or more common subcomponents of the one or more input datasets, facilitating a single scanning of the one or more common subcomponents to be used as input by each of the plurality of jobs. | 07-05-2012 |
20120259910 | Generating a Distributed Stream Processing Application - Techniques for generating a distributed stream processing application are provided. The techniques include obtaining a declarative description of one or more data stream processing tasks from a graph of operators, wherein the declarative description expresses at least one stream processing task, generating one or more containers that encompass a combination of one or more stream processing operators, and generating one or more execution units from the declarative description of one or more data stream processing tasks, wherein the one or more execution units are deployable across one or more distributed computing nodes, and comprise a distributed data stream processing application binary. | 10-11-2012 |
20120276923 | SELECTIVE PROCESSING OF LOCATION-SENSITIVE DATA STREAMS - A method for processing a first data stream specifying locations of a user at different times and a second data stream specifying values of a monitored attribute at a location of interest at different times includes: receiving a location-centric trigger specifying at least one spatial predicate condition relative to the location of interest and at least one non-spatial predicate condition relevant to the location of interest, calculating a safe region that includes locations whose probability of satisfying the spatial predicate condition falls below a first threshold, calculating a safe value container that includes values whose probability of satisfying the non-spatial predicate condition falls below a second threshold, and processing the first data stream and the second data stream against the location-centric trigger, by considering only those locations that are not contained within the safe region and only those values that are not contained within the safe value container. | 11-01-2012 |
20120276930 | SELECTIVE PROCESSING OF LOCATION-SENSITIVE DATA STREAMS - A method for processing a first data stream specifying locations of a user at different times and a second data stream specifying values of a monitored attribute at a location of interest at different times includes: receiving a location-centric trigger specifying at least one spatial predicate condition relative to the location of interest and at least one non-spatial predicate condition relevant to the location of interest, calculating a safe region that includes locations whose probability of satisfying the spatial predicate condition falls below a first threshold, calculating a safe value container that includes values whose probability of satisfying the non-spatial predicate condition falls below a second threshold, and processing the first data stream and the second data stream against the location-centric trigger, by considering only those locations that are not contained within the safe region and only those values that are not contained within the safe value container. | 11-01-2012 |
20120297391 | APPLICATION RESOURCE MODEL COMPOSITION FROM CONSTITUENT COMPONENTS - Techniques for composing an application resource model are disclosed. The techniques include obtaining operator-level metrics from an execution of a data stream processing application according to a first configuration, wherein the application is executed by nodes of the data stream processing system and the application includes processing elements comprised of multiple operators, wherein two or more of the operators are combined in a first combination to form a processing element according to the first configuration, generating operator-level resource functions from the first combination of operators based on the obtained operator-level metrics, and generating a processing element-level resource function using the generated operator-level resource functions to predict a model for the processing element formed by a second combination of operators, the processing element-level resource function representing an application resource model usable for predicting characteristics of the application executed according to a second configuration. | 11-22-2012 |
20130018943 | DATA SHARING IN A DISTRIBUTED DATA STREAM PROCESSING SYSTEM - State sharing is facilitated in stream processing environments, including distributed stream processing environments. A customized shared state implementation representing the state to be shared is automatically created based on at least one of user preferences, hints of usage, and system performance. | 01-17-2013 |
20130151536 | Vertex-Proximity Query Processing - A method, an apparatus and an article of manufacture for processing a random-walk based vertex-proximity query on a graph. The method includes computing at least one vertex cluster and corresponding meta-information from a graph, dynamically updating the clustering and corresponding meta-information upon modification of the graph, and identifying a vertex cluster relevant to at least one query vertex and aggregating corresponding meta-information of the cluster to process the query. | 06-13-2013 |
20130231140 | SELECTIVE PROCESSING OF LOCATION-SENSITIVE DATA STREAMS - A method for processing a first data stream specifying locations of a user at different times and a second data stream specifying values of a monitored attribute at a location of interest at different times includes: receiving a location-centric trigger specifying at least one spatial predicate condition relative to the location of interest and at least one non-spatial predicate condition relevant to the location of interest, calculating a safe region that includes locations whose probability of satisfying the spatial predicate condition falls below a first threshold, calculating a safe value container that includes values whose probability of satisfying the non-spatial predicate condition falls below a second threshold, and processing the first data stream and the second data stream against the location-centric trigger, by considering only those locations that are not contained within the safe region and only those values that are not contained within the safe value container. | 09-05-2013 |
20130238936 | PARTIAL FAULT TOLERANT STREAM PROCESSING APPLICATIONS - In one embodiment, the invention comprises partial fault tolerant stream processing applications. One embodiment of a method for implementing partial fault tolerance in a stream processing application comprising a plurality of stream operators includes: defining a quality score function that expresses how well the application is performing quantitatively, injecting a fault into at least one of the plurality of operators, assessing an impact of the fault on the quality score function, and selecting at least one partial fault-tolerant technique for implementation in the application based on the quantitative metric-driven assessment. | 09-12-2013 |
20130239100 | Partitioning Operator Flow Graphs - Techniques for partitioning an operator flow graph are provided. The techniques include receiving source code for a stream processing application, wherein the source code comprises an operator flow graph, wherein the operator flow graph comprises a plurality of operators, receiving profiling data associated with the plurality of operators and one or more processing requirements of the operators, defining a candidate partition as a coalescing of one or more of the operators into one or more sets of processing elements (PEs), using the profiling data to create one or more candidate partitions of the processing elements, using the one or more candidate partitions to choose a desired partitioning of the operator flow graph, and compiling the source code into an executable code based on the desired partitioning. | 09-12-2013 |
20130254350 | METHOD AND APPARATUS FOR ADAPTIVE IN-OPERATOR LOAD SHEDDING - One embodiment of the present method and apparatus adaptive in-operator load shedding includes receiving at least two data streams (each comprising a plurality of tuples, or data items) into respective sliding windows of memory. A throttling fraction is then calculated based on input rates associated with the data streams and on currently available processing resources. Tuples are then selected for processing from the data streams in accordance with the throttling fraction, where the selected tuples represent a subset of all tuples contained within the sliding window. | 09-26-2013 |
20130304722 | RANGE QUERY METHODS AND APPARATUS - Range query techniques are disclosed for use in accordance with data stream processing systems. A technique is provided for incrementally processing continual range queries against moving objects. This may be done for location-aware services and applications. For example, a technique for evaluating one or more continual range queries over one or more moving objects comprises maintaining a query index with one or more containment-encoded virtual constructs associated with the one or more continual range queries over the one or more moving objects, and incrementally evaluating the one or more continual range queries using the query index. | 11-14-2013 |
20130339355 | CLUSTERING STREAMING GRAPHS - A system for clustering vertices in a streaming graph includes a structural sampler configured to receive a stream of edges. The structural sampler includes a reservoir manager configured to receive the stream of edges and create a structural reservoir and a support reservoir and a graph manager configured to receive the structural reservoir from the reservoir manager and to create a sampled graph from the structural reservoir, wherein the sampled graph includes one or more clusters that each include one or more connected vertices. | 12-19-2013 |
20130339357 | CLUSTERING STREAMING GRAPHS - Embodiments of the invention include methods for identifying one or more clusters in a streaming graph, the method includes receiving a stream of edges and sampling the stream of edges to create a structural reservoir and support reservoir. The method also includes creating a sampled graph from the structural reservoir and identifying the one or more clusters in the sampled graph by grouping one or more connected vertices in the sampled graph. | 12-19-2013 |
20140059210 | STREAM PROCESSING WITH RUNTIME ADAPTATION - Embodiments of the disclosure include a system for providing stream processing with runtime adaptation, having a stream processing application that receives an incoming data stream and a runtime infrastructure configured to execute the stream processing application. The system also includes an orchestrator configured to communicate with the runtime infrastructure and the stream processing application, the orchestrator configured to perform a method. The method includes registering one or more events, wherein each of the events is associated with a stream processing application. The method also includes monitoring, by a processor, for an occurrence of the one or more events associated with the stream processing application, wherein each of the one or more events is associated with one or more runtime metrics. The method further includes receiving an event notification, wherein the event notification includes event identification and an event context and executing an adaptation of the stream processing application. | 02-27-2014 |
20140059212 | STREAM PROCESSING WITH RUNTIME ADAPTATION - Embodiments of the disclosure include a method for providing stream processing with runtime adaptation includes registering one or more events, wherein each of the events is associated with a stream processing application. The method also includes monitoring, by a processor, for an occurrence of the one or more events associated with the stream processing application, wherein each of the one or more events is associated with one or more runtime metrics. The method further includes receiving an event notification, wherein the event notification includes event identification and an event context and executing an adaptation of the stream processing application. | 02-27-2014 |
20140068577 | Automatic Exploitation of Data Parallelism in Streaming Applications - An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system. | 03-06-2014 |
20140068578 | Automatic Exploitation of Data Parallelism in Streaming Applications - An embodiment of the invention provides a method for exploiting stateless and stateful data parallelism in a streaming application, wherein a compiler determines whether an operator of the streaming application is safe to parallelize based on a definition of the operator and an instance of the definition. The operator is not safe to parallelize when the operator has selectivity greater than 1, wherein the selectivity is the number of output tuples generated for each input tuple. Parallel regions are formed within the streaming application with the compiler when the operator is safe to parallelize. Synchronization strategies for the parallel regions are determined with the compiler, wherein the synchronization strategies are determined based on the definition of the operator and the instance of the definition. The synchronization strategies of the parallel regions are enforced with a runtime system. | 03-06-2014 |
20140101668 | Adaptive Auto-Pipelining for Stream Processing Applications - An embodiment of the invention provides a method for adaptive auto-pipelining of a stream processing application, wherein the stream processing application includes one or more threads. Runtime of the stream processing application is initiated with a stream processing application manager. The stream processing application is monitored with a monitoring module during the runtime, wherein the monitoring of the stream processing application includes identifying threads in the stream processing application that execute operators in a data flow graph, and determining an amount of work that each of the threads are performing on operators of the logical data flow graph. A processor identifies one or more operators in the data flow graph to add one or more additional threads based on the monitoring of the stream processing application during the runtime. | 04-10-2014 |
20140215484 | MANAGING MODEL BUILDING COMPONENTS OF DATA ANALYSIS APPLICATIONS - Data analysis applications include model building components and stream processing components. To increase utility of the data analysis application, in one embodiment, the model building component of the data analysis application is managed. Management includes resource allocation and/or configuration adaptation of the model building component, as examples. | 07-31-2014 |
20140355438 | ELASTIC AUTO-PARALLELIZATION FOR STREAM PROCESSING APPLICATIONS BASED ON A MEASURED THROUGHTPUT AND CONGESTION - A method for adjusting a data parallel region of a stream processing application includes measuring congestion of each parallel channel of the data parallel region, measuring a total throughput of all the parallel channels, and adjusting the number of parallel channels based on the current measured congestion and throughput. | 12-04-2014 |
20140359271 | ELASTIC AUTO-PARALLELIZATION FOR STREAM PROCESSING APPLICATIONS - A method for adjusting a data parallel region of a stream processing application includes measuring congestion of each parallel channel of the data parallel region, measuring a total throughput of all the parallel channels, and adjusting the number of parallel channels based on the current measured congestion and throughput. | 12-04-2014 |