Patent application number | Description | Published |
20080234977 | Methods and Apparatus for Outlier Detection for High Dimensional Data Sets - Methods and apparatus are provided for outlier detection in databases by determining sparse low dimensional projections. These sparse projections are used for the purpose of determining which points are outliers. The methodologies of the invention are very relevant in providing a novel definition of exceptions or outliers for the high dimensional domain of data. | 09-25-2008 |
20080243742 | Method and Apparatus for Predicting Future Behavior of Data Streams - Techniques are disclosed for predicting the future behavior of data streams through the use of current trends of the data stream. By way of example, a technique for predicting the future behavior of a data stream comprises the following steps/operations. Statistics are obtained from the data stream. Estimated statistics for a future time interval are generated by using at least a portion of the obtained statistics. A portion of the estimated statistics are utilized to generate one or more representative pseudo-data records within the future time interval. Pseudo-data records are utilized for forecasting of at least one characteristic of the data stream. | 10-02-2008 |
20090222410 | Method and Apparatus for Query Processing of Uncertain Data - Techniques are disclosed for indexing uncertain data in query processing systems. For example, a method for processing queries in an application that involves an uncertain data set includes the following steps. A representation of records of the uncertain data set is created based on mean values and uncertainty values. The representation is utilized for processing a query received on the uncertain data set. | 09-03-2009 |
20090222472 | Method and Apparatus for Aggregation in Uncertain Data - Techniques are disclosed for aggregation in uncertain data in data processing systems. For example, a method of aggregation in an application that involves an uncertain data set includes the following steps. The uncertain data set along with uncertainty information is obtained. One or more clusters of data points are constructed from the data set. Aggregate statistics of the one or more clusters and uncertainty information are stored. The data set may be data from a data stream. It is realized that the use of even modest uncertainty information during an application such as a data mining process is sufficient to greatly improve the quality of the underlying results. | 09-03-2009 |
20090281971 | SYSTEM AND METHOD FOR CLASSIFYING DATA STREAMS WITH VERY LARGE CARDINALITY - Systems and methods for object classification are provided. An object is identified along with the attributes that describe that object. These attributes are grouped into attribute patterns. Classes to be used in the classification are also identified. For each identified class a sketch table containing a plurality of parallel hash tables is created and trained using known objects with known classifications. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table. This results in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern. This produces a discriminatory power for each attribute pattern. Those attribute patterns having a discriminatory power above a given threshold are selected. The selected attribute patterns and associated sketch table values are added. The sketch table with the largest overall sum is identified, and the class associated with that sketch table is assigned to the object to which the attribute patterns belong. | 11-12-2009 |
20090292979 | Methods and Apparatus for Monitoring Abnormalities in Data Stream - A technique for monitoring a primary data stream comprising a plurality of secondary data streams for abnormalities is provided. A deviation value for each of two or more of the plurality of secondary data streams is determined. The two or more deviation values of the two or more secondary data streams are combined to form a combined deviation value. An abnormality signal is generated based at least in part on the combined deviation value. | 11-26-2009 |
20090319526 | Method and Apparatus for Variable Privacy Preservation in Data Mining - Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets. | 12-24-2009 |
20100268734 | SYSTEM AND METHOD FOR DISTRIBUTED PRIVACY PRESERVING DATA MINING - Distributed privacy preserving data mining techniques are provided. A first entity of a plurality of entities in a distributed computing environment exchanges summary information with a second entity of the plurality of entities via a privacy-preserving data sharing protocol such that the privacy of the summary information is preserved, the summary information associated with an entity relating to data stored at the entity. The first entity may then mine data based on at least the summary information obtained from the second entity via the privacy-preserving data sharing protocol. The first entity may obtain, from the second entity via the privacy-preserving data sharing protocol, information relating to the number of transactions in which a particular itemset occurs and/or information relating to the number of transactions in which a particular rule is satisfied. | 10-21-2010 |
20120166382 | System and Method for Classifying Data Streams with Very Large Cardinality - An object and attributes that describe that object are identified. The attributes are grouped into attribute patterns, and classification classes are identified. For each identified class a sketch table containing a plurality of parallel hash tables is created. For the object to be classified, each attribute pattern is processed using all of the hash functions for each sketch table, resulting in a plurality of values under each sketch table for a single attribute pattern. The lowest value is selected for each sketch table. The distribution of values across all sketch tables is evaluated for each attribute pattern, producing a discriminatory power for each attribute pattern. Attribute patterns having a discriminatory power above a given threshold are selected and added to the associated sketch table values. The sketch table with the largest overall sum is identified, and the associated class is assigned to the object belonging to the attribute patterns. | 06-28-2012 |
20140041049 | METHOD AND APPARATUS FOR VARIABLE PRIVACY PRESERVATION IN DATA MINING - Improved privacy preservation techniques are disclosed for use in accordance with data mining. By way of example, a technique for preserving privacy of data records for use in a data mining application comprises the following steps/operations. Different privacy levels are assigned to the data records. Condensed groups are constructed from the data records based on the privacy levels, wherein summary statistics are maintained for each condensed group. Pseudo-data is generated from the summary statistics, wherein the pseudo-data is available for use in the data mining application. Principles of the invention are capable of handling both static and dynamic data sets. | 02-06-2014 |
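
The sketch-table classification described in applications 20090281971 and 20120166382 can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `SketchTable`, `train`, and `classify`, the BLAKE2b-based hashing, the table sizes, and the threshold value are all assumptions made for illustration. In particular, the abstracts do not say how discriminatory power is computed; here it is assumed to be the largest per-class estimate divided by the sum of the estimates over all classes.

```python
import hashlib


class SketchTable:
    """One sketch per class: several parallel hash tables of counters."""

    def __init__(self, num_hashes=4, width=1024):
        self.num_hashes = num_hashes
        self.width = width
        self.rows = [[0] * width for _ in range(num_hashes)]

    def _cells(self, pattern):
        # One salted hash per parallel table (hash choice is an assumption).
        key = repr(pattern).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key, digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield i, int.from_bytes(digest, "little") % self.width

    def add(self, pattern):
        for i, j in self._cells(pattern):
            self.rows[i][j] += 1

    def estimate(self, pattern):
        # The lowest value across the parallel tables is selected.
        return min(self.rows[i][j] for i, j in self._cells(pattern))


def train(examples, num_hashes=4, width=1024):
    """Build one sketch table per class from (patterns, label) pairs."""
    sketches = {}
    for patterns, label in examples:
        sketch = sketches.setdefault(label, SketchTable(num_hashes, width))
        for pattern in patterns:
            sketch.add(pattern)
    return sketches


def classify(sketches, patterns, threshold=0.6):
    """Sum estimates of discriminative patterns and pick the largest class."""
    totals = {label: 0 for label in sketches}
    for pattern in patterns:
        est = {label: s.estimate(pattern) for label, s in sketches.items()}
        total = sum(est.values())
        if total == 0:
            continue  # pattern never seen in training
        # Assumed definition of discriminatory power: share of the top class.
        power = max(est.values()) / total
        if power > threshold:
            for label, value in est.items():
                totals[label] += value
    # If no pattern passes the threshold, the first class seen is returned.
    return max(totals, key=totals.get)


examples = [
    ([("free", "win"), ("win", "now")], "spam"),
    ([("free", "win"), ("click", "here")], "spam"),
    ([("meeting", "monday"), ("project", "review")], "ham"),
]
sketches = train(examples)
print(classify(sketches, [("free", "win"), ("hello", "world")]))
```

Because each counter only ever increases, the minimum across the parallel tables never underestimates a pattern's true count. Hash collisions can only inflate it, which is why several tables are used. A real deployment would also need to account for classes with unequal amounts of training data, which this sketch ignores.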