Patent application number | Description | Published (MM-DD-YYYY) |
20140059073 | Systems and Methods for Providing a Unified Variable Selection Approach Based on Variance Preservation - This disclosure describes a method, system and computer-program product for parallelized feature selection. The method, system and computer-program product may be used to access a first set of features, wherein the first set of features includes multiple features, wherein the features are characterized by a variance measure, and wherein accessing the first set of features includes using a computing system to access the features, determine components of a covariance matrix, the components of the covariance matrix indicating a covariance with respect to pairs of features in the first set, and select multiple features from the first set, wherein selecting is based on the determined components of the covariance matrix and an amount of the variance measure attributable to the selected multiple features, and wherein selecting the multiple features includes executing a greedy search performed using parallelized computation. | 02-27-2014 |
20140089247 | Fast Binary Rule Extraction for Large Scale Text Data - Systems and methods for identifying data files that have a common characteristic are provided. A plurality of data files including one or more data files having a common characteristic are received. A potential rule is generated by selecting key terms from a list that satisfy a term evaluation metric, and the potential rule is evaluated using a rule evaluation metric. The potential rule is added to the rule set if the rule evaluation metric is satisfied. Based upon the potential rule being added to the rule set, data files covered by the potential rule are removed from the plurality of data files. The potential rule generation and evaluation steps are repeated until a stopping criterion is met. After the stopping criterion has been met, the rule set is used to identify other data files having the common characteristic. | 03-27-2014 |
20140337271 | SYSTEM FOR EFFICIENTLY GENERATING K-MAXIMALLY PREDICTIVE ASSOCIATION RULES WITH A GIVEN CONSEQUENT - This disclosure provides a computer-program product, system, method and apparatus for accessing a representation of a category or item and accessing a set of multiple transactions. The transactions are processed to identify items found amongst the transactions, and the items are ordered based on an information-gain heuristic. A depth-first search for a group of best association rules is then conducted using a best-first heuristic and constraints that make the search efficient. The best rules found during the search can then be displayed to a user, along with accompanying statistics. The user can then select rules that appear to be most relevant, and further analytics can be applied to the selected rules to obtain further information about the information provided by these rules. | 11-13-2014 |
20140337272 | SYSTEMS AND METHODS FOR INTERACTIVE DISPLAYS BASED ON ASSOCIATIONS FOR MACHINE-GUIDED RULE CREATION - This disclosure provides a computer-program product, system, method and apparatus for accessing a representation of a category or item and accessing a set of multiple transactions. The transactions are processed to identify items found amongst the transactions, and the items are ordered based on an information-gain heuristic. A depth-first search for a group of best association rules is then conducted using a best-first heuristic and constraints that make the search efficient. The best rules found during the search can then be displayed to a user, along with accompanying statistics. The user can then select rules that appear to be most relevant, and further analytics can be applied to the selected rules to obtain further information about the information provided by these rules. | 11-13-2014 |
20150193523 | SYSTEM AND METHODS FOR INTERACTIVE DISPLAYS BASED ON ASSOCIATIONS FOR MACHINE-GUIDED RULE CREATION - This disclosure provides a computer-program product, system, method and apparatus for accessing a representation of a category or item and accessing a set of multiple transactions. The transactions are processed to identify items found amongst the transactions, and the items are ordered based on an information-gain heuristic. A depth-first search for a group of best association rules is then conducted using a best-first heuristic and constraints that make the search efficient. The best rules found during the search can then be displayed to a user, along with accompanying statistics. The user can then select rules that appear to be most relevant, and further analytics can be applied to the selected rules to obtain further information about the information provided by these rules. | 07-09-2015 |
20150242484 | Sparse Matrix Storage in a Database - Methods, processes and computer-program products are disclosed for use in a parallelized computing system in which representations of large sparse matrices are efficiently encoded and communicated between grid-computing devices. A sparse matrix can be encoded and stored as a collection of character strings wherein each character string is a Base64 encoded string representing the non-zero elements of a single row of the sparse matrix. On a per-row basis, non-zero elements can be identified by column indices and error correction metadata can be included. The resultant row data can be converted to IEEE 754 8-byte representations and then encoded into Base64 characters for storage as strings. These character strings of even very large-dimensional sparse matrices can be efficiently stored in databases or communicated to grid-computing devices. | 08-27-2015 |
20150242762 | GENERATING AND DISPLAYING CANONICAL RULE SETS WITH DIMENSIONAL TARGETS - Systems and methods for performing analyses on data sets to display canonical rule sets with dimensional targets are disclosed. A cross-corpus rule set for a given Topic can be generated based on the entire corpus of data. A first dimensional rule set can be generated based on a first context (e.g., based on the same Topic but using a first sub-domain of the corpus of data). A second dimensional rule set can be generated based on a second context (e.g., based on the same Topic but using a second sub-domain of the corpus of data). Key dimensional differentiators (e.g., for each dimension, or context, of the Topic) can be determined based on a comparison of the general rule set, the first dimensional rule set, and the second dimensional rule set. A canonical rule set visualization can be displayed. The visualization can highlight the dimensional selectors (e.g., those tokens, or nodes, that differ between the first dimensional rule set and the second dimensional rule set). | 08-27-2015 |
20150324324 | Linear Regression Using Safe Screening Techniques - Systems and methods for linear regression using safe screening techniques. A computing system may receive, from a user of the system, a data set including a set of variables, the set of variables being related to a linear model for predicting a response variable of the data set. The computing system may determine an active set of variables using a safe screening algorithm. The computing system may generate the linear model using the active set and a least angle regression algorithm. The computing system may provide, to the user of the system, information related to the linear model. | 11-12-2015 |
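
The per-row encoding described in the Sparse Matrix Storage entry (20150242484) can be illustrated with a minimal sketch. The abstract does not specify the byte layout, so everything below is an assumption made for illustration: the function names `encode_row` and `decode_row` are hypothetical, each non-zero element is stored as a (column index, value) pair of little-endian IEEE 754 8-byte doubles, and the error correction metadata mentioned in the abstract is omitted.

```python
import base64
import struct


def encode_row(row):
    """Encode the non-zero entries of one matrix row as a Base64 string.

    Assumed layout (not taken from the application): consecutive
    (column index, value) pairs, each stored as a little-endian
    IEEE 754 8-byte double.
    """
    flat = []
    for col, value in enumerate(row):
        if value != 0.0:
            flat.extend((float(col), float(value)))
    raw = struct.pack(f"<{len(flat)}d", *flat)
    return base64.b64encode(raw).decode("ascii")


def decode_row(text, width):
    """Rebuild a dense row of the given width from its Base64 string."""
    raw = base64.b64decode(text)
    flat = struct.unpack(f"<{len(raw) // 8}d", raw)
    row = [0.0] * width
    # Even positions hold column indices, odd positions hold values.
    for col, value in zip(flat[0::2], flat[1::2]):
        row[int(col)] = value
    return row


if __name__ == "__main__":
    dense = [0.0, 1.5, 0.0, -2.0]
    encoded = encode_row(dense)
    print(encoded)
    print(decode_row(encoded, len(dense)))
```

Under this assumed layout, a row costs 16 bytes per non-zero element before Base64 expansion, so a mostly-zero row of very large width still produces a short string that fits in an ordinary character column.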