Patent application number | Description | Published |
20080215629 | TRACK SHUFFLING SYSTEM AND METHOD - A method and computer program product for defining an active playlist of media tracks for rendering in a defined sequence. A portion of the media tracks defined in the active playlist is rendered. A shuffle command is received from a user concerning the active playlist. A non-rendered playlist is generated. The non-rendered playlist defines one or more non-rendered media tracks included within the active playlist and excludes one or more rendered media tracks included within the active playlist. A rendering sequence is defined for the non-rendered playlist. | 09-04-2008 |
20150379072 | INPUT PROCESSING FOR MACHINE LEARNING - A record extraction request for a data set is received at a machine learning service. A plan to perform one or more chunk-level operations (such as sampling, shuffling, splitting or partitioning for parallel computation) on chunks of the data set is generated. A set of data transfers that results in a particular chunk being stored in a particular server's memory is initiated to implement the first chunk-level operation of the sequence. A second operation such as another filtering operation or a feature processing operation is performed on a result set of the first chunk-level operation. | 12-31-2015 |
20150379423 | FEATURE PROCESSING RECIPES FOR MACHINE LEARNING - A first representation of a feature processing recipe is received at a machine learning service. The recipe includes a section in which groups of variables on which common transformations are to be applied are defined, and a section in which a set of transformation operations is specified. The first representation of the recipe is validated based at least in part on a library of function definitions supported by the service, and an executable version of the recipe is generated. In response to a determination that the recipe is to be executed on a particular data set, a set of provider network resources is used to implement a transformation operation indicated in the recipe. | 12-31-2015 |
20150379424 | MACHINE LEARNING SERVICE - A machine learning service implements programmatic interfaces for a variety of operations on several entity types, such as data sources, statistics, feature processing recipes, models, and aliases. A first request to perform an operation on an instance of a particular entity type is received, and a first job corresponding to the requested operation is inserted in a job queue. Prior to the completion of the first job, a second request to perform another operation is received, where the second operation depends on a result of the operation represented by the first job. A second job, indicating a dependency on the first job, is stored in the job queue. The second job is initiated when the first job completes. | 12-31-2015 |
20150379425 | CONSISTENT FILTERING OF MACHINE LEARNING DATA - Consistency metadata, including a parameter for a pseudo-random number source, are determined for training-and-evaluation iterations of a machine learning model. Using the metadata, a first training set comprising records of at least a first chunk is identified from a plurality of chunks of a data set. The first training set is used to train a machine learning model during a first training-and-evaluation iteration. A first test set comprising records of at least a second chunk is identified using the metadata, and is used to evaluate the model during the first training-and-evaluation iteration. | 12-31-2015 |
20150379426 | OPTIMIZED DECISION TREE BASED MODELS - During a training phase of a machine learning model, representations of at least some nodes of a decision tree are generated and stored on persistent storage in depth-first order. A respective predictive utility metric (PUM) value is determined for one or more nodes, indicating expected contributions of the nodes to a prediction of the model. A particular node is selected for removal from the tree based at least partly on its PUM value. A modified version of the tree, with the particular node removed, is stored for obtaining a prediction. | 12-31-2015 |
20150379427 | FEATURE PROCESSING TRADEOFF MANAGEMENT - At a machine learning service, a set of candidate variables that can be used to train a model is identified, including at least one processed variable produced by a feature processing transformation. A cost estimate indicative of an effect of implementing the feature processing transformation on a performance metric associated with a prediction goal of the model is determined. Based at least in part on the cost estimate, a feature processing proposal that excludes the feature processing transformation is implemented. | 12-31-2015 |
20150379428 | CONCURRENT BINNING OF MACHINE LEARNING DATA - Variables of observation records to be used to generate a machine learning model are identified as candidates for quantile binning transformations. In accordance with a particular concurrent binning plan generated for a particular variable, a plurality of quantile binning transformations are applied to the particular variable, including a first transformation with a first bin count and a second transformation with a different bin count. The first and second transformations result in the inclusion of respective parameters or weights for binned features in a parameter vector of the model. In a post-training phase run of the model, at least one parameter corresponding to a binned feature is used to generate a prediction. | 12-31-2015 |
20150379429 | INTERACTIVE INTERFACES FOR MACHINE LEARNING MODEL EVALUATIONS - A first data set corresponding to an evaluation run of a model is generated at a machine learning service for display via an interactive interface. The data set includes a prediction quality metric. A target value of an interpretation threshold associated with the model is determined based on a detection of a particular client's interaction with the interface. An indication of a change to the prediction quality metric that results from the selection of the target value may be initiated. | 12-31-2015 |
20150379430 | EFFICIENT DUPLICATE DETECTION FOR MACHINE LEARNING DATA SETS - At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed. A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as the transmission of a notification to a client of the service. | 12-31-2015 |
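
The track shuffling entry (20080215629) describes building a non-rendered playlist that leaves out tracks already rendered and then defining a new rendering sequence for it. The following is a minimal sketch of that idea, not the method claimed in the application. The function name `shuffle_unplayed`, the `rendered_count` parameter, and the assumption that rendered tracks are simply the first `rendered_count` entries of the active playlist are all hypothetical and chosen for illustration.

```python
import random


def shuffle_unplayed(active_playlist, rendered_count, seed=None):
    """Return a new, shuffled sequence of the tracks not yet rendered.

    Assumption: the first `rendered_count` tracks of `active_playlist`
    have already been rendered; the abstract does not say how rendered
    tracks are tracked.
    """
    # Build the non-rendered playlist: it excludes the rendered tracks.
    non_rendered = list(active_playlist[rendered_count:])
    # Define a new rendering sequence by shuffling only those tracks.
    rng = random.Random(seed)
    rng.shuffle(non_rendered)
    return non_rendered


# Hypothetical playlist; two tracks have been rendered when the user shuffles.
playlist = ["track_a", "track_b", "track_c", "track_d", "track_e"]
upcoming = shuffle_unplayed(playlist, rendered_count=2, seed=7)
```

Shuffling only the remainder means a shuffle command issued partway through playback does not repeat tracks the listener has already heard.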