Patent application number | Description | Published |
20090098515 | METHOD AND APPARATUS FOR IMPROVED REWARD-BASED LEARNING USING NONLINEAR DIMENSIONALITY REDUCTION - The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the transformed exemplars to obtain a management policy. | 04-16-2009 |
20090099985 | METHOD AND APPARATUS FOR IMPROVED REWARD-BASED LEARNING USING ADAPTIVE DISTANCE METRICS - The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs, and the distance metric and function approximator are adjusted such that a Bellman error measure of the function approximator on the set of exemplars is minimized. A management policy is then derived based on the trained distance metric and function approximator. | 04-16-2009 |
20130185039 | MONTE-CARLO PLANNING USING CONTEXTUAL INFORMATION - A method, system and computer program product for choosing actions in a state of a planning problem. The system simulates one or more sequences of actions, state transitions and rewards starting from the current state of the planning problem. During the simulation of performing a given action in a given state, a data record is maintained of the observed contextual state information and the observed cumulative reward resulting from the action. The system performs a regression fit on the data records, enabling estimation of expected reward as a function of contextual state. These estimates of expected reward are used to guide the choice of actions during the simulations. Upon completion of all simulations, the top-level action that obtained the highest mean reward during the simulations is recommended for execution in the current state of the planning problem. | 07-18-2013 |
20130204412 | OPTIMAL POLICY DETERMINATION USING REPEATED STACKELBERG GAMES WITH UNKNOWN PLAYER PREFERENCES - A system, method and computer program product for planning actions in a repeated Stackelberg Game, played for a fixed number of rounds, where the payoffs or preferences of the follower are initially unknown to the leader, and a prior probability distribution over follower types is available. In repeated Bayesian Stackelberg games, the objective is to maximize the leader's cumulative expected payoff over the rounds of the game. The optimal plans in such games make intelligent tradeoffs between actions that reveal information regarding the unknown follower preferences, and actions that aim for high immediate payoff. The method solves for such optimal plans according to a Monte Carlo Tree Search method wherein simulation trials draw instances of followers from said prior probability distribution. Some embodiments additionally implement a method for pruning dominated leader strategies. | 08-08-2013 |
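The two reward-based learning entries above (20090098515 and 20090099985) both revolve around a distance measure over (state, action) pairs and a Bellman error criterion. The sketch below is a minimal illustration of the second idea only: a kernel-weighted, distance-based value approximator over (state, action) features, with a diagonal distance metric adjusted to reduce a leave-one-out Bellman error on a handful of exemplars. The feature encoding, the random-search metric adjustment, and the toy exemplars are illustrative assumptions, not specifics from the applications, and the final policy-derivation step is omitted.

```python
import numpy as np

GAMMA = 0.9
ACTIONS = [0, 1]  # illustrative discrete action set

def pair_features(state, action):
    """Encode a (state, action) pair as one feature vector (state features + one-hot action)."""
    return np.concatenate([np.atleast_1d(state).astype(float), np.eye(len(ACTIONS))[action]])

def q_estimate(query, centers, values, metric, bandwidth=1.0):
    """Distance-based approximator: kernel-weighted average of stored values, where the
    diagonal metric weights each dimension of the (state, action) feature vector."""
    d2 = np.sum(metric * (centers - query) ** 2, axis=1)
    w = np.exp(-d2 / bandwidth)
    return float(np.sum(w * values) / (np.sum(w) + 1e-12))

def bellman_error(exemplars, metric):
    """Sum of squared one-step Bellman residuals on the exemplar set, evaluated leave-one-out."""
    centers = np.array([pair_features(s, a) for s, a, _, _ in exemplars])
    rewards = np.array([r for _, _, r, _ in exemplars], dtype=float)
    # One bootstrap sweep: target_i = r_i + gamma * max_a' Q(s'_i, a'), seeded with immediate rewards.
    targets = np.array([
        r + GAMMA * max(q_estimate(pair_features(sp, ap), centers, rewards, metric) for ap in ACTIONS)
        for _, _, r, sp in exemplars
    ])
    err = 0.0
    for i, (s, a, _, _) in enumerate(exemplars):
        keep = np.arange(len(exemplars)) != i
        pred = q_estimate(pair_features(s, a), centers[keep], targets[keep], metric)
        err += (pred - targets[i]) ** 2
    return err

def adjust_metric(exemplars, dim, trials=100, seed=0):
    """Crude metric adjustment by random search: keep the diagonal metric with the lowest Bellman error."""
    rng = np.random.default_rng(seed)
    best = np.ones(dim)
    best_err = bellman_error(exemplars, best)
    for _ in range(trials):
        cand = rng.uniform(0.1, 2.0, size=dim)
        err = bellman_error(exemplars, cand)
        if err < best_err:
            best, best_err = cand, err
    return best

if __name__ == "__main__":
    # Tiny exemplar set of (state, action, reward, next_state) tuples on a 1-D state space.
    exemplars = [(0.0, 1, 1.0, 1.0), (1.0, 1, 2.0, 2.0), (2.0, 0, 0.0, 1.0), (1.0, 0, 0.5, 0.0)]
    metric = adjust_metric(exemplars, dim=1 + len(ACTIONS))
    print("learned diagonal metric:", metric)
```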
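The Monte-Carlo planning entries above (20130185039 and 20130204412) both recommend, after a batch of simulation trials, the top-level action with the highest mean simulated reward. The following sketch shows only that simulate-and-select loop on a toy one-dimensional problem; it omits the contextual regression fit of 20130185039 and the Stackelberg-specific tree search and pruning of 20130204412, and the environment, function names, and parameters are illustrative assumptions rather than details from either application.

```python
import random
from collections import defaultdict

def simulate(step_fn, state, action, depth, candidate_actions):
    """Roll out one simulated trajectory and return its cumulative reward."""
    total, s, a = 0.0, state, action
    for _ in range(depth):
        s, r = step_fn(s, a)                      # simulated transition and immediate reward
        total += r
        a = random.choice(candidate_actions)      # default rollout policy: uniform random
    return total

def monte_carlo_plan(step_fn, current_state, candidate_actions, trials=200, depth=10):
    """Recommend the top-level action with the highest mean simulated reward."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _ in range(trials):
        for a in candidate_actions:
            sums[a] += simulate(step_fn, current_state, a, depth, candidate_actions)
            counts[a] += 1
    return max(candidate_actions, key=lambda a: sums[a] / counts[a])

# Toy illustration: a one-dimensional random walk where the action shifts the state
# and the reward is higher near the origin.
def toy_step(state, action):
    next_state = state + action + random.choice([-1, 0, 1])
    return next_state, -abs(next_state)

if __name__ == "__main__":
    best = monte_carlo_plan(toy_step, current_state=3, candidate_actions=[-1, 1])
    print("recommended action:", best)
```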
Patent application number | Description | Published |
20080243735 | ACTIVE SAMPLING COLLABORATIVE PREDICTION METHOD FOR END-TO-END PERFORMANCE PREDICTION - An active sampling collaborative prediction method, system, and program storage device are provided. A method in one aspect may include determining an approximation X for a matrix Y using collaborative prediction, said matrix Y being sparse initially and representing pairwise measurement values; selecting one or more unobserved entries from said matrix Y representing active samples using said approximation X and an active sample heuristic; obtaining values associated with said unobserved entries; inserting said values into said matrix Y; and repeating the steps of determining, selecting, obtaining and inserting until a predetermined condition is satisfied. | 10-02-2008 |
20080263559 | METHOD AND APPARATUS FOR UTILITY-BASED DYNAMIC RESOURCE ALLOCATION IN A DISTRIBUTED COMPUTING SYSTEM - In one embodiment, the present invention is a method for allocation of finite computational resources amongst multiple entities, wherein the method is structured to optimize the business value of an enterprise providing computational services. One embodiment of the inventive method involves establishing, for each entity, a service-level utility indicative of how much business value is obtained for a given level of computational system performance. The service-level utility for each entity is transformed into a corresponding resource-level utility indicative of how much business value may be obtained for a given set or amount of resources allocated to the entity. The resource-level utilities of the entities are aggregated, and new resource allocations are determined and executed based upon the resource-level utility information. The invention is thereby capable of making rapid allocation decisions according to the time-varying need for, or value of, the resources to each of the entities. | 10-23-2008 |
20090012922 | METHOD AND APPARATUS FOR REWARD-BASED LEARNING OF IMPROVED SYSTEMS MANAGEMENT POLICIES - In one embodiment, the present invention is a method for reward-based learning of improved systems management policies. One embodiment of the inventive method involves supplying a first policy and a reward mechanism. The first policy maps states of at least one component of a data processing system to selected management actions, while the reward mechanism generates numerical measures of value responsive to particular actions (e.g., management actions) performed in particular states of the component(s). The first policy and the reward mechanism are applied to the component(s), and results achieved through this application (e.g., observations of corresponding states, actions and rewards) are processed in accordance with reward-based learning to derive a second policy having improved performance relative to the first policy in at least one state of the component(s). | 01-08-2009 |
20120203912 | Autonomic computing system with model transfer - Methods and systems are provided for autonomic control and optimization of computing systems. A plurality of component models for one or more components in an autonomic computing system are maintained in a system level database. These component models are obtained from a source external to the management server including the components associated with the models. Component models are added to or removed from the database, or updated, as needed. A system level management server in communication with the database utilizes the component models maintained in the system level database, and generic component models as needed, to compute an optimum state of the autonomic computing system. The autonomic computing system is managed in accordance with the computed optimum state. | 08-09-2012 |
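The active-sampling collaborative prediction entry above (20080243735) alternates between fitting an approximation X to a sparse measurement matrix Y, selecting unobserved entries with an active-sample heuristic, obtaining their values, and inserting them back into Y. A minimal sketch of that loop follows; the rank-truncated SVD completion, the largest-predicted-magnitude heuristic, and the query budget are illustrative assumptions rather than the heuristics claimed in the application.

```python
import numpy as np

def low_rank_approx(Y, mask, rank=2, iters=50):
    """Simple iterative low-rank completion: fill missing entries with the current
    estimate, then project onto a rank-k SVD approximation."""
    X = np.where(mask, Y, np.mean(Y[mask]))     # initialize missing entries with the observed mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        X = np.where(mask, Y, X)                # keep observed entries fixed
    return X

def active_sampling_loop(Y_true, observed_fraction=0.3, budget=5, rank=2, seed=0):
    """Active-sampling loop: fit, pick an informative unobserved entry (here: the one
    with the largest predicted magnitude), measure it, insert it, and repeat."""
    rng = np.random.default_rng(seed)
    mask = rng.random(Y_true.shape) < observed_fraction
    Y = np.where(mask, Y_true, 0.0)
    for _ in range(budget):
        X = low_rank_approx(Y, mask, rank=rank)
        unobserved = np.argwhere(~mask)
        if len(unobserved) == 0:
            break
        # Active-sample heuristic (illustrative): probe the entry predicted to be most extreme.
        i, j = max(unobserved, key=lambda ij: abs(X[ij[0], ij[1]]))
        Y[i, j] = Y_true[i, j]                  # "obtain" the measurement
        mask[i, j] = True
    return low_rank_approx(Y, mask, rank=rank)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y_true = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 8))  # rank-2 ground truth
    X = active_sampling_loop(Y_true)
    print("mean absolute error:", np.mean(np.abs(X - Y_true)))
```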
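The utility-based resource allocation entry above (20080263559) aggregates per-entity resource-level utilities and picks the allocation that maximizes aggregate business value. The sketch below assumes the service-level-to-resource-level transformation has already been performed and simply enumerates CPU splits; the utility curves and entity names are made up for illustration.

```python
from itertools import product

# Illustrative resource-level utilities: business value as a function of CPU shares
# allocated to each entity (already transformed from service-level utilities).
resource_level_utility = {
    "web_tier":   lambda cpus: 10 * (1 - 0.5 ** cpus),   # diminishing returns
    "batch_jobs": lambda cpus: 1.5 * cpus,                # roughly linear value
    "analytics":  lambda cpus: 4 * (cpus >= 3),           # needs at least 3 CPUs to be useful
}

def best_allocation(utilities, total_cpus):
    """Enumerate all ways to split `total_cpus` among the entities and return the split
    that maximizes the aggregated resource-level utility."""
    entities = list(utilities)
    best, best_value = None, float("-inf")
    for split in product(range(total_cpus + 1), repeat=len(entities)):
        if sum(split) != total_cpus:
            continue
        value = sum(utilities[e](c) for e, c in zip(entities, split))
        if value > best_value:
            best, best_value = dict(zip(entities, split)), value
    return best, best_value

if __name__ == "__main__":
    allocation, value = best_allocation(resource_level_utility, total_cpus=8)
    print("allocation:", allocation, "aggregated utility:", round(value, 2))
```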
Patent application number | Description | Published |
20090007277 | System and Method for Automatically Hiding Sensitive Information Obtainable from a Process Table - The present invention provides a system and method for automatically hiding sensitive information, obtainable from a process table, from other processes that should not access the sensitive information. The system and method include a sensitive command attribute table that is used by a system administrator to designate the commands and command attributes that will typically be associated with sensitive information. The sensitive command attribute table is used when a command is entered that requests information from the process table to be displayed or output. In response, a search of the process table entries is made to determine if a command and/or its attribute in the process table matches an entry in the sensitive command attribute table. If so, the command, its attributes, and/or its attribute values are blanked from the output of the process table information. | 01-01-2009 |
20090063801 | Write Protection Of Subroutine Return Addresses - Exemplary methods, systems, and products are described that operate generally by moving subroutine return address protection to the processor itself, in effect providing atomic locks for subroutine return addresses stored in a stack, subject to application control. More particularly, exemplary methods, systems, and products are described that write protect subroutine return addresses by calling a subroutine, including storing in a stack memory address a subroutine return address and locking, by a computer processor, the stack memory address against write access. Calling a subroutine may include receiving in the computer processor an instruction to lock the stack memory address. Locking the stack memory address may be carried out by storing the stack memory address in a protected memory lockword. A protected memory lockword may be implemented as a portion of a protected content addressable memory. | 03-05-2009 |
20090077468 | METHOD OF SWITCHING INTERNET PERSONAS BASED ON URL - A method of communicating with a remote site on a network by establishing different user personas respectively associated with different remote sites on the network, each user persona containing one or more attributes used in accessing the remote sites, and then accessing a specific one of the remote sites using the attributes in a specific one of the user personas that is associated with the specific remote site. The specific remote site can be associated with the specific user persona by a universal resource locator (URL), e.g., for web sites on the Internet, and the accessing is automatically performed in response to matching of the URL of the specific remote site to the URL associated with the specific user persona. A default persona can be used for any remote site having no specifically associated user persona. | 03-19-2009 |
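The process-table entry above (20090007277) blanks command attributes and attribute values that match a sensitive-command-attribute table before process information is displayed. The sketch below shows one way such masking can look for command lines; the table contents, flag conventions, and masking string are illustrative assumptions rather than the mechanism claimed in the application.

```python
# Illustrative sensitive-command-attribute table: command name -> attribute flags whose
# values should never appear in process-table output.
SENSITIVE_ATTRIBUTES = {
    "mysql": {"--password"},
    "curl":  {"-u", "--user"},
}

def mask_process_entry(cmdline):
    """Blank sensitive attribute values from one process-table command line before display."""
    parts = cmdline.split()
    if not parts or parts[0] not in SENSITIVE_ATTRIBUTES:
        return cmdline
    flags = SENSITIVE_ATTRIBUTES[parts[0]]
    masked, skip_next = [], False
    for token in parts:
        if skip_next:
            masked.append("********")            # the token following a sensitive flag is its value
            skip_next = False
        elif token in flags:
            masked.append(token)
            skip_next = True
        elif any(token.startswith(f + "=") for f in flags):
            masked.append(token.split("=", 1)[0] + "=********")
        else:
            masked.append(token)
    return " ".join(masked)

if __name__ == "__main__":
    print(mask_process_entry("mysql --user admin --password hunter2"))
    print(mask_process_entry("curl -u alice:secret https://example.com"))
    print(mask_process_entry("vim notes.txt"))
```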
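The persona entry above (20090077468) associates user personas with remote sites by URL and falls back to a default persona when no association exists. The minimal sketch below matches personas by hostname; the class names and attribute keys are illustrative and not taken from the application.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class Persona:
    """A user persona: a named bundle of attributes used when accessing a site."""
    name: str
    attributes: dict = field(default_factory=dict)

class PersonaSwitcher:
    """Pick the persona associated with a URL, falling back to a default persona."""
    def __init__(self, default: Persona):
        self.default = default
        self.by_host = {}

    def associate(self, url: str, persona: Persona):
        self.by_host[urlparse(url).hostname] = persona

    def persona_for(self, url: str) -> Persona:
        return self.by_host.get(urlparse(url).hostname, self.default)

if __name__ == "__main__":
    switcher = PersonaSwitcher(default=Persona("default", {"display_name": "guest"}))
    switcher.associate("https://forum.example.org", Persona("hobbyist", {"display_name": "gadget_fan"}))
    switcher.associate("https://bank.example.com", Persona("personal", {"display_name": "J. Smith"}))

    for url in ("https://forum.example.org/thread/42", "https://news.example.net"):
        print(url, "->", switcher.persona_for(url).name)
```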