In a previous post, we enumerated the four necessary constituents for deciding anything: (1) measurements, (2) potential actions, (3) a decision rule, and (4) a loss function. Many great things have come from this stylized setting, including essentially everything in the fields of “pattern recognition” and “data mining”. One can often obtain stronger theoretical results, and better empirical performance, by adopting additional assumptions/structure to the decision process. Here are three additional constituents that are essentially required when incorporating uncertainty.
Probabilistic Generative Model: This is a set of probability distributions under consideration. For example, “Gaussian” can be a probabilistic generative model (or “model”, for short), where all Gaussian distributions, each characterized by a particular mean and variance, are elements of the model. Note that if there are multiple measurements being used to make a decision, some assumptions are required to link those measurements. The simplest, and probably most commonly used assumption, is that each measurement is sampled independently and identically from a true but unknown distribution in the model. Formally, .
Estimator: To quote wikipedia, who was quoting Tukey, “an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule (the estimator), the quantity of interest (the estimand) and its result (the estimate) are distinguished.” In this context, estimators estimate a decision rule, which chooses an action. For example, in linear discriminant analysis, the estimator learns the decision boundary, and the estimate is the decision boundary.
Formally, an estimator is a sequence of functions, , that maps from a set of measurements , each in some space , and yields some action , which is one of the set of feasible actions, ; that is, . Often, the action space is in fact a set of admissable decision rules.
Risk Functional: Under uncertainty, a given decision rule will incur losses probabilistically depending on the particular realized measurements. Therefore, simply minimizing loss on the observed measurements may not be desirable, and in particular, may result in over-fitting. It is therefore more desirable to choose/learn a decision rule that minimizes some functional of the distribution of loss induced by the true but unknown distribution. For example, one may desire a decision rule that minimizes the expected loss. In other contexts, one may instead desire a decision rule that minimizes the expected loss subject to a constraint on the size of the expected variance. This comes up, for example, in financial portfolio optimization.
With these three additional constituents, one can begin constructing estimators that have desirable properties, as will be described in the next post