Active learning: the learning algorithm has some control over the data from which it learns. It can query an oracle, requesting labels for the data samples that appear most informative for the learning process. A proper selection of samples leads to better performance with fewer labeled data.
Scenarios: learning with membership queries; stream-based sampling; pool-based sampling.
Strategies: uncertainty sampling; query-by-committee; density-weighted…
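As a rough illustration of the first two strategies, the sketch below uses common formulations from the active learning literature, not necessarily the ones used in this work. It assumes each pool sample comes with a vector of predicted class probabilities (at least two classes), and for query-by-committee it assumes a list of labels voted by the committee members. All function names are hypothetical.

```python
import math
from collections import Counter

def least_confident(probs):
    """Uncertainty = 1 - probability of the most likely label."""
    return 1.0 - max(probs)

def margin(probs):
    """Negated gap between the two most likely labels (needs >= 2 labels).
    A smaller gap gives a larger value, i.e. more uncertainty."""
    top2 = sorted(probs, reverse=True)[:2]
    return -(top2[0] - top2[1])

def entropy(probs):
    """Shannon entropy of the predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def vote_entropy(votes):
    """Query-by-committee disagreement: entropy of the committee's votes."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def most_uncertain(pool, score, k=1):
    """Indices of the k pool items with the highest uncertainty score."""
    return sorted(range(len(pool)), key=lambda i: score(pool[i]), reverse=True)[:k]
```

For example, with `pool = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]`, all three uncertainty measures pick index 1, the sample closest to the decision boundary. Density-weighted variants would additionally multiply these scores by a measure of how representative each sample is of the pool.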
Conformal prediction: complements the predictions made by machine learning algorithms with measures of their reliability. The label predicted for a new object should make it similar to the old objects, and the degree of this similarity is used to estimate the confidence in the prediction.
Conformal prediction algorithm: the inputs are a training set and a test sample. Consider every possible value of the label; compute nonconformity scores and p-values for each possible classification; predict the label with the largest p-value; output one minus the second-largest p-value as the confidence of the prediction; output the largest p-value as the credibility of the prediction.
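The last three steps of the algorithm can be sketched as follows, assuming the p-values have already been computed for every candidate label and are passed in as a dictionary (the function name `conformal_decision` is hypothetical).

```python
def conformal_decision(p_values):
    """Turn per-label p-values into (prediction, confidence, credibility).

    p_values: dict mapping each candidate label to its p-value
    (at least two candidate labels are assumed).
    """
    # Rank labels from largest to smallest p-value.
    ranked = sorted(p_values.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, largest), (_, second_largest) = ranked[0], ranked[1]
    confidence = 1.0 - second_largest  # one minus the second-largest p-value
    credibility = largest              # the largest p-value
    return best_label, confidence, credibility
```

For instance, `conformal_decision({"a": 0.8, "b": 0.1, "c": 0.05})` predicts `"a"` with confidence 0.9 and credibility 0.8: a high confidence means the alternatives are implausible, while a low credibility would warn that even the chosen label fits the training data poorly.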
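Nonconformity scores and p-values: the Lagrange multipliers computed during SVM training are used as nonconformity scores, and the approach is extended to the multiclass setting with a one-vs-rest scheme. P-values: $p(y) = \frac{|\{i = 1, \dots, n+1 : \alpha_i \ge \alpha_{n+1}\}|}{n+1}$, where $\alpha_1, \dots, \alpha_n$ are the nonconformity scores of the training samples and $\alpha_{n+1}$ is that of the test sample with candidate label $y$.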
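A minimal sketch of the p-value computation, assuming the usual transductive definition: the p-value of a candidate label is the fraction of all scores (training samples plus the test sample) that are at least as large as the test sample's score. With SVMs, the scores for candidate label y would be the Lagrange multipliers of the (one-vs-rest) SVM retrained on the training set augmented with the test sample labelled y; obtaining them from a solver is not shown, and the numbers in the tests are made up. Function names are hypothetical.

```python
def conformal_p_value(train_scores, test_score):
    """Fraction of the n + 1 scores (training plus test) that are >= the test score."""
    scores = list(train_scores) + [test_score]
    return sum(1 for a in scores if a >= test_score) / len(scores)

def p_values_for_labels(scores_by_label):
    """Compute one p-value per candidate label.

    scores_by_label: dict mapping a candidate label y to (train_scores, test_score),
    both obtained from a classifier retrained with the test sample labelled y.
    """
    return {y: conformal_p_value(tr, te) for y, (tr, te) in scores_by_label.items()}
```

Since non-support vectors have a Lagrange multiplier of zero, a test sample that ends up with a zero multiplier under label y conforms well and receives a p-value of 1; a large multiplier means the sample is hard to fit with that label and yields a small p-value.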
Active learning algorithm: the inputs are an initial training set T, a calibration set C, a pool of candidate samples U, a selection threshold τ and a batch size β. Train an initial classifier on T; while a stopping criterion is not reached: apply the current classifier to the pool of samples; rank the samples in the pool using the uncertainty criterion; select the top β examples whose certainty level falls below the selection threshold τ; ask the teacher to label the selected examples and add them to the training set; train a new classifier on the expanded training set.
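The loop can be sketched as below. The helpers `train`, `certainty`, `oracle` and `should_stop` are hypothetical callables supplied by the user; `certainty` could, for example, return the conformal confidence of the prediction for a sample. Removing queried samples from the pool and stopping early when no sample falls below τ are assumptions of this sketch, not details taken from the slide.

```python
def active_learning(T, U, train, certainty, oracle, tau, beta, should_stop):
    """Pool-based active learning with a certainty threshold.

    T: list of (sample, label) pairs; U: list of unlabelled samples.
    train(T) -> model; certainty(model, x) -> float;
    oracle(x) -> label; should_stop(model, T, U) -> bool.
    """
    T, U = list(T), list(U)
    model = train(T)
    while not should_stop(model, T, U):
        # Rank the pool from least to most certain (ties broken by position).
        scored = sorted((certainty(model, x), i) for i, x in enumerate(U))
        # Keep the top beta samples whose certainty is below tau.
        picked = [i for c, i in scored if c < tau][:beta]
        if not picked:
            break  # assumption: nothing uncertain enough is left to query
        for i in picked:
            T.append((U[i], oracle(U[i])))  # ask the teacher for the label
        picked_set = set(picked)
        # Assumption: queried samples leave the pool.
        U = [x for i, x in enumerate(U) if i not in picked_set]
        model = train(T)  # retrain on the expanded training set
    return model, T, U
```

Selecting a batch of β samples per round, rather than one, reduces the number of retraining steps at the cost of possibly querying redundant samples.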
Stopping criteria: a pre-specified size for the training set; exhaustion of the pool of candidate samples; early stopping, implemented using the calibration set: active selection stops when newly trained classifiers bring no further improvement on the calibration set.
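A minimal sketch combining the three criteria into a single `should_stop(model, T, U)` callable. The calibration metric `evaluate_on_calibration` (for instance, accuracy on C) and the `patience` parameter, which counts how many non-improving rounds are tolerated, are hypothetical choices, not details from the slide.

```python
def make_stopping_rule(max_train_size, evaluate_on_calibration, patience=1):
    """Build should_stop(model, T, U) from the three stopping criteria.

    evaluate_on_calibration(model) -> score on the calibration set (higher is better).
    patience: consecutive non-improving rounds tolerated before early stopping.
    """
    state = {"best": float("-inf"), "stale": 0}

    def should_stop(model, T, U):
        if len(T) >= max_train_size:  # pre-specified training-set size reached
            return True
        if not U:                     # pool of candidate samples exhausted
            return True
        score = evaluate_on_calibration(model)
        if score > state["best"]:
            state["best"], state["stale"] = score, 0
        else:
            state["stale"] += 1
        return state["stale"] >= patience  # early stop: no improvement on C

    return should_stop
```

Because the calibration set is never used for training, it gives an unbiased signal of whether further queries still help; a patience larger than 1 makes the rule less sensitive to noisy calibration scores.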