CPOs Built Into mlrCPO

Martin Binder

2019-03-14

Listing CPOs

Built-in CPOs can be listed with listCPO().

listCPO()[, c("name", "category", "subcategory")]
#>    name                            category      subcategory
#> 11 cpoDropConstants                data          cleanup
#> 36 cpoFixFactors                   data          cleanup
#> 10 cpoCollapseFact                 data          factor data preprocessing
#>  4 cpoAsNumeric                    data          feature conversion
#> 15 cpoDummyEncode                  data          feature conversion
#> 13 cpoImpactEncodeClassif          data          feature conversion
#> 14 cpoImpactEncodeRegr             data          feature conversion
#> 12 cpoProbEncode                   data          feature conversion
#> 55 cpoQuantileBinNumerics          data          feature conversion
#> 61 cpoSelect                       data          feature selection
#> 62 cpoSelectFreeProperties         data          feature selection
#> 51 cpoAddCols                      data          features
#> 50 cpoMakeCols                     data          features
#>  1 cpoApplyFun                     data          general data preprocessing
#> 53 cpoModelMatrix                  data          general
#> 37 cpoIca                          data          numeric data preprocessing
#> 54 cpoPca                          data          numeric data preprocessing
#> 58 cpoScale                        data          numeric data preprocessing
#> 59 cpoScaleMaxAbs                  data          numeric data preprocessing
#> 60 cpoScaleRange                   data          numeric data preprocessing
#> 64 cpoSpatialSign                  data          numeric data preprocessing
#> 16 cpoFilterFeatures               featurefilter general
#> 32 cpoFilterAnova                  featurefilter specialised
#> 18 cpoFilterCarscore               featurefilter specialised
#> 28 cpoFilterChiSquared             featurefilter specialised
#> 26 cpoFilterGainRatio              featurefilter specialised
#> 25 cpoFilterInformationGain        featurefilter specialised
#> 33 cpoFilterKruskal                featurefilter specialised
#> 23 cpoFilterLinearCorrelation      featurefilter specialised
#> 17 cpoFilterMrmr                   featurefilter specialised
#> 30 cpoFilterOneR                   featurefilter specialised
#> 35 cpoFilterPermutationImportance  featurefilter specialised
#> 24 cpoFilterRankCorrelation        featurefilter specialised
#> 29 cpoFilterRelief                 featurefilter specialised
#> 21 cpoFilterRfCImportance          featurefilter specialised
#> 22 cpoFilterRfImportance           featurefilter specialised
#> 19 cpoFilterRfSRCImportance        featurefilter specialised
#> 20 cpoFilterRfSRCMinDepth          featurefilter specialised
#> 27 cpoFilterSymmetricalUncertainty featurefilter specialised
#> 31 cpoFilterUnivariate             featurefilter specialised
#> 34 cpoFilterVariance               featurefilter specialised
#> 38 cpoImpute                       imputation    general
#> 39 cpoImputeAll                    imputation    general
#> 40 cpoImputeConstant               imputation    specialised
#> 48 cpoImputeHist                   imputation    specialised
#> 49 cpoImputeLearner                imputation    specialised
#> 45 cpoImputeMax                    imputation    specialised
#> 42 cpoImputeMean                   imputation    specialised
#> 41 cpoImputeMedian                 imputation    specialised
#> 44 cpoImputeMin                    imputation    specialised
#> 43 cpoImputeMode                   imputation    specialised
#> 47 cpoImputeNormal                 imputation    specialised
#> 46 cpoImputeUniform                imputation    specialised
#>  8 cpoCache                        meta
#>  6 cpoCase                         meta
#>  9 cpoCbind                        meta
#>  5 cpoMultiplex                    meta
#>  7 cpoTransformParams              meta
#> 68 cpoWrap                         meta          wrap
#> 69 cpoWrapRetrafoless              meta          wrap
#> 65 cpoOversample                   subsampling   binary classif
#> 63 cpoSmote                        subsampling   binary classif
#> 66 cpoUndersample                  subsampling   binary classif
#> 67 cpoSample                       subsampling   general
#>  2 cpoApplyFunRegrTarget           target        general target transformation
#> 56 cpoRegrResiduals                target        residual fitting
#>  3 cpoLogTrafoRegr                 target        target transformation
#> 52 cpoMissingIndicators            tools         imputation
#> 57 cpoResponseFromSE               tools         predict.type

NULLCPO

NULLCPO is the neutral element of %>>%. It is returned by some functions when no other CPO or Retrafo is present.

NULLCPO
#> NULLCPO
is.nullcpo(NULLCPO)
#> [1] TRUE
NULLCPO %>>% cpoScale()
#> scale(center = TRUE, scale = TRUE)
NULLCPO %>>% NULLCPO
#> NULLCPO
print(as.list(NULLCPO))
#> list()
pipeCPO(list())
#> NULLCPO

Meta-CPO

cpoWrap

A simple CPO with a single parameter: the value given for that parameter is itself a CPO, which gets applied to the data. This differs from a multiplexer in that the parameter is free and can take any value that behaves like a CPO. The downside is that the hyperparameters of the wrapped CPO are not exposed to the outside.
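
A minimal sketch of how this might be used. It assumes that mlrCPO is attached and that the wrapped CPO is set through a hyperparameter named wrap.cpo (the parameter cpo, prefixed with the default ID wrap); check the printed parameter set if the name differs in the installed version.

```r
library(mlrCPO)

cpa = cpoWrap()
print(cpa, verbose = TRUE)
# any CPO can be given as the value of the (assumed) 'wrap.cpo' parameter
scaled = iris %>>% setHyperPars(cpa, wrap.cpo = cpoScale())
head(scaled)
pcad = iris %>>% setHyperPars(cpa, wrap.cpo = cpoPca())
head(pcad)
```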

cpoMultiplex

Combine many CPOs into one, with an extra selected.cpo parameter that chooses between them.

cpm = cpoMultiplex(list(cpoScale, cpoPca))
print(cpm, verbose = TRUE)
#> Trafo chain of 1 cpos:
#> multiplex(selected.cpo = scale, scale.center = TRUE, scale.scale = TRUE, pca.center = TRUE, pca.scale = FALSE)
#> Operating: feature
#> ParamSet:
#>                  Type len   Def    Constr Req Tunable Trafo
#> selected.cpo discrete   - scale scale,pca   -    TRUE     -
#> scale.center  logical   -  TRUE         -   Y    TRUE     -
#> scale.scale   logical   -  TRUE         -   Y    TRUE     -
#> pca.center    logical   -  TRUE         -   Y    TRUE     -
#> pca.scale     logical   - FALSE         -   Y    TRUE     -
head(iris %>>% setHyperPars(cpm, selected.cpo = "scale"))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1   -0.8976739  1.01560199    -1.335752   -1.311052  setosa
#> 2   -1.1392005 -0.13153881    -1.335752   -1.311052  setosa
#> 3   -1.3807271  0.32731751    -1.392399   -1.311052  setosa
#> 4   -1.5014904  0.09788935    -1.279104   -1.311052  setosa
#> 5   -1.0184372  1.24503015    -1.335752   -1.311052  setosa
#> 6   -0.5353840  1.93331463    -1.165809   -1.048667  setosa
# every CPO's hyperparameters are exported
head(iris %>>% setHyperPars(cpm, selected.cpo = "scale", scale.center = FALSE))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1    0.8613268   1.1296201    0.3362663    0.140405  setosa
#> 2    0.8275493   0.9682458    0.3362663    0.140405  setosa
#> 3    0.7937718   1.0327956    0.3122473    0.140405  setosa
#> 4    0.7768830   1.0005207    0.3602853    0.140405  setosa
#> 5    0.8444380   1.1618950    0.3362663    0.140405  setosa
#> 6    0.9119931   1.2587196    0.4083234    0.280810  setosa
head(iris %>>% setHyperPars(cpm, selected.cpo = "pca"))
#>   Species       PC1        PC2         PC3          PC4
#> 1  setosa -2.684126 -0.3193972  0.02791483  0.002262437
#> 2  setosa -2.714142  0.1770012  0.21046427  0.099026550
#> 3  setosa -2.888991  0.1449494 -0.01790026  0.019968390
#> 4  setosa -2.745343  0.3182990 -0.03155937 -0.075575817
#> 5  setosa -2.728717 -0.3267545 -0.09007924 -0.061258593
#> 6  setosa -2.280860 -0.7413304 -0.16867766 -0.024200858

cpoCase

A CPO that builds data-dependent CPO networks. It generalizes cpoMultiplex: it takes a function that decides, from the data and from user-specified hyperparameters, which CPO operation to perform. Besides optional arguments, the hyperparameters of the CPOs used are exported as well. Unlike with cpoMultiplex, however, the "requires" conditions of the involved parameters are not adjusted, since this is impossible in principle.
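
The following sketch shows one way such a CPO could be built; it is named s.and.p here. It assumes that the exported CPOs are given in an argument called export.cpos and are handed to cpo.build under their names (scale and pca); the hyperparameter logical.param is made up for illustration.

```r
library(mlrCPO)

s.and.p = cpoCase(pSS(logical.param: logical),
  export.cpos = list(cpoScale(), cpoPca()),
  cpo.build = function(data, target, logical.param, scale, pca) {
    # decide the order of operations from the hyperparameter and from the data
    if (logical.param || mean(data[[1]]) > 10) {
      scale %>>% pca
    } else {
      pca %>>% scale
    }
  })
print(s.and.p, verbose = TRUE)

# the mean of iris' first column is below 10, so 'logical.param' decides the order
scale.first = iris %>>% setHyperPars(s.and.p, logical.param = TRUE)
pca.first = iris %>>% setHyperPars(s.and.p, logical.param = FALSE)
head(scale.first)
head(pca.first)
```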

The resulting CPO s.and.p performs scaling and PCA, with the order depending on the parameter logical.param and on whether the mean of the data’s first column exceeds 10. If either of those is true, the data is first scaled and then PCA’d; otherwise the order is reversed. All CPOs listed in .export are passed to cpo.build.

cpoCbind

cpoCbind applies several CPOs to the same data and cbinds their results. This makes it possible to build DAGs of CPOs that perform different operations on the data and paste the results next to each other.

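# a construction of 'cbinder' consistent with the output printed below
scale = cpoScale(id = "scale")
scale.pca = scale %>>% cpoPca()
cbinder = cpoCbind(scaled = scale, pcad = scale.pca, original = NULLCPO)
# cpoCbind recognises that "scale.scale" happens before "pca.pca" but is also fed to the
# result directly. The summary draws a (crude) ascii-art graph.
print(cbinder, verbose = TRUE)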
#> Trafo chain of 1 cpos:
#> cbind(scale.center = TRUE, scale.scale = TRUE, pca.center = TRUE, pca.scale = FALSE)
#> Operating: feature
#> ParamSet:
#>                 Type len   Def Constr Req Tunable Trafo
#> scale.center logical   -  TRUE      -   -    TRUE     -
#> scale.scale  logical   -  TRUE      -   -    TRUE     -
#> pca.center   logical   -  TRUE      -   -    TRUE     -
#> pca.scale    logical   - FALSE      -   -    TRUE     -
#> O>+   scale(center = TRUE, scale = TRUE)
#> | |  
#> +<O   pca(center = TRUE, scale = FALSE)[not exp'd: tol = <NULL>, rank = <NULL>]
#> |  
#> O   CBIND[scaled,pcad,original]
#> 
head(iris %>>% cbinder)
#>   scaled.Sepal.Length scaled.Sepal.Width scaled.Petal.Length scaled.Petal.Width
#> 1          -0.8976739         1.01560199           -1.335752          -1.311052
#> 2          -1.1392005        -0.13153881           -1.335752          -1.311052
#> 3          -1.3807271         0.32731751           -1.392399          -1.311052
#> 4          -1.5014904         0.09788935           -1.279104          -1.311052
#> 5          -1.0184372         1.24503015           -1.335752          -1.311052
#> 6          -0.5353840         1.93331463           -1.165809          -1.048667
#>   scaled.Species pcad.Species  pcad.PC1   pcad.PC2    pcad.PC3     pcad.PC4
#> 1         setosa       setosa -2.257141 -0.4784238  0.12727962  0.024087508
#> 2         setosa       setosa -2.074013  0.6718827  0.23382552  0.102662845
#> 3         setosa       setosa -2.356335  0.3407664 -0.04405390  0.028282305
#> 4         setosa       setosa -2.291707  0.5953999 -0.09098530 -0.065735340
#> 5         setosa       setosa -2.381863 -0.6446757 -0.01568565 -0.035802870
#> 6         setosa       setosa -2.068701 -1.4842053 -0.02687825  0.006586116
#>   original.Sepal.Length original.Sepal.Width original.Petal.Length
#> 1                   5.1                  3.5                   1.4
#> 2                   4.9                  3.0                   1.4
#> 3                   4.7                  3.2                   1.3
#> 4                   4.6                  3.1                   1.5
#> 5                   5.0                  3.6                   1.4
#> 6                   5.4                  3.9                   1.7
#>   original.Petal.Width original.Species
#> 1                  0.2           setosa
#> 2                  0.2           setosa
#> 3                  0.2           setosa
#> 4                  0.2           setosa
#> 5                  0.2           setosa
#> 6                  0.4           setosa

cpoTransformParams

cpoTransformParams wraps another CPO and sets some of its hyperparameters to the value of expressions depending on other hyperparameter values. This can be used to make a transformation of parameters similar to the trafo parameter of a Param in ParamHelpers, but it can also be used to set multiple parameters at the same time, depending on a single new parameter.
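
As an illustration, the sketch below ties the scale parameter of cpoPca to its center parameter, so that a single exported hyperparameter controls both. It assumes that the transformations are given as an alist of expressions that may refer to the remaining hyperparameters by name.

```r
library(mlrCPO)

# 'pca.scale' is no longer set directly; it always takes the value of 'pca.center'
cpo = cpoTransformParams(cpoPca(), alist(pca.scale = pca.center))

both.off = iris %>>% setHyperPars(cpo, pca.center = FALSE)
both.on = iris %>>% setHyperPars(cpo, pca.center = TRUE)
head(both.off)
head(both.on)
```

The two results should agree with calling cpoPca directly with center and scale set to the same value.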

Data Manipulation

cpoScale

Implements the base::scale function.
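
A small example on a made-up two-column data frame, assuming mlrCPO is attached:

```r
library(mlrCPO)

df = data.frame(a = 1:3, b = -(1:3) * 10)
scaled = df %>>% cpoScale()
scaled  # both columns are centered and divided by their standard deviation
centered = df %>>% cpoScale(scale = FALSE)
centered  # centering only: 'b' keeps its original spread
```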

cpoPca

Implements stats::prcomp. With the defaults shown in the parameter sets printed above (center = TRUE, scale = FALSE), the data is centered but not scaled.
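
The sketch below trains the PCA on one part of iris and re-applies the same rotation to the remaining rows through the retrafo object. The split into the first 100 and last 50 rows is arbitrary.

```r
library(mlrCPO)

trained = iris[1:100, ] %>>% cpoPca()
head(trained)
# the retrafo object stores the rotation found on the training rows
rt = retrafo(trained)
new.data = iris[101:150, ] %>>% rt
head(new.data)
```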

cpoDummyEncode

Dummy-encodes factor variables. Optionally, the first level of each factor is used as the reference category and gets no column of its own.
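
For example, on iris, whose only factor is Species with three levels:

```r
library(mlrCPO)

full = iris %>>% cpoDummyEncode()
head(full)  # one indicator column per level of Species
reduced = iris %>>% cpoDummyEncode(reference.cat = TRUE)
head(reduced)  # the first level serves as reference and has no column
```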

cpoSelect

Selects only certain columns of a dataset. Columns can be selected by index, name, or regex pattern.
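
A few selections on iris as a sketch. The type argument used in the second call selects columns by data type and is combined additively with the pattern.

```r
library(mlrCPO)

widths = iris %>>% cpoSelect(pattern = "Width")
head(widths)
# selection criteria add up: all "Width" columns plus all factor columns
widths.and.factors = iris %>>% cpoSelect(pattern = "Width", type = "factor")
head(widths.and.factors)
by.name = iris %>>% cpoSelect(names = c("Sepal.Length", "Species"))
head(by.name)
```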

cpoDropConstants

Drops constant features. For numeric features, the tolerance within which values count as constant is adjustable.
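
In the first six rows of iris, Species takes only one value, which makes for a quick check. The abs.tol value in the second call is an arbitrary choice to show the tolerance parameter.

```r
library(mlrCPO)

strict = head(iris) %>>% cpoDropConstants()
strict  # 'Species' is constant in these rows and is dropped
# a larger absolute tolerance also treats nearly constant numerics as constant
lenient = head(iris) %>>% cpoDropConstants(abs.tol = 0.2)
lenient
```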

cpoFixFactors

Drops unused factor levels and makes sure prediction data has the same factor levels as training data.
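
A sketch of both effects, using the first six rows of iris (all of species setosa) as training data:

```r
library(mlrCPO)

levels(iris$Species)
irisfix = head(iris) %>>% cpoFixFactors()
levels(irisfix$Species)  # only the level present in the training rows remains

# re-applying the trained CPO gives new data the training-time levels
rf = retrafo(irisfix)
refit = iris[c(1, 100, 140), ] %>>% rf
refit
levels(refit$Species)
```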

cpoMissingIndicators

Creates columns indicating missing data. Most useful in combination with cpoCbind.
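
The combination with cpoCbind could look as follows. The data frame impdata and the prefix dummy are made up for this example.

```r
library(mlrCPO)

impdata = data.frame(a = c(NA, 2, 3), b = -(1:3) * 10)
indicators = impdata %>>% cpoMissingIndicators()
indicators  # only columns that contain missing values get an indicator
# keep the original data (NULLCPO) and paste the indicators next to it
both = impdata %>>% cpoCbind(NULLCPO, dummy = cpoMissingIndicators())
both
```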

cpoApplyFun

Applies a univariate function to data columns.
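
For instance, with an arbitrary function applied to the numeric columns only (affect.type restricts which columns the CPO touches):

```r
library(mlrCPO)

transformed = iris %>>% cpoApplyFun(function(x) sqrt(x) - 10, affect.type = "numeric")
head(transformed)  # 'Species' is left unchanged
```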

cpoAsNumeric

Converts (non-numeric) features to numeric.
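
On iris, this turns the factor Species into its integer level codes:

```r
library(mlrCPO)

numeric.iris = iris %>>% cpoAsNumeric()
numeric.iris[c(1, 51, 101), ]  # one row of each species
```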

cpoCollapseFact

Combines low-prevalence factor levels. The parameter max.collapsed.class.prevalence sets how large the prevalence of the combined factor level may be.
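
To see the effect, the sketch below plants three rare levels (a, b, c) in a copy of iris; the copy iris2 and the threshold 0.2 are chosen only for illustration.

```r
library(mlrCPO)

iris2 = iris
# overwrite the first eight entries with the rare levels "a", "b" and "c"
iris2$Species = factor(c("a", "b", "c", "b", "b", "c", "b", "c",
  as.character(iris2$Species[-(1:8)])))
head(iris2, 10)
collapsed = iris2 %>>% cpoCollapseFact(max.collapsed.class.prevalence = 0.2)
head(collapsed, 10)
levels(collapsed$Species)
```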

cpoModelMatrix

Specify which columns get used, and how they are transformed, using a formula.
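
A sketch with an interaction term; the formula is an arbitrary example and follows the usual model.matrix conventions (~0 suppresses the intercept column).

```r
library(mlrCPO)

interaction.only = iris %>>% cpoModelMatrix(~0 + Species:Petal.Width)
head(interaction.only)
# adding '.' to the formula retains the original columns as well
with.originals = iris %>>% cpoModelMatrix(~0 + . + Species:Petal.Width)
head(with.originals)
```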

cpoScaleRange

Scales values to a given range.
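
For example, mapping every numeric column of iris linearly onto the interval from -1 to 1:

```r
library(mlrCPO)

ranged = iris %>>% cpoScaleRange(lower = -1, upper = 1)
head(ranged)
range(ranged$Sepal.Length)
```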

cpoScaleMaxAbs

Multiply features to set the maximum absolute value.
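
A short example, assuming the target value is given through the maxabs parameter:

```r
library(mlrCPO)

maxabs.scaled = iris %>>% cpoScaleMaxAbs(maxabs = 0.1)
head(maxabs.scaled)
max(abs(maxabs.scaled$Sepal.Length))
```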

cpoSpatialSign

Normalizes values row-wise.
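
With the default settings, each row of the numeric columns should end up with Euclidean length 1, which the last line checks:

```r
library(mlrCPO)

signed = iris %>>% cpoSpatialSign()
head(signed)
# Euclidean length of each row, computed over the four numeric columns
row.lengths = sqrt(rowSums(signed[, 1:4]^2))
summary(row.lengths)
```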

Imputation

There are two general and many specialised imputation CPOs. The general imputation CPOs have parameters that let them use different imputation methods on different columns. They are thin wrappers around mlr’s impute() and reimpute() functions. The specialised imputation CPOs each implement exactly one imputation method and are closer to the behaviour of typical CPOs.

General Imputation Wrappers

cpoImpute and cpoImputeAll both have parameters very much like impute(). The latter assumes that all columns of its input are being imputed in some way, and it can be prepended to a learner to give it the ability to work with missing data. It will, however, throw an error if data is still missing after imputation.
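
The difference between the two can be sketched as follows. The data frame impdata and the task name are made up; imputeMedian() and regr.lm come from mlr.

```r
library(mlrCPO)

impdata = data.frame(a = c(NA, 2, 3), b = -(1:3) * 10)
# impute column 'a' with its median; 'b' has no missing values
imputed = impdata %>>% cpoImpute(cols = list(a = imputeMedian()))
imputed

# only 'b' is imputed here, so the NA in 'a' remains and cpoImputeAll fails
res = tryCatch(impdata %>>% cpoImputeAll(cols = list(b = imputeMedian())),
  error = function(e) e)
inherits(res, "error")

# prepended to a learner, cpoImputeAll lets it train on data with missing values
missing.task = makeRegrTask("missing.task", impdata, target = "b")
lrn = cpoImputeAll(cols = list(a = imputeMedian())) %>>% makeLearner("regr.lm")
model = train(lrn, missing.task)
```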

Specialised Imputation Wrappers

There is one specialised imputation CPO for each imputation method.
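
Three of them, applied to a made-up data frame with one missing value:

```r
library(mlrCPO)

impdata = data.frame(a = c(NA, 1, 2, 6), b = -(1:4) * 10)
const = impdata %>>% cpoImputeConstant(10)
med = impdata %>>% cpoImputeMedian()
avg = impdata %>>% cpoImputeMean()
# the value filled in for the missing entry of 'a' under each method
c(constant = const$a[1], median = med$a[1], mean = avg$a[1])
```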

Feature Filtering

There is one general feature filtering CPO, and there are many specialised ones. The general filtering CPO, cpoFilterFeatures, is a thin wrapper around filterFeatures and takes the filtering method as its argument. The specialised CPOs each call a specific filtering method.

Most arguments of filterFeatures are reflected in the CPOs. The exceptions are:

  1. For cpoFilterFeatures, the filter method arguments are given in a list filter.args, not in the ... argument.
  2. The argument fval was dropped for the specialised filter CPOs.
  3. The argument mandatory.feat was dropped. Use the affect.* parameters to prevent features from being filtered.

head(getTaskData(iris.task %>>% cpoFilterFeatures(method = "variance", perc = 0.5)))
#>   Sepal.Length Petal.Length Species
#> 1          5.1          1.4  setosa
#> 2          4.9          1.4  setosa
#> 3          4.7          1.3  setosa
#> 4          4.6          1.5  setosa
#> 5          5.0          1.4  setosa
#> 6          5.4          1.7  setosa
head(getTaskData(iris.task %>>% cpoFilterVariance(perc = 0.5)))
#>   Sepal.Length Petal.Length Species
#> 1          5.1          1.4  setosa
#> 2          4.9          1.4  setosa
#> 3          4.7          1.3  setosa
#> 4          4.6          1.5  setosa
#> 5          5.0          1.4  setosa
#> 6          5.4          1.7  setosa
# The specialised filter CPOs are:
listCPO()[listCPO()$category == "featurefilter" & listCPO()$subcategory == "specialised",
          c("name", "description")]
#>                        name
#> 32           cpoFilterAnova
#> 18        cpoFilterCarscore
#> 28      cpoFilterChiSquared
#> 26       cpoFilterGainRatio
#> 25 cpoFilterInformationGain
#> 33         cpoFilterKruskal
#> ... (#rows: 19, #cols: 1)