A graphical summary of your random forest

Details of your forest

Distribution of minimal depth

Importance measures

Multi-way importance plot

Compare importance measures

Compare rankings of variables

Variable interactions

Conditional minimal depth
Prediction on a grid

Details of your forest

## 
## Call:
##  randomForest(formula = medv ~ ., data = Boston, ntree = 1000,      localImp = TRUE) 
##                Type of random forest: regression
##                      Number of trees: 1000
## No. of variables tried at each split: 4
## 
##           Mean of squared residuals: 9.718371
##                     % Var explained: 88.49

Distribution of minimal depth

The plot below shows the distribution of minimal depth among the trees of your forest. Note that:

the mean of the distribution is marked by a vertical bar with a value label on it (its scale differs from that of the rest of the plot),
the scale of the X axis goes from zero to the maximum number of trees in which any variable was used for splitting.

Minimal depth for a variable in a tree equals the depth of the node that splits on that variable and is closest to the root of the tree. If it is low, then a lot of observations are divided into groups on the basis of this variable.
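The definition above can be illustrated with a minimal base R sketch. The two toy trees and their split variables are hypothetical, and averaging only over the trees in which a variable appears is an assumption; the report may treat trees that never split on a variable differently.

```r
# Hypothetical forest of two small trees: one row per splitting node.
nodes <- data.frame(
  tree      = c(1, 1, 1, 1, 2, 2, 2),
  depth     = c(0, 1, 1, 2, 0, 1, 2),
  split_var = c("lstat", "rm", "crim", "age", "rm", "lstat", "crim"),
  stringsAsFactors = FALSE
)

# Minimal depth: for each tree and variable, the depth of the split
# on that variable that is closest to the root.
min_depth <- aggregate(depth ~ tree + split_var, data = nodes, FUN = min)

# Mean minimal depth per variable, averaged over the trees that use it
# (assumption: trees that never split on the variable are ignored).
mean_min_depth <- sapply(split(min_depth$depth, min_depth$split_var), mean)
print(mean_min_depth)
```

Here lstat and rm both split near the root in every tree, so they get the lowest mean minimal depth.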

Importance measures

Below you can explore the measures of importance for all variables in the forest:

	variable	mean_min_depth	no_of_nodes	mse_increase	node_purity_increase	no_of_trees	times_a_root	p_value
1	age	3.3140	17731	3.7805	1152.4979	1000	5	0.0000
2	black	3.4650	15721	1.6084	794.0147	1000	3	0.0000
3	chas	6.6306	1517	0.4911	206.4681	826	0	1.0000
4	crim	2.4080	18721	8.1863	2482.7265	1000	37	0.0000
5	dis	2.6040	18403	7.2667	2474.2986	1000	1	0.0000
6	indus	3.1550	8252	7.0550	2856.3551	1000	171	1.0000
7	lstat	1.2920	22801	62.3031	12406.8649	1000	269	0.0000
8	nox	2.5500	12500	10.2230	2727.6911	1000	77	0.9501
9	ptratio	2.8630	9071	6.3893	2273.5872	1000	93	1.0000
10	rad	5.0817	5227	1.3026	318.1657	997	5	1.0000

Showing 1 to 10 of 13 entries


Multi-way importance plot

The multi-way importance plot shows the relation between three measures of importance and labels the 10 variables that scored best on these three measures (i.e. those for which the sum of the ranks for those measures is the lowest).

The first multi-way importance plot focuses on three importance measures that derive from the structure of trees in the forest:

mean depth of first split on the variable,
number of trees in which the root is split on the variable,
the total number of nodes in the forest that split on that variable.
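As a rough illustration of the sum-of-ranks rule, the sketch below ranks the ten variables visible in the importance table on the three structural measures. It covers only 10 of the 13 variables, so the ranks are illustrative, and the use of average ranks for ties is an assumption rather than the report's documented behaviour.

```r
# Values copied from the visible rows of the importance table.
imp <- data.frame(
  variable       = c("age", "black", "chas", "crim", "dis",
                     "indus", "lstat", "nox", "ptratio", "rad"),
  mean_min_depth = c(3.3140, 3.4650, 6.6306, 2.4080, 2.6040,
                     3.1550, 1.2920, 2.5500, 2.8630, 5.0817),
  no_of_nodes    = c(17731, 15721, 1517, 18721, 18403,
                     8252, 22801, 12500, 9071, 5227),
  times_a_root   = c(5, 3, 0, 37, 1, 171, 269, 77, 93, 5),
  stringsAsFactors = FALSE
)

# Low depth is good; high node and root counts are good, so negate them.
# rank() averages tied values by default (an assumption for this sketch).
imp$rank_sum <- rank(imp$mean_min_depth) +
  rank(-imp$no_of_nodes) +
  rank(-imp$times_a_root)

# Variables with the lowest rank sums are the ones that would be labelled.
top3 <- imp$variable[order(imp$rank_sum)][1:3]
print(top3)
```

On these ten rows, lstat ranks first on all three measures, followed by crim and nox.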

The second multi-way importance plot shows two importance measures that derive from the role a variable plays in prediction (mse_increase and node_purity_increase in the table above), with additional information on the $p$-value. The $p$-value is based on a binomial distribution of the number of nodes split on the variable, assuming that variables are randomly drawn to form splits (i.e. a significant variable is used for splitting more often than it would be if the selection were random).
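A one-sided binomial test of this kind can be sketched in base R as follows. The node counts are hypothetical, and the success probability of 1/13 (each of the 13 variables equally likely to be picked for a split) is a simplifying assumption; the report's exact null probability is not stated here.

```r
# Hypothetical counts: total splitting nodes in a forest and the number
# of those nodes that split on one particular variable.
total_nodes <- 1000
nodes_on_var <- 120
n_variables <- 13          # number of predictors in the report's table
p_random <- 1 / n_variables  # assumed chance of picking the variable at random

# P(X >= nodes_on_var) under Binomial(total_nodes, p_random).
p_upper <- pbinom(nodes_on_var - 1, size = total_nodes,
                  prob = p_random, lower.tail = FALSE)

# The same value from the built-in exact test.
p_test <- binom.test(nodes_on_var, total_nodes, p = p_random,
                     alternative = "greater")$p.value

# A rarely used variable gets a p-value close to 1.
p_rare <- pbinom(50 - 1, size = total_nodes,
                 prob = p_random, lower.tail = FALSE)
```

This matches the pattern in the table, where heavily used variables have $p$-values near 0 and rarely used ones near 1.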

Compare importance measures

The plot below shows bilateral relations between pairs of the chosen importance measures. If some of them are strongly related to each other, it may be worth focusing on only one of them.

Compare rankings of variables

The plot below shows bilateral relations between the rankings of variables according to chosen importance measures. This approach might be useful as rankings are more evenly spread than corresponding importance measures. This may also more clearly show where the different measures of importance disagree or agree.
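The difference between comparing raw measures and comparing rankings can be sketched with two columns from the visible rows of the importance table. Only 10 of the 13 variables are included, so the numbers are illustrative rather than a reproduction of the plots.

```r
# mean_min_depth and no_of_nodes for the ten visible variables
# (age, black, chas, crim, dis, indus, lstat, nox, ptratio, rad).
mean_min_depth <- c(3.3140, 3.4650, 6.6306, 2.4080, 2.6040,
                    3.1550, 1.2920, 2.5500, 2.8630, 5.0817)
no_of_nodes    <- c(17731, 15721, 1517, 18721, 18403,
                    8252, 22801, 12500, 9071, 5227)

# Relation between the raw measures.
pearson <- cor(mean_min_depth, no_of_nodes)

# Relation between the rankings: Spearman correlation is the
# Pearson correlation computed on ranks.
spearman <- cor(mean_min_depth, no_of_nodes, method = "spearman")
spearman_by_hand <- cor(rank(mean_min_depth), rank(no_of_nodes))

print(c(pearson = pearson, spearman = spearman))
```

Both correlations are negative because a low minimal depth and a high node count both indicate an important variable.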

Variable interactions

Conditional minimal depth

The plot below reports the 30 top interactions according to the mean of conditional minimal depth – a generalization of minimal depth that measures the depth of the second variable in a tree of which the first variable is the root (a subtree of a tree from the forest). To be comparable to ordinary minimal depth, 1 is subtracted so that 0 is the minimum.

For example, a value of 0 for the interaction x:y in a tree means that if we take the highest subtree with the root splitting on x, then y is used for splitting immediately after x (the minimal depth of y in this subtree is 1). The values presented are means over all trees in the forest.
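The single-tree computation described above can be sketched in base R. The toy tree, the helper name conditional_min_depth, and the choice of the first node when several splits on x share the same depth are all assumptions made for illustration.

```r
# Hypothetical tree: one row per splitting node, with a parent pointer.
tree <- data.frame(
  id        = 1:7,
  parent    = c(NA, 1, 1, 2, 2, 3, 3),
  depth     = c(0, 1, 1, 2, 2, 2, 2),
  split_var = c("rm", "lstat", "crim", "age", "dis", "lstat", "age"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: depth of y below the highest split on x, minus 1.
conditional_min_depth <- function(tree, x, y) {
  x_nodes <- tree[tree$split_var == x, ]
  if (nrow(x_nodes) == 0) return(NA_real_)
  # Highest split on x (ties resolved by taking the first such node).
  root <- x_nodes[which.min(x_nodes$depth), ]
  # Walk down level by level to collect all descendants of that node.
  descendants <- integer(0)
  frontier <- root$id
  while (length(frontier) > 0) {
    frontier <- tree$id[tree$parent %in% frontier]
    descendants <- c(descendants, frontier)
  }
  y_depths <- tree$depth[tree$id %in% descendants & tree$split_var == y]
  if (length(y_depths) == 0) return(NA_real_)
  # Depth of y relative to the subtree root, shifted so that 0 is the minimum.
  min(y_depths) - root$depth - 1
}

print(conditional_min_depth(tree, "lstat", "age"))
print(conditional_min_depth(tree, "rm", "age"))
```

In this toy tree, age splits directly below the highest lstat node, so lstat:age gets 0, while rm:age gets 1 because age first appears two levels below the rm root. Averaging such values over the trees gives the statistic shown in the plot.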

Note that:

the plot shows only the 30 interactions that appeared most frequently,
the horizontal line shows the minimal value of the depicted statistic among the interactions for which it was calculated,
the interactions considered are those with one of the selected variables as the first (root) variable and any variable as the second.

You can explore the data used for plotting by interacting with the following table:

	variable	root_variable	mean_min_depth	occurrences	interaction	uncond_mean_min_depth
1	age	age	4.1430	608	age:age	3.3140
2	age	black	3.6716	529	black:age	3.3140
3	age	chas	3.9558	368	chas:age	3.3140
4	age	crim	3.5194	778	crim:age	3.3140
5	age	dis	3.8102	766	dis:age	3.3140
6	age	indus	4.0846	702	indus:age	3.3140
7	age	lstat	2.3533	946	lstat:age	3.3140
8	age	nox	3.6031	775	nox:age	3.3140
9	age	ptratio	3.5794	775	ptratio:age	3.3140
10	age	rad	3.6071	480	rad:age	3.3140

Showing 1 to 10 of 169 entries


Prediction on a grid

The plots below show predictions of the random forest as a function of the values of the two components of an interaction (the values of the remaining predictors are sampled from their empirical distribution), for up to the 3 most frequent interactions that consist of two numerical variables.
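The grid construction can be sketched in base R as follows. The data are simulated, the variable names x1, x2 and x3 are hypothetical, and a linear model stands in for the forest so that the example needs no extra packages; only the grid-and-sample logic is the point.

```r
set.seed(1)

# Simulated data: two interacting predictors and one remaining predictor.
dat <- data.frame(x1 = runif(200), x2 = runif(200), x3 = rnorm(200))
dat$y <- dat$x1 * dat$x2 + 0.1 * dat$x3 + rnorm(200, sd = 0.05)

# Stand-in model (assumption: any object with a predict() method works).
model <- lm(y ~ x1 * x2 + x3, data = dat)

# Regular grid over the observed ranges of the two interaction components.
grid <- expand.grid(
  x1 = seq(min(dat$x1), max(dat$x1), length.out = 5),
  x2 = seq(min(dat$x2), max(dat$x2), length.out = 5)
)

# Remaining predictors are sampled from their empirical distribution.
grid$x3 <- sample(dat$x3, nrow(grid), replace = TRUE)

# Predictions on the grid; these would be drawn as a heat map.
grid$prediction <- predict(model, newdata = grid)
head(grid)
```

Because the remaining predictors are resampled, repeated runs with different seeds give slightly different surfaces.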