Definition Random forest

Random forest is a popular machine-learning algorithm used for regression and classification tasks. It builds many decision trees, each of which uses the values of the independent variables to estimate the most likely value of the dependent variable. Each tree is typically grown on a random resample of the data and considers only a random subset of the independent variables at each split, so the trees differ from one another. The forest's prediction combines the individual trees: the most frequent class across the trees for classification, or the average of the trees' estimates for regression.
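How the single trees are combined into one forest prediction can be illustrated with a minimal Python sketch. It assumes the usual aggregation rules (majority vote for classification, mean for regression); the function names and the example tree outputs are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_regress(tree_predictions):
    """Return the average of the numeric estimates of the individual trees."""
    return mean(tree_predictions)


# Hypothetical outputs of five classification trees for one observation.
votes = ["spam", "ham", "spam", "spam", "ham"]
label = forest_classify(votes)

# Hypothetical outputs of three regression trees for one observation.
estimates = [10.0, 12.0, 11.0]
value = forest_regress(estimates)

print(label, value)
```

Because many trees vote, the error of any single tree tends to matter less than it would if that tree were used alone.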

To assess the validity of a random forest model, the data set is typically split into two parts, a training sample and a testing sample. The model is trained (that is, the split rules of its trees are learned) using the observations in the training sample. The trained model is then used to predict the dependent variable for the observations in the testing sample. If the predicted outcomes closely match those actually observed in the testing sample, the model is considered reliable.
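The training and testing procedure can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not a full implementation: each "tree" is a one-split tree on a single randomly chosen variable, the data are artificial and cleanly separable, and the 70/30 split, the number of trees, and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, feature):
    """Fit a one-split tree on one feature by minimising training errors."""
    values = sorted({x[feature] for x, _ in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [y for x, y in rows if x[feature] <= t]
        right = [y for x, y in rows if x[feature] > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:  # only one distinct value: always predict the majority
        lab = majority([y for _, y in rows])
        return feature, 0.0, lab, lab
    return feature, best[1], best[2], best[3]


def train_forest(rows, n_trees, rng):
    """Train each tree on a bootstrap resample and a random feature."""
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # draw with replacement
        forest.append(train_stump(sample, rng.randrange(n_features)))
    return forest


def predict(forest, x):
    """Majority vote over the predictions of all trees."""
    return majority([l if x[f] <= t else r for f, t, l, r in forest])


rng = random.Random(0)
# Artificial data: class "A" near 0, class "B" near 10 on both variables.
data = [((rng.uniform(0, 1), rng.uniform(0, 1)), "A") for _ in range(50)]
data += [((rng.uniform(9, 10), rng.uniform(9, 10)), "B") for _ in range(50)]

rng.shuffle(data)
split = len(data) * 7 // 10  # assumed 70/30 split
train, test = data[:split], data[split:]

forest = train_forest(train, n_trees=25, rng=rng)
accuracy = sum(predict(forest, x) == y for x, y in test) / len(test)
print(f"share of correct predictions in the testing sample: {accuracy:.2f}")
```

The key point is that the testing observations play no part in training, so the share of correct predictions on them indicates how the model may perform on new data.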

Please note that the definitions in our statistics encyclopedia are simplified explanations of terms. Our goal is to make the definitions accessible for a broad audience; thus it is possible that some definitions do not adhere entirely to scientific standards.