Data pre-processing techniques generally refer to the addition, deletion, or transformation of training set data.
Although this text is primarily concerned with modeling techniques, data preparation can make or break a model’s predictive ability. Different models have different sensitivities to the type of predictors in the model; how the predictors enter the model is also important. Transformations of the data to reduce the impact of data skewness or outliers can lead to significant improvements in performance. Feature extraction is one empirical technique for creating surrogate variables that are combinations of multiple predictors.
Simpler strategies, such as removing predictors that carry little information, can also be effective.
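As a rough illustration of two of these ideas, the sketch below applies a log transformation to a right-skewed predictor and drops a predictor that never varies. It is a minimal example under stated assumptions, not a recommended pipeline: the training set, the predictor names (`income`, `age`, `site`), and the helper functions `skewness` and `drop_constant_predictors` are all hypothetical, and the "single distinct value" rule is only the crudest possible stand-in for "lack of information content".

```python
import math
import statistics


def skewness(values):
    """Skewness as the mean cubed deviation over the cubed population std. dev."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)


def drop_constant_predictors(columns):
    """Keep only predictors that take more than one distinct value."""
    return {name: vals for name, vals in columns.items() if len(set(vals)) > 1}


# Hypothetical training set: "income" is right-skewed with one extreme value,
# and "site" is constant, so it cannot help distinguish samples.
training = {
    "income": [20.0, 22.0, 25.0, 30.0, 41.0, 58.0, 400.0],
    "age": [23.0, 35.0, 41.0, 29.0, 52.0, 38.0, 47.0],
    "site": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}

# Transformation: the log pulls in the long right tail of "income".
raw_skew = skewness(training["income"])
log_income = [math.log(v) for v in training["income"]]
log_skew = skewness(log_income)

# Deletion: remove predictors with no variation in the training set.
filtered = drop_constant_predictors(training)

print(f"skewness before log: {raw_skew:.2f}, after: {log_skew:.2f}")
print("predictors kept:", sorted(filtered))
```

On this toy data the log transformation reduces the skewness of `income` without eliminating it, and `site` is removed because it has a single value. Real data usually calls for less blunt filters and for transformations chosen by inspecting each predictor.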