Caret is a single package with various functions for machine learning. For example, it provides createDataPartition() for splitting data and trainControl() for setting up cross-validation.

Tidymodels is a collection of packages for modelling. When I execute the library(tidymodels) command, the following packages are loaded:

- rsample: for data splitting and resampling
- parsnip: for trying out a range of models
- workflows: for putting everything together

Some common libraries from the tidyverse, such as dplyr, are also loaded. As shown, tidymodels breaks the machine learning workflow down into multiple stages and provides a specialised package for each stage. This benefits users through increased flexibility and more possibilities. However, for a beginner, it might be intimidating (at least it was for me).

Since I have not split the data yet, this step is not data scaling or centring, which should be fitted on the training set and then applied to the testing set. Here, I focus on processing that applies to all data and has no parameter to estimate, such as converting variables to factors or simple mathematical transformations. For example, if I take the square root of a number, I can square the result to recover the original number, and nothing about the other observations is needed. For normalisation, however, I need to know the minimum and maximum values of a variable, both of which might differ between the training and testing sets.

I focused on the total count, so the casual and registered variables are removed.

```r
bike_all <- bike_all %>% select(-casual, -registered)
```

As suggested earlier, the target variable is positively skewed and requires transformation. I tried several common techniques for positively skewed data and applied the one whose skewness is closest to zero: the cube root.

```r
# skewness() is not base R; it comes from an add-on package (e.g. e1071 or moments)
# Original
skewness(bike_all$total) # 1.277301
# Log
skewness(log10(bike_all$total)) # -0.936101
# Log + constant
skewness(log1p(bike_all$total)) # -0.8181098
# Square root
skewness(sqrt(bike_all$total)) # 0.2864499
# Cube root
skewness(bike_all$total^(1 / 3)) # -0.0831688
# Transform with cube root
bike_all$total <- bike_all$total^(1 / 3)
```

Predictors

Categorical variables are converted to factors according to the attribute information provided by UCI.
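The factor conversion for the predictors can be sketched in base R as follows. The data frame and its column names (season, holiday, weathersit, temp) are hypothetical stand-ins, because the exact columns converted in bike_all are not listed here. In the real data, the UCI attribute descriptions decide which columns count as categorical.

```r
# Hypothetical toy rows standing in for bike_all; column names are assumptions
bike_toy <- data.frame(
  season = c(1, 2, 4),
  holiday = c(0, 0, 1),
  weathersit = c(1, 3, 2),
  temp = c(0.24, 0.52, 0.30)
)

# Columns assumed to be categorical according to the dataset's attribute information
cat_cols <- c("season", "holiday", "weathersit")

# Convert each categorical column to a factor; numeric columns are left unchanged
bike_toy[cat_cols] <- lapply(bike_toy[cat_cols], factor)

levels(bike_toy$season) # "1" "2" "4"
```

With dplyr loaded, `mutate(across(all_of(cat_cols), factor))` performs the same conversion inside a pipe.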
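The selection rule for the target transformation can be written out as a short sketch. The rule is to keep the candidate whose skewness is closest to zero, not the most negative one. The sketch uses a hypothetical vector of right-skewed counts and a hand-written moment-based skewness (the g1 coefficient). Add-on packages apply different small-sample adjustments, so their values will not match this function exactly.

```r
# Moment-based sample skewness (g1): third central moment divided by sd^3
skew_g1 <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Hypothetical right-skewed counts, not the bike-sharing data
counts <- c(1, 2, 2, 3, 3, 4, 5, 8, 13, 40)

# Candidate transformations for positively skewed data
candidates <- list(
  original = counts,
  log10 = log10(counts),
  log1p = log1p(counts),
  sqrt = sqrt(counts),
  cube_root = counts^(1 / 3)
)

skews <- sapply(candidates, skew_g1)

# Choose the transformation whose skewness has the smallest absolute value
best <- names(which.min(abs(skews)))
print(round(skews, 3))
print(best)
```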
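The earlier point about parameter-free transformations versus normalisation can be shown with a small base R sketch on made-up numbers. The vectors train_x and test_x are hypothetical and do not come from the bike data. The minimum and maximum are estimated on the training portion only and then reused on the testing portion, so testing values can land outside [0, 1]. The cube root, by contrast, needs no fitted parameters and can be undone, up to floating-point error.

```r
# Made-up values standing in for one numeric predictor
train_x <- c(2, 4, 6, 10)
test_x <- c(3, 12)

# Min-max normalisation: parameters are estimated on the training set only
x_min <- min(train_x)
x_max <- max(train_x)
min_max <- function(x) (x - x_min) / (x_max - x_min)

train_scaled <- min_max(train_x)
test_scaled <- min_max(test_x)
print(train_scaled) # 0.00 0.25 0.50 1.00
print(test_scaled)  # 0.125 1.250, the second value lies outside [0, 1]

# A cube root uses no fitted parameters, so applying it before splitting leaks nothing
cube_back <- (c(train_x, test_x)^(1 / 3))^3
```

In tidymodels, this fit-on-training, apply-to-testing pattern is handled by the recipes package through prep() (estimate on the training data) and bake() (apply to new data).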