For this type of problem, where both the input (sensor readings) and the output (remaining useful life) are known, the appropriate algorithms fall under the supervised learning category. In supervised learning, the algorithm's task is to learn a function that maps labeled input data to the corresponding output. Among the available options, this simulation tested GLM, Gradient Tree Boosting, and Random Forests.
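As a rough illustration, the three model families named above can all be fitted with a common interface in scikit-learn. The snippet below is a sketch on synthetic "sensor" data, not the report's dataset; the ridge regressor stands in for the GLM, and all names and coefficients are illustrative assumptions.

```python
# Hypothetical sketch: fitting the three model families on synthetic data.
import numpy as np
from sklearn.linear_model import Ridge  # a simple GLM-style baseline
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # 200 cycles x 5 sensor channels (synthetic)
rul = 100 - X @ np.array([5.0, 3.0, 2.0, 1.0, 4.0]) \
      + rng.normal(scale=2.0, size=200)  # stand-in for remaining useful life

models = {
    "GLM (ridge)": Ridge(alpha=1.0),
    "Gradient Tree Boosting": GradientBoostingRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X, rul)
    print(name, round(model.score(X, rul), 3))  # in-sample R^2 only
```

Note that the scores printed here are in-sample fits; a fair comparison would use held-out data, as discussed next.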
In addition to choosing an algorithm, there are other factors to take into account when dealing with machine learning: noise, a sufficient amount of data, overfitting, and the bias-variance tradeoff. In this particular case noise is not an issue, as most of the data is independent of human error and the sensors are quite reliable. The quantity of data is not an issue either, since the dataset is extensive. Overfitting can be mitigated through cross-validation, which consists of repeatedly splitting the dataset into training and testing sets so that the algorithm's performance is evaluated on data it was not fitted to, across different data combinations.
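The cross-validation procedure described above can be sketched as follows; this is an illustrative example on synthetic data, and the choice of a random forest and 5 folds is an assumption, not taken from the report.

```python
# Illustrative 5-fold cross-validation sketch with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))  # synthetic stand-in for sensor readings
y = X @ np.array([2.0, -1.0, 0.5, 3.0, 1.5]) + rng.normal(scale=0.5, size=300)

# Each of the 5 folds serves once as the held-out test set, so the mean
# score reflects performance on data the model has not seen during fitting.
scores = cross_val_score(RandomForestRegressor(random_state=0), X, y,
                         cv=5, scoring="r2")
print(scores.mean())
```

A large gap between the training score and this cross-validated score is a typical symptom of overfitting.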
The bias-variance tradeoff is a common problem: mathematical models are normally tuned to trade some bias against variance. An algorithm should be flexible enough to fit the data (less flexibility implies more bias) without showing high variance (the higher the variance, the worse the model generalizes). A balance between these factors needs to be achieved in order to obtain a reliable model that can accurately predict outcomes while remaining resilient to “unexpected” data values.
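The tradeoff can be made concrete by varying a single flexibility knob, such as the depth of a decision tree: a very shallow tree underfits (high bias), while a very deep tree fits the training data almost perfectly but generalizes worse (high variance). The data and depth values below are illustrative assumptions.

```python
# Sketch of the bias-variance tradeoff via decision-tree depth (synthetic data).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)  # noisy sine wave

for depth in (1, 4, 20):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    train_r2 = tree.fit(X, y).score(X, y)              # fit on all data
    cv_r2 = cross_val_score(tree, X, y, cv=5).mean()   # held-out estimate
    # Shallow: both scores low (bias). Deep: train high, CV lower (variance).
    print(depth, round(train_r2, 2), round(cv_r2, 2))
```

The intermediate depth typically gives the best held-out score, which is the balance point the paragraph above refers to.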