Training Data
Training data is the historical, often labelled, dataset used to teach a machine-learning model the patterns it will later apply. Its quality, quantity and representativeness largely determine how well the model performs.
A model can only be as good as the data it learns from. In industrial applications, training data is typically drawn from process historians (the time-series databases that archive sensor readings) and from maintenance records, and getting enough examples of rare failures is a perennial challenge. Data must be cleaned, aligned in time, labelled with confirmed outcomes and made representative of the conditions the model will face. Biased or sparse training data produces brittle models that fail on conditions they have not seen, and when operating conditions change after deployment, the model's accuracy degrades, a problem known as model drift.
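The cleaning, time-alignment and labelling steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `label_readings`, the two-hour `horizon` and the sample readings are all hypothetical, and the rule "a reading is positive if a confirmed failure follows within the horizon" is one common labelling choice, not the only one.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def label_readings(readings, failure_times, horizon):
    """Return (timestamp, value, label) rows; label is 1 when a confirmed
    failure falls within `horizon` after the reading, otherwise 0."""
    # Clean: drop readings with missing values, then order them by time.
    cleaned = sorted((t, v) for t, v in readings if v is not None)
    failures = sorted(failure_times)
    rows = []
    for t, v in cleaned:
        # Align: find the first confirmed failure at or after this reading.
        i = bisect_left(failures, t)
        label = int(i < len(failures) and failures[i] - t <= horizon)
        rows.append((t, v, label))
    return rows


# Hypothetical sample data: out-of-order readings with one sensor dropout.
t0 = datetime(2024, 1, 1)
readings = [
    (t0 + timedelta(hours=3), 0.9),
    (t0, 0.2),
    (t0 + timedelta(hours=1), None),  # missing value, removed during cleaning
    (t0 + timedelta(hours=2), 0.7),
    (t0 + timedelta(hours=10), 0.3),
]
failures = [t0 + timedelta(hours=4)]  # confirmed from maintenance records

rows = label_readings(readings, failures, horizon=timedelta(hours=2))
positives = sum(label for _, _, label in rows)
print(f"{positives} positive of {len(rows)} rows")
```

Counting the positive rows, as the last lines do, is a quick way to see how scarce failure examples are before any model is trained.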