ADs: Active Data-selection for Data Quality Assurance in Data-sharing over Advanced Manufacturing Systems
Machine learning (ML) methods are widely used in manufacturing applications, and they usually require a large amount of training data. However, data collection in manufacturing systems demands substantial cost and time, so data scarcity is common. With the development of the industrial internet of things (IIoT), data-sharing is widely enabled among multiple machines with similar functionality to augment the dataset for building ML models. Although the machines are designed similarly, distribution mismatch inevitably exists in their data, whereas ML methods are effective only under the assumption that the training and testing data are sampled from the same distribution. Thus, an intelligent data-sharing framework is needed to ensure the quality of the shared data, such that only beneficial information is shared to improve the performance of ML methods. In this work, we propose an Active Data-selection (ADs) framework to ensure the quality of the data shared among multiple machines. It is designed as a self-supervised learning framework that integrates the architectures of contrastive learning (CL) and active learning (AL). A novel acquisition function is developed for active learning by combining an information measure, which benefits downstream tasks, with a similarity score, which assures data quality. To validate the ADs framework, we conducted a simulation study and a case study using real-world in-situ monitoring data. With the high-quality dataset queried by the proposed framework, ADs consistently outperforms the classical AL method and is comparable to, or even better than, ML models trained on fully labeled data.
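The abstract does not give the form of the acquisition function, so the following is only an illustrative sketch of how an information measure and a similarity score might be combined. It assumes predictive entropy as the information measure, cosine similarity between a candidate's embedding and a target-machine centroid as the similarity score, and a simple product to combine them. The names `acquisition_score`, `select_samples`, and `target_centroid` are hypothetical and are not taken from the ADs framework.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (assumed information measure)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (assumed similarity score)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def acquisition_score(probs, embedding, target_centroid):
    """Hypothetical combination: entropy weighted by non-negative similarity.

    Samples that are informative but dissimilar to the target machine's
    data receive a low score, so they are unlikely to be shared.
    """
    similarity = max(0.0, cosine_similarity(embedding, target_centroid))
    return entropy(probs) * similarity

def select_samples(candidates, target_centroid, k):
    """Return the ids of the k candidates with the highest acquisition score."""
    ranked = sorted(
        candidates,
        key=lambda c: acquisition_score(c["probs"], c["embedding"], target_centroid),
        reverse=True,
    )
    return [c["id"] for c in ranked[:k]]
```

Under these assumptions, a candidate from another machine is queried only when it is both uncertain under the current model and close to the target machine's data in the embedding space, which mirrors the two goals stated above: benefiting the downstream task and assuring data quality.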
Author(s):
Yue Zhao | Research Assistant | Rensselaer Polytechnic Institute
Xuyuan Li | Assistant Professor | East China University of Science and Technology
Chenang Liu | Assistant Professor | Oklahoma State University
Yinan Wang | Assistant Professor | Rensselaer Polytechnic Institute
Category
Abstract Submission
Description
Primary Track: Quality Control & Reliability Engineering
Secondary Track: Data Analytics and Information Systems
Primary Audience: Academician