Sensor-based Evaluation of Intermittent Fasting Regimes: A Machine Learning and Statistical Approach
In the pursuit of healthier lifestyles, intermittent fasting has emerged as a popular dietary strategy. However, its success hinges significantly on adherence to specific eating windows and fasting periods. This study tackles this challenge by developing and assessing various machine learning and statistical models utilizing sensor data to gauge adherence levels in intermittent fasting regimes, particularly focusing on eating duration and specific time-frames.
Data were collected in two human trials that explored different intermittent fasting regimes. Details of the study design, eligibility criteria, and ethical approvals were published separately. The ChronoFast trial (ClinicalTrials.gov: NCT04351672) and the ParoFastin trial (DRKS: DRKS00026701) provided the data. ChronoFast compared early (8:00-16:00) versus late (13:00-21:00) time-restricted eating (TRE), using glucose readings collected over 14 days from 31 female participants. ParoFastin investigated TRE (self-selected 8-hour window), religious dry fasting (eating before sunrise and after sunset), and habitual diets, tracking glucose values over 19 days in 16 subjects (see Fig. 1A for a concise representation).
The trials also collected food logs, and acceleration data gave a view of participants’ physical activity levels in the ChronoFast trial (illustrated in Fig. 1B).
Acceleration metrics were obtained using the wGT3X-BT sensor (ActiGraph, USA) mounted on participants’ non-dominant wrists. Processed through ActiLife software, the acceleration data included key variables such as timestamp, participant ID, and levels of physical activity. Glucose information was gathered using FreeStyle Libre sensors. These continuous glucose monitors recorded data at 15-minute intervals, providing vital information for dietary analysis.
Participants logged their nutritional intake, guided initially by a nutritionist, using either the Fddb Extender app or paper-based records. These nutritional data, central to analyzing fasting adherence, were compiled and exported for analysis.
Adherence to the fasting regimens was evaluated as compliance with the prescribed 8-hour eating windows, that is, with the meal timing rules of each regime. A deviation of up to 30 minutes was tolerated, which allowed a practical adherence assessment across both the ChronoFast and ParoFastin trials.
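The adherence rule can be illustrated with a minimal sketch. The function name and the exact interpretation (every logged meal must fall inside the window widened by 30 minutes on both sides) are assumptions made for illustration, not the trials' published scoring procedure.

```python
from datetime import datetime, time, timedelta

# 30-minute deviation permitted by the adherence definition.
TOLERANCE = timedelta(minutes=30)


def is_day_adherent(meal_times, window_start, window_end, tolerance=TOLERANCE):
    """Return True if every logged meal lies inside the eating window.

    Assumed interpretation: the window is widened by `tolerance` on both
    sides. A day without logged meals counts as adherent here.
    """
    for meal in meal_times:
        earliest = datetime.combine(meal.date(), window_start) - tolerance
        latest = datetime.combine(meal.date(), window_end) + tolerance
        if not (earliest <= meal <= latest):
            return False
    return True


# Hypothetical day in an early TRE arm (8:00-16:00).
day = datetime(2021, 3, 1)
meals_ok = [day.replace(hour=7, minute=45), day.replace(hour=16, minute=20)]
meals_late = [day.replace(hour=12), day.replace(hour=17)]

adherent_ok = is_day_adherent(meals_ok, time(8, 0), time(16, 0))
adherent_late = is_day_adherent(meals_late, time(8, 0), time(16, 0))
```

In this sketch a meal at 16:20 is still counted as compliant, whereas a meal at 17:00 marks the day as non-adherent.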
In addition to the collected data, simulations were conducted using Simglucose, a Python package for simulating Type 1 Diabetes Mellitus time series data, which added hypothetical scenarios and device-specific configurations to the evaluation.
Data analysis was performed in Python using Jupyter Notebook and PyCharm. The three data sources were imported into harmonized data frames using the pandas library. For consistency, acceleration and glucose data were synchronized, and nutritional data were categorized into fasting and non-fasting states.
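One possible way to synchronize the streams with pandas is sketched below. The column names, the toy values, and the labeling rule (non-fasting between the first and last logged meal) are hypothetical and only illustrate the idea of aligning acceleration data to the 15-minute glucose grid.

```python
import pandas as pd

# Toy CGM readings on a 15-minute grid (values invented for illustration).
glucose = pd.DataFrame({
    "timestamp": pd.date_range("2021-03-01 08:00", periods=4, freq="15min"),
    "glucose_mg_dl": [92.0, 110.0, 128.0, 105.0],
})

# Toy per-minute activity counts (hypothetical column name).
accel = pd.DataFrame({
    "timestamp": pd.date_range("2021-03-01 08:00", periods=60, freq="1min"),
    "counts": range(60),
})

# Aggregate acceleration to the same 15-minute grid as the glucose data.
accel_15min = (
    accel.set_index("timestamp")["counts"]
    .resample("15min")
    .mean()
    .reset_index()
)

# Join both streams on the shared timestamps.
merged = glucose.merge(accel_15min, on="timestamp", how="inner")

# Assumed labeling rule: non-fasting between first and last logged meal.
first_meal = pd.Timestamp("2021-03-01 08:10")
last_meal = pd.Timestamp("2021-03-01 08:35")
merged["fasting"] = ~merged["timestamp"].between(first_meal, last_meal)
```

An inner join keeps only the time points for which both sensors delivered data. Whether gaps should instead be interpolated is a design choice not covered here.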
Using the ChronoFast data, various models were developed as binary predictors of fasting states. These models were then evaluated on the ParoFastin data set and on simulated Type 1 diabetes data to test how well the approach generalizes.
Features were computed in two streams: glucose features via the cgmquantify Python package, and acceleration features following established methods in activity recognition. Signal processing tools from scipy.signal were used to identify peaks in the data.
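Peak identification with scipy.signal can be sketched as follows. The glucose trace and the prominence threshold of 20 are invented for illustration, and the derived features are examples rather than the study's feature set.

```python
import numpy as np
from scipy.signal import find_peaks

# Toy glucose trace in mg/dL at 15-minute spacing (invented values).
glucose = np.array(
    [90, 92, 95, 120, 140, 125, 100, 95, 93, 110, 135, 150, 130, 105, 95],
    dtype=float,
)

# Keep only peaks that rise at least 20 mg/dL above their surroundings
# (threshold is an assumption for this sketch).
peaks, properties = find_peaks(glucose, prominence=20)

# Simple peak-based features for one window of data.
n_peaks = len(peaks)
mean_peak_height = glucose[peaks].mean()
```

Here `peaks` holds the sample indices of the detected excursions, which can then be converted to timestamps or summarized per day.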
To simplify the models while improving performance, Recursive Feature Elimination (RFE) from the scikit-learn package was used to select the most significant variables, as listed in Table 1.
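A minimal RFE sketch with scikit-learn is given below. The synthetic data, the logistic regression estimator, and the number of retained features are assumptions for illustration, since the base estimator and settings are not specified here.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the feature matrix (10 features, 3 informative).
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# RFE repeatedly fits the estimator and drops the weakest feature
# until only `n_features_to_select` remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

selected_mask = selector.support_   # True for retained features
feature_ranking = selector.ranking_  # 1 = retained, larger = dropped earlier
X_reduced = selector.transform(X)
```

The `ranking_` attribute gives the order in which features were eliminated, which is useful when reporting the selected variables.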
Balancing the data sets was critical to avoid label bias and overfitting during model training. Techniques from the imbalanced-learn package, including SMOTETomek and SMOTE, were used to balance the classes. SMOTE creates new minority-class samples by interpolating between existing samples and their nearest neighbours.
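The interpolation idea behind SMOTE can be shown with a small NumPy sketch. This is a hand-written illustration, not the imbalanced-learn implementation, and the function name and parameters are hypothetical. In practice the package's `SMOTE` class would be applied via `fit_resample`.

```python
import numpy as np


def smote_like_samples(X_minority, n_new, k=3, seed=0):
    """Create synthetic points on segments between minority samples
    and their k nearest minority neighbours (simplified SMOTE idea)."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    n = len(X_minority)

    # Pairwise Euclidean distances; exclude each point from its own neighbours.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]

    synthetic = np.empty((n_new, X_minority.shape[1]))
    for i in range(n_new):
        base = rng.integers(n)
        neighbour = neighbours[base, rng.integers(k)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic[i] = X_minority[base] + gap * (
            X_minority[neighbour] - X_minority[base]
        )
    return synthetic


# Toy minority class with two features (invented values).
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
new_samples = smote_like_samples(X_min, n_new=10, k=2)
```

Because each synthetic point lies on a segment between two existing minority samples, it stays within the range spanned by the original minority data.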
After balancing, the data were Min-Max scaled, normalizing values between 0 and 1 so that all features share a common range. After scaling, several supervised machine learning models were evaluated. SHapley Additive exPlanations (SHAP) values were used for global model interpretability, showing how much each variable contributed to the predictions (illustrated in Fig. 1C).
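Min-Max scaling maps each feature x to (x - min) / (max - min). A minimal sketch with scikit-learn follows. The feature values are invented, and fitting the scaler on the training split only is an assumed, commonly used practice rather than a detail stated here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy features: glucose (mg/dL) and activity counts (invented values).
X_train = np.array([[70.0, 0.0], [110.0, 500.0], [180.0, 2000.0]])
X_test = np.array([[125.0, 1000.0]])

# Learn per-feature min and max from the training data only (assumption),
# then apply the same transformation to unseen data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Test values outside the training range would fall outside [0, 1], which is one reason extreme readings deserve separate handling.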
Finally, a data visualization tool was built with the Dash library, which allows deployment across multiple instances. Plots were rendered with Plotly, giving an interactive interface for data exploration.
This research advances the understanding of intermittent fasting and illustrates the potential of sensor data and machine learning in nutrition science.