\chapter{Conclusion}

% How well did your methods work?

\section{General Discussion}

This thesis introduced the novel concept of analyzing a day-length recording to detect episodes of eating, along with an RNN-based daily pattern classifier for this task. The results of the experiments in this work answer the three original research questions:
\begin{enumerate}[label=\arabic*., listparindent=1.5em]
\item Does analyzing the probability of eating in a daily context with a neural network improve eating episode classification?

The 6-minute window signal from the window-based eating classifier had a low SNR because many gestures and wrist motions throughout the day can resemble brief periods of eating. These gestures include grooming, shaving, brushing teeth, adjusting glasses, touching the face, and even food preparation, among many others. However, re-analyzing the data with a daily window added context and increased the SNR, as shown in Figures~\ref{fig:model-comparison1}--\ref{fig:model-comparison-fn}. The background response of the P(E$_w$) signal was substantially suppressed while the peaks for meals were retained, which improved the SNR. This is evidence that a neural network can learn daily contextual clues and use them for better eating detection.

Using an RNN for this task had several advantages. First, since the problem is inherently a time series problem, a recurrent neural network is well suited to per-timestep classification. Second, the memory of RNN neurons enabled a greater understanding of the daily context, which was the focus of this work. Lastly, masking made it possible to feed the classifier only real data, so that padded zeros or other filler values were not treated as part of a recording. Overall, an RNN-based classifier was a natural fit for the daily pattern classifier.

\item Can this approach reduce the number of false detections in eating episode detection?

Our day-level classifier created more separation between background noise in the signal and data that corresponded to actual meals. This separation suppressed transient responses, which led to fewer false detections of eating episodes. Furthermore, it permitted a shift from a dual-threshold hysteresis approach to a single-threshold approach for post-processing.

\item How do the results of this approach compare to those from a window-based classifier?

This approach exhibits similar or better results for nearly every time and episode evaluation metric we measured when compared to the window-based classifier from~\cite{sharma2020}. Some performance metrics remained relatively unchanged, while others such as Acc$_W$, time TPR, and episode FP/TP improved by a larger margin. The largest improvement was a 53\% reduction in episode FP/TP. The only decline was a 4\% drop in episode TPR. Time TNR remained constant between the two approaches, and all other metrics improved.
\end{enumerate}
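
The difference between the two post-processing schemes mentioned under the second question can be sketched as follows. The notation is assumed here for illustration only and does not replace the definitions given in earlier chapters: $p_t$ denotes the classifier output at timestep $t$, $d_t \in \{0, 1\}$ the resulting detection, $T$ a single threshold, and $T_S > T_E$ hypothetical start and end thresholds, with $d_0 = 0$.
\[
\mbox{single threshold:} \quad
d_t = \left\{
\begin{array}{ll}
1 & \mbox{if } p_t \geq T \\
0 & \mbox{otherwise}
\end{array}
\right.
\]
\[
\mbox{hysteresis:} \quad
d_t = \left\{
\begin{array}{ll}
1 & \mbox{if } p_t \geq T_S \\
0 & \mbox{if } p_t < T_E \\
d_{t-1} & \mbox{otherwise}
\end{array}
\right.
\]
Under either rule, a detected episode is a maximal run of consecutive timesteps with $d_t = 1$. The hysteresis rule needs the second threshold to keep a noisy signal from repeatedly starting and ending episodes, whereas a signal with a well-suppressed background can be segmented with a single threshold.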

% What are the limitations?
\section{Limitations}

The most significant advantage of this approach is also its most significant limitation. The daily pattern classifier requires an entire day-length recording of data to operate, because the daily frame of reference and contextual indicators are essential to it. As such, it can only work in a post-hoc fashion and is not designed for real-time use. The classifier is also not an end-to-end model that can process raw wrist motion IMU data. Data for this model must first be processed by another model, such as the window-based CNN eating classifier from previous work~\cite{sharma2020}, to generate a probability of eating signal. The daily pattern classifier and the windowed eating classifier address the same problem but are designed for different settings. The windowed eating classifier is designed to be a real-time model and to operate without the foresight of comprehensive patterns in the data. The daily pattern classifier, on the other hand, leverages those patterns and is intended to run after a recording is complete.
\newpage
Another limitation of this study is that the daily pattern classifier was only tested with the Clemson All-Day (CAD) Dataset~\cite{cad2020}, which includes over 350 day-length recordings from free-living participants. Although it is the largest dataset of its type known to us, the applicability of the classifier to other datasets has not yet been investigated and may require changes to the model architecture or pre-processing.

% From these results, what do you suggest doing next?
\section{Future Work}

There are several opportunities for future research in this area. First and foremost, future work could look at incorporating both models (window-based and daily-context-based) into a single end-to-end encoder-decoder classifier. Although this model would be able to output a probability of eating from raw wrist motion data with the added benefit of daily context, it would still be inherently post-hoc rather than real-time, because the daily patterns it relies on are not available until an entire day of data has been recorded. This classifier would also require more parameters, greater model complexity, and a much larger dataset to train end-to-end. A dataset of this size is not yet available.

Model volatility is also yet to be explored in depth. Although only preliminary figures quantifying model volatility have been measured for the windowed eating classifier (see Appendix~\ref{app:model-vol}), it is apparent that there is a higher level of variability than expected. Whether this issue arises from the amount of data, the nature of the classifier itself, how non-eating samples are selected for training, or another cause entirely remains to be seen. In any case, the volatility of the windowed eating classifier should be evaluated more thoroughly. Furthermore, the same should be done for the daily pattern classifier, whose volatility was not investigated in this work. A comparison between the two would also be useful.

Finally, once the issue of model volatility is researched and controlled, grouping participants by eating behavior could be beneficial. These styles or behaviors of eating are also known as eating phenotypes. Currently, model volatility is too high for any metric to precisely measure performance differences between groups. A change in performance could be due to model volatility or to a better or worse grouping, and with current fluctuations there is no easy way to differentiate the two. Previous research has looked into using individual models~\cite{wei2021} and large, full-group models~\cite{sharma2020} for eating detection. This research would serve as a middle ground between the two by grouping people based on how they eat.
