The following instructions assume you have the enhanced gesture ground truth (GT) and the videos_crop.txt file, which are provided at separate links on the website.
	-'gestures' folder: contains 486 text files, each holding the gesture ground truth for one video.
	-'videos_crop.txt': contains the cropping window coordinates and the start and end timestamps of each video.

How to read:
1. GT gesture file:
	the file is named PARTICIPANT_COURSE.txt.
	Columns from left to right are:
	1) gesture type; available types are bite, drink, and non-intake;
	2) the gesture start timestamp in milliseconds, from the beginning of the video;
	3) the gesture end timestamp in milliseconds, from the beginning of the video.
2. 'videos_crop.txt':
	Columns from left to right are:
	1) video name: in PARTICIPANT_COURSE format;
	2) x coordinate of the top-left corner of the cropping window;
	3) y coordinate of the top-left corner of the cropping window;
	4) x coordinate of the bottom-right corner of the cropping window;
	5) y coordinate of the bottom-right corner of the cropping window;
	6) the start timestamp of the meal in milliseconds, from the beginning of the video;
	7) the end timestamp of the meal in milliseconds, from the beginning of the video.
*Notes:
	1) all timestamps are counted from the beginning of the video recording.
	2) all coordinates are in the frame coordinate system, where the origin is the top-left corner of the frame.
	3) values in each line are separated by a tab.
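
The two file layouts above could be read with a short Python sketch like the one below. It is only an illustration: the names Gesture, CropInfo, parse_gestures and parse_crop_info are hypothetical, and it assumes that all timestamps and coordinates are written as integers and that the files have no header line.

```python
from typing import Iterable, List, NamedTuple


class Gesture(NamedTuple):
    label: str      # bite, drink, or non-intake
    start_ms: int   # milliseconds from the beginning of the video
    end_ms: int     # milliseconds from the beginning of the video


class CropInfo(NamedTuple):
    video: str          # PARTICIPANT_COURSE, e.g. p005_c2
    x1: int             # top-left x of the cropping window
    y1: int             # top-left y of the cropping window
    x2: int             # bottom-right x of the cropping window
    y2: int             # bottom-right y of the cropping window
    meal_start_ms: int  # milliseconds from the beginning of the video
    meal_end_ms: int    # milliseconds from the beginning of the video


def parse_gestures(lines: Iterable[str]) -> List[Gesture]:
    """Parse the lines of one GT gesture file (3 tab-separated columns)."""
    gestures = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        # Assumption: timestamps are integer milliseconds.
        gestures.append(Gesture(fields[0], int(fields[1]), int(fields[2])))
    return gestures


def parse_crop_info(lines: Iterable[str]) -> List[CropInfo]:
    """Parse the lines of videos_crop.txt (7 tab-separated columns)."""
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 7:
            continue  # skip blank or incomplete lines
        # Assumption: coordinates and timestamps are integers.
        rows.append(CropInfo(fields[0], *(int(v) for v in fields[1:7])))
    return rows


# Usage with the first line of gestures/p005_c2.txt:
print(parse_gestures(["bite\t41207\t43941\n"]))
```

To read from disk, pass an open file object instead of a list, for example parse_gestures(open("gestures/p005_c2.txt")).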

To convert a video timestamp (in milliseconds) to a frame index counted from the beginning of the video:
frame_idx = timestamp / 1000 * sampling_frame_per_second
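
As a minimal Python sketch of this formula (the function name timestamp_to_frame_index is hypothetical, and rounding to the nearest integer is an assumption inferred from the 8 Hz example further down in this file):

```python
def timestamp_to_frame_index(timestamp_ms: float, fps: float) -> int:
    """Convert a video timestamp in milliseconds to a frame index.

    fps is the frame rate at which the video is sampled.
    Assumption: the result is rounded to the nearest integer.
    """
    return round(timestamp_ms / 1000 * fps)


print(timestamp_to_frame_index(41207, 8))  # 330
print(timestamp_to_frame_index(43941, 8))  # 352
```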

To convert a timestamp in the GT gesture files to the datapoint index in the original gesture ground truth:
	ori_datapoint_idx = (video_timestamp - video_sync_offset) / 1000 * 15
where:
	15 is the original sampling frequency (Hz) of the wrist motion and scale data;
	video_sync_offset is the offset in milliseconds between the start of the video and the start of the wrist motion sensor recording;
	the offset value is the first value in the *_sync.txt file, located in the same directory as the video file.
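
A minimal Python sketch of this conversion follows. The function names are hypothetical. It assumes the values in the *_sync.txt file are separated by whitespace (only the first value is used) and that the result is rounded to the nearest integer.

```python
ORIGINAL_HZ = 15  # sampling frequency of the wrist motion and scale data


def read_sync_offset(sync_text: str) -> float:
    """Return the first value found in the text of a *_sync.txt file.

    Assumption: values are separated by whitespace (tabs, spaces, newlines).
    """
    return float(sync_text.split()[0])


def video_ts_to_datapoint_index(video_timestamp_ms: float,
                                video_sync_offset_ms: float,
                                hz: float = ORIGINAL_HZ) -> int:
    """Map a video timestamp (ms) to a datapoint index in the original data.

    Assumption: the result is rounded to the nearest integer.
    """
    return round((video_timestamp_ms - video_sync_offset_ms) / 1000 * hz)


# Usage with the p005/c2 values quoted in this file (offset 1341 ms):
print(video_ts_to_datapoint_index(41207, 1341))  # 598
print(video_ts_to_datapoint_index(43941, 1341))  # 639
```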


Example:
The first line in gestures/p005_c2.txt (participant index is p005, course index is c2):
bite	41207	43941
This means the first intake gesture in the video located in Cafeteria/p005/c2/ is a bite.
The gesture starts at 41207 ms and ends at 43941 ms, measured from the start of the video.

If the video is sampled at 8 Hz:
the start frame index: 41207 / 1000 * 8 = 329.656, rounded to 330
the end frame index: 43941 / 1000 * 8 = 351.528, rounded to 352


To convert the timestamps to datapoint indexes in the original gesture ground truth:
the start datapoint index: (41207 - 1341) / 1000 * 15 = 597.99, rounded to 598
the end datapoint index: (43941 - 1341) / 1000 * 15 = 639
(According to 20120201115556861_sync.txt under Cafeteria/p005/c2/, the offset value is 1341 ms.)
These datapoint indexes are identical to those in Cafeteria/p005/c2/gesture_union.txt.