Data Guide

This guide is a high-level “data dictionary” of files generated within the CML and supported by the CMLReaders package. For more information on the locations of the files, see cmlreaders.constants. In the table below, each row corresponds to a file type supported by the CMLReaders package. The table contains the following fields:

Data Type: Identifier that can be used for loading/interacting with the data
Description: Short description of the information contained in the file
Format: The format of the underlying file
Generated By: A link to the pipeline/code that produces the file
Used By: One or more links to code repositories or processing pipelines that use, or are otherwise dependent upon, this file.
Level: The level at which the file is guaranteed to be unique. For example, a level of “subject” indicates that there is one file per subject, regardless of localization or experiments/tasks completed.

The following “levels” are defined:

protocol: A synonym for “project” or “study.” For example, RAM data is part of the “r1” protocol, while scalp experiments are part of the “ltp” protocol.
subject: A single participant in a study. Typically, subject identifiers are 6 alphanumeric characters long, for example R1001P.
localization: Each time a subject undergoes surgery to add or remove electrodes, this is considered a new localization. Localizations are 0-indexed. Data at this level changes whenever there is a new localization.
montage: Any change to a subject’s montage results in a new montage number. Montage changes can occur when the set of contacts used for recording changes. Note: Montage numbers do not reset in the case of a new localization.
session: Data that is unique to a single session of a particular experiment. This is the lowest level of data stored since localization/montage/etc. are fixed for the duration of a session.
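
To make the idea of a “level” concrete, the following minimal sketch counts how many distinct files one would expect for a data type at each level, given a list of completed sessions. Everything in it is illustrative: the SessionRecord type, the LEVEL_KEYS mapping (including the assumption that montage-level files are keyed by subject, localization, and montage together), the montage numbering, and the sample sessions are hypothetical and are not part of CMLReaders.

```python
from typing import NamedTuple


class SessionRecord(NamedTuple):
    """Hypothetical record describing one completed session."""
    protocol: str
    subject: str
    localization: int
    montage: int
    experiment: str
    session: int


# Assumed mapping from level name to the fields that identify one file.
# This is an illustration of the definitions above, not the library's schema.
LEVEL_KEYS = {
    "protocol": ("protocol",),
    "subject": ("protocol", "subject"),
    "localization": ("protocol", "subject", "localization"),
    "montage": ("protocol", "subject", "localization", "montage"),
    "session": ("protocol", "subject", "localization", "montage",
                "experiment", "session"),
}


def expected_file_count(records, level):
    """Count the distinct key tuples (one file each) at the given level."""
    keys = LEVEL_KEYS[level]
    return len({tuple(getattr(rec, key) for key in keys) for rec in records})


# Made-up history for one subject: two localizations, three montages.
# Montage numbers keep increasing across localizations (they do not reset).
records = [
    SessionRecord("r1", "R1001P", 0, 0, "FR1", 0),
    SessionRecord("r1", "R1001P", 0, 0, "FR1", 1),
    SessionRecord("r1", "R1001P", 0, 1, "FR1", 2),
    SessionRecord("r1", "R1001P", 1, 2, "PS4", 0),
]

counts = {level: expected_file_count(records, level) for level in LEVEL_KEYS}
print(counts)
```

With this sample, a subject-level file exists once, a localization-level file twice, a montage-level file three times, and a session-level file once per session.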

Data Type	Description	Format	Generated By	Used By	Level
r1_index	Index of sessions completed across experiments and subjects within the R1 protocol	json	event_creation	ramutils	protocol
ltp_index	Index of sessions completed across experiments and subjects within the LTP protocol	json	event_creation		protocol
localization	Metadata for implanted electrodes. Includes coordinates from various spaces as well as locations based on different atlases for both contacts and pairs of contacts. Generated as part of the localization pipeline.	json	neurorad	ramutils, neurorad, brainviewer	localization
target_selection_table	Metadata for each contact. Used for selecting stimulation targets. Generated as part of the RAM reporting pipeline once a subject has completed a full set of record-only sessions. Used by the 3D brain visualization application.	csv	ramutils	ramutils, brain_viz_unity	montage
baseline_classifier	A serialized version of a RAM brain state classifier that has been trained using record-only data. When loaded, the trained model can be used to make out-of-sample predictions given new features (powers). Generated as part of the reporting pipeline and includes all channels by default.	zip	ramutils	classiflib, ramutils	montage
electrode_coordinates	Table of contact metadata in long-form including contact name, type, freesurfer coordinates, atlas location, etc. Generated as part of the brain viz pipeline to be used in the 3D brain visualization.	csv	brainviewer	brain_viz_unity	montage
prior_stim_results	Table of previously used stimulation targets with the memory modulation results along with coordinates in freesurfer space that have been mapped into a particular subject’s space. Used by the 3D brain visualization application to display prior stim locations/results.	csv	brainviewer	brain_viz_unity	montage
voxel_coordinates	Metadata for implanted electrodes including voxel coordinates of each contact. Generated using Vox Tool as part of the broader localization pipeline. Used by the neurorad pipeline to generate coordinates in other spaces, atlas locations, and other metadata.	json	localization	neurorad	montage
jacksheet	Labels for all signals recorded from a subject. Typically this includes all EEG-related contacts, but can contain additional signals such as EKG or other reference channels. It is formatted as one label per line. It is used as part of the config generation pipeline.	txt	manual	bptools, ramutils	montage
area	Surface area for each implanted depth electrode. Typically this file is manually generated. It is used by the config generation pipeline.	txt	manual	ramutils	montage
leads	Channel numbers of all contacts. One channel per line. In the old reporting/config generation pipelines, channels were manually removed from this file in order to remove them from classifier training and config generation. In the new pipeline, this is done by using the classifier_excluded_leads file instead.	txt	localization		montage
good_leads	Channel numbers of all contacts not identified as being bad channels. One channel per line	txt	localization		montage
classifier_excluded_leads	Labels of contacts that should be excluded when training a classifier. One label per line. This is a manually-created file used by the config generation pipeline to allow arbitrary sets of contacts to be excluded from classifier training.	txt	localization	ramutils	montage
electrode_categories	Lists contacts in the seizure onset zone, exhibiting frequent interictal activity, residing in a brain lesion, or labelled as bad/broken by the clinical staff. Produced near the end of a patient’s time in the EMU and is primarily useful for post-hoc analysis.	txt	manual		montage
matlab_bipolar_talstruct	Legacy MATLAB talstruct files containing bipolar contact metadata. This file has been replaced by the pairs.json file.	mat		brainviewer	montage
matlab_monopolar_talstruct	Legacy MATLAB talstruct files containing single contact metadata. This file has been replaced by contacts.json.	mat		brainviewer	montage
pairs	Metadata for neighboring contacts. Produced from localization.json in the current neurorad pipeline.	json	neurorad	ramutils	montage
contacts	Metadata for each implanted contact. Produced from localization.json in the current neurorad pipeline.	json	neurorad	ramutils	montage
session_summary	Binary file containing information summarizing a particular session	h5	ramutils		session
classifier_summary	Binary file containing information summarizing the performance of a classifier in a given session in the case of stimulation experiments, or for a set of sessions for record-only experiments	h5	ramutils		session
math_summary	Binary file containing information summarizing math distractor task performance	h5	ramutils		session
all_events	All normalized events associated with a session. Produced as part of the event creation pipeline.	json	event_creation	ramutils	session
task_events	Normalized task-related events. This is a subset of all_events and is generated by event creation	json	event_creation	ramutils	session
math_events	Normalized math distractor events. This is also a subset of all_events and is generated by event creation	json	event_creation	ramutils	session
ps4_events	Normalized events related to PS4 sessions generated by the event creation pipeline	json	event_creation	ramutils	session
sources	Contains information about the eeg files for a particular session including the sample rate, start time of the recording, length of the recording, etc.	json	event_creation	event_creation	session
experiment_log	Log file produced by the task laptop for PyEPL experiments. This file should not be used in analyses.	csv		event_creation	session
session_log	Log file produced by the task laptop during a session. This file should not be used in analyses.	csv		event_creation	session
ramulator_session_folder	Timestamped folder containing files generated by Ramulator during a session. Not likely to be used frequently by data analysts.				session
event_log	Log file produced by Ramulator during a session. This file should not be used in analyses, but can be useful for debugging if something goes wrong with Ramulator.	json		event_creation	session
experiment_config	Configuration file used by Ramulator to run an experiment. This file is unlikely to be used by analysts, but is helpful when determining what parameters were used for a particular session.	json			session
raw_eeg	Binary file containing EEG data recorded during a session	h5			session
odin_config	Configuration file for the ENS. This file is unlikely to be used by analysts.	csv		ramutils	session
used_classifier	The serialized classifier that was used during the session. This classifier differs from the baseline_classifier in that it may contain fewer channels if an artifact detection algorithm was employed in the session. To best re-create what happened in an experiment in post hoc analyses, this is the classifier that should be used.	zip		ramutils	session
excluded_pairs	Metadata for pairs of contacts that were rejected as part of an artifact detection algorithm	json		ramutils	session
all_pairs	Metadata for all pairs of contacts that were recorded during a session	json		ramutils	session
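
Because the table is a plain data dictionary, it can also be queried programmatically. The sketch below assumes the table has been saved as tab-separated text with the same six columns. It embeds four rows for illustration (the localization description is abbreviated here) and uses only the Python standard library. The names TABLE_TSV, parse_table, and rows_at_level are hypothetical helpers, not part of CMLReaders.

```python
import csv
import io

# A few rows of the table above as tab-separated text (illustrative subset).
TABLE_TSV = "\n".join([
    "Data Type\tDescription\tFormat\tGenerated By\tUsed By\tLevel",
    "r1_index\tIndex of sessions completed across experiments and subjects "
    "within the R1 protocol\tjson\tevent_creation\tramutils\tprotocol",
    "localization\tMetadata for implanted electrodes.\tjson\tneurorad\t"
    "ramutils, neurorad, brainviewer\tlocalization",
    "raw_eeg\tBinary file containing EEG data recorded during a session\t"
    "h5\t\t\tsession",
    "all_pairs\tMetadata for all pairs of contacts that were recorded during "
    "a session\tjson\t\tramutils\tsession",
])


def parse_table(text):
    """Parse the tab-separated table into a list of dicts, one per file type."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    rows = []
    for row in reader:
        # "Used By" may list several consumers separated by commas, or be empty.
        row["Used By"] = [name.strip() for name in row["Used By"].split(",")
                          if name.strip()]
        rows.append(row)
    return rows


def rows_at_level(rows, level):
    """Return the rows whose Level column matches the requested level."""
    return [row for row in rows if row["Level"] == level]


rows = parse_table(TABLE_TSV)
session_types = [row["Data Type"] for row in rows_at_level(rows, "session")]
print(session_types)
```

Empty cells, such as the missing Generated By entry for raw_eeg, are kept as empty strings (or an empty list for Used By), so a gap in the table stays visible rather than being filled with a guess.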