Data Guide

This guide is a high-level “data dictionary” of the files generated within the CML and supported by the CMLReaders package. For the locations of these files, see cmlreaders.constants. In the table below, each row corresponds to a file type supported by the CMLReaders package. The table contains the following fields:

  • Data Type: Identifier that can be used for loading/interacting with the data
  • Description: Short description of the information contained in the file
  • Format: The format of the underlying file
  • Generated By: A link to the pipeline/code that produces the file
  • Used By: One or more links to code repositories or processing pipelines that use, or are otherwise dependent upon, this file.
  • Level: The level at which the file is guaranteed to be unique. For example, a level of “subject” indicates that there is one file per subject regardless of localization or experiments/tasks completed.

The following “levels” are defined:

  • protocol: A synonym for “project” or “study.” For example, RAM data is part of the “r1” protocol, while scalp experiments are part of the “ltp” protocol.
  • subject: A single participant in a study. Typically, subject identifiers are 6 alpha-numeric characters in length, for example R1001P.
  • localization: Each time a subject undergoes surgery to add or remove electrodes, this is considered a new localization. Localizations are 0-indexed. Data at this level changes whenever there is a new localization.
  • montage: Any change to a subject’s montage results in a new montage number. Montage changes can occur when the set of contacts used for recording changes. Note: Montage numbers do not reset in the case of a new localization.
  • session: Data that is unique to a single session of a particular experiment. This is the lowest level of data stored since localization/montage/etc. are fixed for the duration of a session.
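
The levels above can be modelled in a few lines of Python. The following is a minimal sketch, not part of CMLReaders: the names LEVELS, REQUIRED_KEYS, missing_keys, is_finer, and looks_like_subject_id are hypothetical, and the set of identifiers assumed for each level is an illustration rather than the package's actual rules.

```python
import re

# Levels named in the guide, ordered from coarsest to finest.
LEVELS = ("protocol", "subject", "localization", "montage", "session")

# Assumption (not taken from CMLReaders): the identifiers needed to single
# out one file at each level.
REQUIRED_KEYS = {
    "protocol": ("protocol",),
    "subject": ("protocol", "subject"),
    "localization": ("protocol", "subject", "localization"),
    "montage": ("protocol", "subject", "localization", "montage"),
    # Localization and montage are fixed within a session, so this sketch
    # omits them and identifies a session by its experiment and number.
    "session": ("protocol", "subject", "experiment", "session"),
}

# Subject identifiers are typically 6 alphanumeric characters, e.g. R1001P.
SUBJECT_ID = re.compile(r"[A-Za-z0-9]{6}")


def missing_keys(level, **identifiers):
    """Return the identifiers assumed for `level` that were not supplied."""
    return [key for key in REQUIRED_KEYS[level] if key not in identifiers]


def is_finer(level, other):
    """Return True if `level` is a finer-grained level than `other`."""
    return LEVELS.index(level) > LEVELS.index(other)


def looks_like_subject_id(value):
    """Check the typical (not guaranteed) 6-character subject format."""
    return SUBJECT_ID.fullmatch(value) is not None


print(missing_keys("montage", protocol="r1", subject="R1001P"))
print(is_finer("session", "montage"))
print(looks_like_subject_id("R1001P"))
```

Because the guide only says subject identifiers are typically 6 characters long, the format check is best treated as a sanity check rather than a validator.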

| Data Type | Description | Format | Generated By | Used By | Level |
|---|---|---|---|---|---|
| r1_index | Index of sessions completed across experiments and subjects within the R1 protocol. | json | event_creation | ramutils | protocol |
| ltp_index | Index of sessions completed across experiments and subjects within the LTP protocol. | json | event_creation |  | protocol |
| localization | Metadata for implanted electrodes. Includes coordinates from various spaces as well as locations based on different atlases for both contacts and pairs of contacts. Generated as part of the localization pipeline. | json | neurorad | ramutils, neurorad, brainviewer | localization |
| target_selection_table | Metadata for each contact. Used for selecting stimulation targets. Generated as part of the RAM reporting pipeline once a subject has completed a full set of record-only sessions. Used by the 3D brain visualization application. | csv | ramutils | ramutils, brain_viz_unity | montage |
| baseline_classifier | A serialized version of a RAM brain state classifier that has been trained using record-only data. When loaded, the trained model can be used to make out-of-sample predictions given new features (powers). Generated as part of the reporting pipeline and includes all channels by default. | zip | ramutils | classiflib, ramutils | montage |
| electrode_coordinates | Table of contact metadata in long form, including contact name, type, freesurfer coordinates, atlas location, etc. Generated as part of the brain viz pipeline to be used in the 3D brain visualization. | csv | brainviewer | brain_viz_unity | montage |
| prior_stim_results | Table of previously used stimulation targets with the memory modulation results, along with coordinates in freesurfer space that have been mapped into a particular subject’s space. Used by the 3D brain visualization application to display prior stim locations/results. | csv | brainviewer | brain_viz_unity | montage |
| voxel_coordinates | Metadata for implanted electrodes, including voxel coordinates of each contact. Generated using Vox Tool as part of the broader localization pipeline. Used by the neurorad pipeline to generate coordinates in other spaces, atlas locations, and other metadata. | json | localization | neurorad | montage |
| jacksheet | Labels for all signals recorded from a subject. Typically this includes all EEG-related contacts, but can contain additional signals such as EKG or other reference channels. It is formatted as one label per line. It is used as part of the config generation pipeline. | txt | manual | bptools, ramutils | montage |
| area | Surface area for each implanted depth electrode. Typically this file is manually generated. It is used by the config generation pipeline. | txt | manual | ramutils | montage |
| leads | Channel numbers of all contacts. One channel per line. In the old reporting/config generation pipelines, channels were manually removed from this file in order to remove them from classifier training and config generation. In the new pipeline, this is done by using the classifier_excluded_leads file instead. | txt | localization |  | montage |
| good_leads | Channel numbers of all contacts not identified as being bad channels. One channel per line. | txt | localization |  | montage |
| classifier_excluded_leads | Labels of contacts that should be excluded when training a classifier. One label per line. This is a manually created file used by the config generation pipeline to allow arbitrary sets of contacts to be excluded from classifier training. | txt | localization | ramutils | montage |
| electrode_categories | Lists contacts in the seizure onset zone, exhibiting frequent interictal activity, residing in a brain lesion, or labelled as bad/broken by the clinical staff. Produced near the end of a patient’s time in the EMU and is primarily useful for post-hoc analysis. | txt | manual |  | montage |
| matlab_bipolar_talstruct | Legacy MATLAB talstruct files containing bipolar contact metadata. This file has been replaced by the pairs.json file. | mat |  | brainviewer | montage |
| matlab_monopolar_talstruct | Legacy MATLAB talstruct files containing single contact metadata. This file has been replaced by contacts.json. | mat |  | brainviewer | montage |
| pairs | Metadata for neighboring contacts. Produced from localization.json in the current neurorad pipeline. | json | neurorad | ramutils | montage |
| contacts | Metadata for each implanted contact. Produced from localization.json in the current neurorad pipeline. | json | neurorad | ramutils | montage |
| session_summary | Binary file containing information summarizing a particular session. | h5 | ramutils |  | session |
| classifier_summary | Binary file containing information summarizing the performance of a classifier in a given session in the case of stimulation experiments, or for a set of sessions for record-only experiments. | h5 | ramutils |  | session |
| math_summary | Binary file containing information summarizing math distractor task performance. | h5 | ramutils |  | session |
| all_events | All normalized events associated with a session. Produced as part of the event creation pipeline. | json | event_creation | ramutils | session |
| task_events | Normalized task-related events. This is a subset of all_events and is generated by event creation. | json | event_creation | ramutils | session |
| math_events | Normalized math distractor events. This is also a subset of all_events and is generated by event creation. | json | event_creation | ramutils | session |
| ps4_events | Normalized events related to PS4 sessions, generated by the event creation pipeline. | json | event_creation | ramutils | session |
| sources | Contains information about the EEG files for a particular session, including the sample rate, start time of the recording, length of the recording, etc. | json | event_creation | event_creation | session |
| experiment_log | Log file produced by the task laptop for PyEPL experiments. This file should not be used in analyses. | csv |  | event_creation | session |
| session_log | Log file produced by the task laptop during a session. This file should not be used in analyses. | csv |  | event_creation | session |
| ramulator_session_folder | Timestamped folder containing files generated by Ramulator during a session. Not likely to be used frequently by data analysts. |  |  |  | session |
| event_log | Log file produced by Ramulator during a session. This file should not be used in analyses, but can be useful for debugging if something goes wrong with Ramulator. | json |  | event_creation | session |
| experiment_config | Configuration file used by Ramulator to run an experiment. This file is unlikely to be used by analysts, but is helpful when determining what parameters were used for a particular session. | json |  |  | session |
| raw_eeg | Binary file containing EEG data recorded during a session. | h5 |  |  | session |
| odin_config | Configuration file for the ENS. This file is unlikely to be used by analysts. | csv |  | ramutils | session |
| used_classifier | The serialized classifier that was used during the session. This classifier differs from the baseline_classifier in that it may contain fewer channels if an artifact detection algorithm was employed in the session. To best re-create what happened in an experiment in post hoc analyses, this is the classifier that should be used. | zip |  | ramutils | session |
| excluded_pairs | Metadata for pairs of contacts that were rejected as part of an artifact detection algorithm. | json |  | ramutils | session |
| all_pairs | Metadata for all pairs of contacts that were recorded during a session. | json |  | ramutils | session |
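
Since every row shares the same fields, the table can also be queried programmatically. The sketch below is illustrative only: it transcribes a handful of rows by hand, and the names DataType, ENTRIES, names_where, and used_by are hypothetical helpers rather than anything provided by CMLReaders. Empty table cells are represented as an empty string or an empty tuple.

```python
from collections import namedtuple

# One row of the data dictionary (the description is omitted for brevity).
DataType = namedtuple("DataType", "name fmt generated_by used_by level")

# A few rows transcribed from the table above.
ENTRIES = [
    DataType("r1_index", "json", "event_creation", ("ramutils",), "protocol"),
    DataType("localization", "json", "neurorad",
             ("ramutils", "neurorad", "brainviewer"), "localization"),
    DataType("pairs", "json", "neurorad", ("ramutils",), "montage"),
    DataType("contacts", "json", "neurorad", ("ramutils",), "montage"),
    DataType("jacksheet", "txt", "manual", ("bptools", "ramutils"), "montage"),
    DataType("task_events", "json", "event_creation", ("ramutils",), "session"),
    DataType("session_summary", "h5", "ramutils", (), "session"),
    DataType("raw_eeg", "h5", "", (), "session"),
]


def names_where(entries, **criteria):
    """Return names of entries whose fields equal every given criterion."""
    return [
        entry.name
        for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


def used_by(entries, consumer):
    """Return names of entries that list `consumer` in the Used By column."""
    return [entry.name for entry in entries if consumer in entry.used_by]


print(names_where(ENTRIES, level="montage"))
print(names_where(ENTRIES, fmt="h5", level="session"))
print(used_by(ENTRIES, "bptools"))
```

A lookup like this is a quick way to answer questions such as which files change when a subject's montage changes, or which files a given pipeline depends on.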