Data Guide
This guide is a high-level “data dictionary” of files generated within
the CML and supported by the CMLReaders package. For more information on the
locations of the files, see cmlreaders.constants. In the table below,
each row corresponds to a particular file type that is supported by the
CMLReaders package. The table contains the following fields:
- Data Type: Identifier that can be used for loading/interacting with the data
- Description: Short description of the information contained in the file
- Format: The format of the underlying file
- Generated By: A link to the pipeline/code that produces the file
- Used By: One or more links to code repositories or processing pipelines that use, or are otherwise dependent upon, this file.
- Level: The level at which the file is guaranteed to be unique. For example, a level of “subject” indicates that there is one file per subject regardless of localization or experiments/tasks completed.
The following “levels” are defined:
- protocol: A synonym for “project” or “study.” For example, RAM data is part of the “r1” protocol, while scalp experiments are part of the “ltp” protocol.
- subject: A single participant in a study. Subject identifiers are typically six alphanumeric characters long, for example R1001P.
- localization: Each surgery in which electrodes are added or removed counts as a new localization. Localizations are 0-indexed. Data at this level changes whenever there is a new localization.
- montage: Any change to a subject’s montage results in a new montage number. Montage changes can occur when the set of contacts used for recording changes. Note: Montage numbers do not reset in the case of a new localization.
- session: Data that is unique to a single session of a particular experiment. This is the lowest level at which data is stored, since localization, montage, etc. are fixed for the duration of a session.
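The following minimal Python sketch makes the nesting of these levels concrete. It maps each level to the identifiers that would plausibly be needed to single out one file at that level. `LEVEL_KEYS` and `missing_keys` are hypothetical names and are not part of CMLReaders. The key sets are assumptions drawn from the definitions above:

- A session is assumed to determine its localization and montage, because both are fixed for the duration of a session.
- A montage number is assumed to be unique within a subject without a localization number, because montage numbers do not reset.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping (not part of CMLReaders): the identifiers assumed to be
# needed to single out one file at each level described above.
LEVEL_KEYS: Dict[str, Tuple[str, ...]] = {
    "protocol": ("protocol",),
    "subject": ("protocol", "subject"),
    "localization": ("protocol", "subject", "localization"),
    # Montage numbers do not reset with a new localization, so the montage
    # number is assumed to be unique within a subject on its own.
    "montage": ("protocol", "subject", "montage"),
    # Localization and montage are fixed for a session, so they are implied.
    "session": ("protocol", "subject", "experiment", "session"),
}


def missing_keys(level: str, known: Iterable[str]) -> List[str]:
    """Return the identifiers still required to locate a file at `level`."""
    if level not in LEVEL_KEYS:
        raise ValueError(f"unknown level: {level!r}")
    have = set(known)
    return [key for key in LEVEL_KEYS[level] if key not in have]


# Knowing the protocol and subject suffices for subject-level files,
# but not for session-level ones.
print(missing_keys("subject", ["protocol", "subject"]))  # []
print(missing_keys("session", ["protocol", "subject"]))  # ['experiment', 'session']
```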
Data Type | Description | Format | Generated By | Used By | Level |
r1_index | Index of sessions completed across experiments and subjects within the R1 protocol | json | event_creation | ramutils | protocol |
ltp_index | Index of sessions completed across experiments and subjects within the LTP protocol | json | event_creation | | protocol |
localization | Metadata for implanted electrodes. Includes coordinates from various spaces as well as locations based on different atlases for both contacts and pairs of contacts. Generated as part of the localization pipeline. | json | neurorad | ramutils, neurorad, brainviewer | localization |
target_selection_table | Metadata for each contact. Used for selecting stimulation targets. Generated as part of the RAM reporting pipeline once a subject has completed a full set of record-only sessions. Used by the 3D brain visualization application. | csv | ramutils | ramutils, brain_viz_unity | montage |
baseline_classifier | A serialized version of a RAM brain state classifier that has been trained using record-only data. When loaded, the trained model can be used to make out-of-sample predictions given new features (powers). Generated as part of the reporting pipeline and includes all channels by default. | zip | ramutils | classiflib, ramutils | montage |
electrode_coordinates | Table of contact metadata in long-form including contact name, type, freesurfer coordinates, atlas location, etc. Generated as part of the brain viz pipeline to be used in the 3D brain visualization. | csv | brainviewer | brain_viz_unity | montage |
prior_stim_results | Table of previously used stimulation targets with the memory modulation results along with coordinates in freesurfer space that have been mapped into a particular subject’s space. Used by the 3D brain visualization application to display prior stim locations/results. | csv | brainviewer | brain_viz_unity | montage |
voxel_coordinates | Metadata for implanted electrodes including voxel coordinates of each contact. Generated using Vox Tool as part of the broader localization pipeline. Used by the neurorad pipeline to generate coordinates in other spaces, atlas locations, and other metadata. | json | localization | neurorad | montage |
jacksheet | Labels for all signals recorded from a subject. Typically this includes all EEG-related contacts, but it can contain additional signals such as EKG or other reference channels. It is formatted as one label per line and is used as part of the config generation pipeline. | txt | manual | bptools, ramutils | montage |
area | Surface area for each implanted depth electrode. Typically this file is manually generated. It is used by the config generation pipeline. | txt | manual | ramutils | montage |
leads | Channel numbers of all contacts, one channel per line. In the old reporting/config generation pipelines, channels were manually removed from this file in order to exclude them from classifier training and config generation. In the new pipeline, this is done with the classifier_excluded_leads file instead. | txt | localization | | montage |
good_leads | Channel numbers of all contacts not identified as bad channels, one channel per line. | txt | localization | | montage |
classifier_excluded_leads | Labels of contacts that should be excluded when training a classifier, one label per line. This is a manually created file used by the config generation pipeline to allow arbitrary sets of contacts to be excluded from classifier training. | txt | localization | ramutils | montage |
electrode_categories | Lists contacts in the seizure onset zone, exhibiting frequent interictal activity, residing in a brain lesion, or labeled as bad/broken by the clinical staff. Produced near the end of a patient’s time in the EMU and primarily useful for post-hoc analysis. | txt | manual | | montage |
matlab_bipolar_talstruct | Legacy MATLAB talstruct file containing bipolar contact metadata. This file has been replaced by pairs.json. | mat | brainviewer | | montage |
matlab_monopolar_talstruct | Legacy MATLAB talstruct file containing single-contact metadata. This file has been replaced by contacts.json. | mat | brainviewer | | montage |
pairs | Metadata for neighboring contacts. Produced from localization.json in the current neurorad pipeline. | json | neurorad | ramutils | montage |
contacts | Metadata for each implanted contact. Produced from localization.json in the current neurorad pipeline. | json | neurorad | ramutils | montage |
session_summary | Binary file containing information summarizing a particular session. | h5 | ramutils | | session |
classifier_summary | Binary file summarizing the performance of a classifier, either in a given session (for stimulation experiments) or across a set of sessions (for record-only experiments). | h5 | ramutils | | session |
math_summary | Binary file containing information summarizing math distractor task performance. | h5 | ramutils | | session |
all_events | All normalized events associated with a session. Produced as part of the event creation pipeline. | json | event_creation | ramutils | session |
task_events | Normalized task-related events. This is a subset of all_events and is generated by the event creation pipeline. | json | event_creation | ramutils | session |
math_events | Normalized math distractor events. This is also a subset of all_events and is generated by the event creation pipeline. | json | event_creation | ramutils | session |
ps4_events | Normalized events related to PS4 sessions, generated by the event creation pipeline. | json | event_creation | ramutils | session |
sources | Contains information about the EEG files for a particular session, including the sample rate, the start time of the recording, the length of the recording, etc. | json | event_creation | event_creation | session |
experiment_log | Log file produced by the task laptop for PyEPL experiments. This file should not be used in analyses. | csv | | event_creation | session |
session_log | Log file produced by the task laptop during a session. This file should not be used in analyses. | csv | | event_creation | session |
ramulator_session_folder | Timestamped folder containing files generated by Ramulator during a session. Not likely to be used frequently by data analysts. | | | | session |
event_log | Log file produced by Ramulator during a session. This file should not be used in analyses, but it can be useful for debugging if something goes wrong with Ramulator. | json | | event_creation | session |
experiment_config | Configuration file used by Ramulator to run an experiment. This file is unlikely to be used by analysts, but it is helpful when determining what parameters were used for a particular session. | json | | | session |
raw_eeg | Binary file containing EEG data recorded during a session. | h5 | | | session |
odin_config | Configuration file for the ENS. This file is unlikely to be used by analysts. | csv | ramutils | | session |
used_classifier | The serialized classifier that was used during the session. It differs from the baseline_classifier in that it may contain fewer channels if an artifact detection algorithm was employed in the session. To best re-create what happened in an experiment in post-hoc analyses, use this classifier. | zip | ramutils | | session |
excluded_pairs | Metadata for pairs of contacts that were rejected by an artifact detection algorithm. | json | ramutils | | session |
all_pairs | Metadata for all pairs of contacts that were recorded during a session. | json | ramutils | | session |
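Every row has the same six columns, so the table can also be handled as structured data. The following is a minimal Python sketch built on three assumptions:

- Rows are written in the pipe-delimited form used here.
- Empty cells are kept in place.
- No description contains a `|`.

`DataTypeEntry`, `parse_row`, `group_by_level`, and `LEVELS` are hypothetical helpers and are not part of CMLReaders.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Levels defined earlier in this guide (hypothetical constant, not from cmlreaders).
LEVELS = {"protocol", "subject", "localization", "montage", "session"}


@dataclass(frozen=True)
class DataTypeEntry:
    """One row of the data guide table (hypothetical helper type)."""
    data_type: str
    description: str
    fmt: str
    generated_by: str
    used_by: Tuple[str, ...]
    level: str


def parse_row(line: str) -> DataTypeEntry:
    """Parse a pipe-delimited row with six cells and one trailing '|'."""
    text = line.strip()
    if text.endswith("|"):
        text = text[:-1]  # drop only the single trailing delimiter
    cells = [cell.strip() for cell in text.split("|")]
    if len(cells) != 6:
        raise ValueError(f"expected 6 cells, got {len(cells)}: {line!r}")
    data_type, description, fmt, generated_by, used_by, level = cells
    if level not in LEVELS:
        # A misaligned row usually leaves Level empty or holding another value.
        raise ValueError(f"unknown level {level!r} in row: {line!r}")
    # "Used By" may list several comma-separated consumers, or be empty.
    users = tuple(name.strip() for name in used_by.split(",") if name.strip())
    return DataTypeEntry(data_type, description, fmt, generated_by, users, level)


def group_by_level(rows: List[str]) -> Dict[str, List[str]]:
    """Map each level to the data types stored at that level, in row order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        entry = parse_row(row)
        groups[entry.level].append(entry.data_type)
    return dict(groups)


# Sample rows modeled on the table above (descriptions shortened).
ROWS = [
    "r1_index | Index of R1 sessions | json | event_creation | ramutils | protocol |",
    "pairs | Metadata for neighboring contacts. | json | neurorad | ramutils | montage |",
    "task_events | Normalized task-related events. | json | event_creation | ramutils | session |",
    "raw_eeg | Binary file containing EEG data. | h5 | | | session |",
]

print(group_by_level(ROWS))
# {'protocol': ['r1_index'], 'montage': ['pairs'], 'session': ['task_events', 'raw_eeg']}
```

Rejecting unknown levels is a deliberate choice. A row whose empty cells have collapsed to the end, such as `txt | localization | montage | |`, still splits into six cells but ends with an empty Level, so a cell count alone would not catch it.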