API¶

CML Reader¶

class cmlreaders.CMLReader(subject: str, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = None, montage: Union[int, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

Generic reader for all CML-specific files

Notes

At import time, each reader in cmlreaders.readers registers the data types it handles by updating the reader_names dictionary. The keys of reader_names are the data types understood by cmlreaders.PathFinder and defined in cmlreaders.constants; the values are the names of the reader classes used to load each data type. When a cmlreaders.cmlreader.CMLReader is instantiated, it builds a new dictionary that maps each data type to the actual reader class rather than to the class name. In essence, cmlreaders.cmlreader.CMLReader is a factory that routes each request to load a particular data type to the reader defined to handle it.
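The registration-and-routing idea in these notes can be sketched in a few lines. This is a simplified illustration, not the library's implementation: `register`, `ReaderFactory`, and the two toy reader classes are hypothetical names, and only the notion of a `reader_names` dictionary that maps data types to reader class names comes from the notes.

```python
# Hypothetical sketch of a name-based reader registry and factory.
reader_names = {}      # data type -> name of the reader class
_reader_classes = {}   # class name -> class object (assumed helper)


def register(*data_types):
    """Class decorator that records which data types a reader handles."""
    def decorator(cls):
        _reader_classes[cls.__name__] = cls
        for data_type in data_types:
            reader_names[data_type] = cls.__name__
        return cls
    return decorator


@register("events", "task_events")
class ToyEventReader:
    def __init__(self, data_type):
        self.data_type = data_type

    def load(self):
        return "events loaded for {}".format(self.data_type)


@register("pairs")
class ToyJSONReader:
    def __init__(self, data_type):
        self.data_type = data_type

    def load(self):
        return "json loaded for {}".format(self.data_type)


class ReaderFactory:
    """Routes load requests to the reader registered for a data type."""

    def __init__(self):
        # Resolve class names to actual classes once, at instantiation.
        self.readers = {
            data_type: _reader_classes[name]
            for data_type, name in reader_names.items()
        }

    def load(self, data_type):
        if data_type not in self.readers:
            raise NotImplementedError(
                "no reader registered for {!r}".format(data_type))
        return self.readers[data_type](data_type).load()


factory = ReaderFactory()
print(factory.load("pairs"))
```

Storing class names at registration time and resolving them to classes only when the factory is built keeps registration cheap and avoids holding class references in a module-level table that other modules mutate.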

static get_data_index(protocol: str = 'all', rootdir: Union[str, NoneType] = None) → pandas.core.frame.DataFrame[source]¶: Shortcut for the global get_data_index() function to only need to import CMLReader.

get_reader(data_type)[source]¶

Return an instance of the reader class for the given data type.

Notes

Reader instances get cached via functools.lru_cache().
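As a rough illustration of what caching with functools.lru_cache() implies, the sketch below caches a constructor call keyed on the data type. How the decorator is applied inside the library is not shown here, so the module-level `get_reader` function and the `ToyReader` class are assumptions made for the example.

```python
import functools


class ToyReader:
    """Stand-in for a reader class; constructing one may be expensive."""

    def __init__(self, data_type):
        self.data_type = data_type


@functools.lru_cache(maxsize=None)
def get_reader(data_type):
    # The first call for a given data_type constructs the reader;
    # later calls with the same argument return the cached instance.
    return ToyReader(data_type)


first = get_reader("events")
second = get_reader("events")
other = get_reader("pairs")
print(first is second, first is other)
```

One consequence of this kind of caching is that arguments must be hashable, and that repeated requests share a single reader object rather than receiving fresh copies.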

load(data_type: str, **kwargs)[source]¶

Load requested data into memory.

Parameters:	data_type – Type of data to load (see `readers` for available options)

Notes

Keyword arguments that are accepted depend on the type of data being loaded. See load_eeg() for details.

load_eeg(events: Union[pandas.core.frame.DataFrame, NoneType] = None, rel_start: int = None, rel_stop: int = None, scheme: Union[pandas.core.frame.DataFrame, NoneType] = None)[source]¶

Load EEG data.

Keyword Arguments:
	events – Events to load EEG epochs from. Incompatible with passing `epochs`.
	rel_start – Start time in ms relative to passed event onsets. This parameter is required when passing events and not used otherwise.
	rel_stop – Stop time in ms relative to passed event onsets. This parameter is required when passing events and not used otherwise.
	scheme – When specified, a bipolar scheme to rereference the data with and/or filter by channel. Rereferencing is only possible if the data were recorded in monopolar (a.k.a. common reference) mode.
Returns:
Return type:	EEGContainer
Raises:
	`RereferencingNotPossibleError` – When passing `scheme` and the data do not support rereferencing.
	`IncompatibleParametersError` – When both `events` and `epochs` are specified or `events` are used without passing `rel_start` and/or `rel_stop`.

classmethod load_events(subjects: Union[str, typing.List[str], NoneType] = None, experiments: Union[str, typing.List[str], NoneType] = None, rootdir: Union[str, NoneType] = None) → pandas.core.frame.DataFrame[source]¶

Load events from multiple sessions.

Parameters:
	subjects – Subject or list of subjects.
	experiments – Experiment or list of experiments to include.
	rootdir – Path to root data directory.

localization¶: Determine the localization number.

montage¶: Determine the montage number.

path_finder¶: Return a path finder using the proper kwargs.

Unit conversions¶

cmlreaders.convert.events_to_epochs(events: pandas.core.frame.DataFrame, rel_start: int, rel_stop: int, sample_rate: Union[int, float], basenames: Union[typing.List[str], NoneType] = None) → List[Tuple[[int, int], int]][source]¶

Convert events to epochs.

Parameters:
	events – Events to read.
	rel_start – Start time relative to events in ms.
	rel_stop – Stop time relative to events in ms.
	sample_rate – Sample rate in Hz.
	basenames – EEG file basenames.
Returns:	A list of tuples giving absolute start and stop times in number of samples.
Return type:	epochs
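The conversion from events to epochs can be pictured as shifting each event onset by the relative start and stop times after converting them from milliseconds to samples. The sketch below is an assumption-laden illustration, not the library's code: it takes a plain list of onsets in samples (assumed to correspond to the `eegoffset` column of an events table), ignores `basenames`, and rounds to the nearest sample. The helper names `ms_to_samples` and `offsets_to_epochs` are hypothetical.

```python
def ms_to_samples(millis, sample_rate):
    """Convert a duration in ms to a whole number of samples."""
    return int(round(millis * sample_rate / 1000.0))


def offsets_to_epochs(eegoffsets, rel_start, rel_stop, sample_rate):
    """Build (start, stop) sample windows around each event onset.

    eegoffsets  -- event onsets in samples (assumed `eegoffset` values)
    rel_start   -- window start relative to onset, in ms (may be negative)
    rel_stop    -- window stop relative to onset, in ms
    sample_rate -- samples per second
    """
    start = ms_to_samples(rel_start, sample_rate)
    stop = ms_to_samples(rel_stop, sample_rate)
    return [(offset + start, offset + stop) for offset in eegoffsets]


epochs = offsets_to_epochs([1000, 2500], rel_start=-100, rel_stop=100,
                           sample_rate=500)
print(epochs)  # [(950, 1050), (2450, 2550)]
```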

cmlreaders.convert.milliseconds_to_events(onsets: List[Union[int, float]], sample_rate: Union[int, float]) → pandas.core.frame.DataFrame[source]¶

Take times and produce a minimal events pd.DataFrame to load EEG data with.

Parameters:	onsets – Onset times in ms. sample_rate – Sample rate in samples per second.
Returns:	A `pd.DataFrame` with `eegoffset` as the only column.
Return type:	events

cmlreaders.convert.milliseconds_to_samples(millis: Union[int, float], sample_rate: Union[int, float]) → int[source]¶

Convert times in milliseconds to number of samples.

Parameters:	millis – Time in ms. sample_rate – Sample rate in samples per second.
Returns:
Return type:	Number of samples.

cmlreaders.convert.samples_to_milliseconds(samples: int, sample_rate: Union[int, float]) → Union[int, float][source]¶

Convert samples to milliseconds.

Parameters:	samples – Number of samples. sample_rate – Sample rate in samples per second.
Returns:
Return type:	Samples converted to milliseconds.
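The arithmetic behind the two scalar conversions is a single scaling by the sample rate. The following sketch shows that relationship under stated assumptions: the function names `to_samples` and `to_milliseconds` are hypothetical, and rounding to the nearest sample is a choice made here, since the library's rounding behavior is not documented above.

```python
def to_samples(millis, sample_rate):
    """samples = ms * (samples per second) / (1000 ms per second)."""
    return int(round(millis * sample_rate / 1000.0))


def to_milliseconds(samples, sample_rate):
    """ms = samples * (1000 ms per second) / (samples per second)."""
    return samples * 1000.0 / sample_rate


# 100 ms at 500 samples per second is 50 samples, and back again.
n_samples = to_samples(100, 500)
millis = to_milliseconds(n_samples, 500)
print(n_samples, millis)  # 50 100.0
```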

Custom Readers¶

class cmlreaders.readers.readers.BaseCSVReader(data_type: str, subject: Union[str, NoneType] = None, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = 0, montage: Union[int, NoneType] = 0, file_path: Union[str, NoneType] = None, eeg_basename: Union[str, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

Base class for reading CSV files.

as_dataframe()[source]¶: Return data as dataframe

class cmlreaders.readers.readers.BaseJSONReader(data_type: str, subject: Union[str, NoneType] = None, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = 0, montage: Union[int, NoneType] = 0, file_path: Union[str, NoneType] = None, eeg_basename: Union[str, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

Generic reader class for loading simple JSON files.

Returns a pd.DataFrame.

as_dataframe()[source]¶: Return data as dataframe

class cmlreaders.readers.readers.ClassifierContainerReader(data_type, subject, experiment, session, localization, file_path=None, rootdir='/', **kwargs)[source]¶

Reader class for loading a serialized classifier

Notes

By default, a classiflib.container.ClassifierContainer class is returned.

as_dataframe()[source]¶: Return data as dataframe

class cmlreaders.readers.readers.EventReader(data_type: str, subject: Union[str, NoneType] = None, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = 0, montage: Union[int, NoneType] = 0, file_path: Union[str, NoneType] = None, eeg_basename: Union[str, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

Reader for all experiment events.

Returns a pd.DataFrame.

as_dataframe()[source]¶: Return data as dataframe

class cmlreaders.readers.readers.MNICoordinatesReader(data_type: str, subject: str, **kwargs)[source]¶

as_dataframe()[source]¶: Return data as dataframe

class cmlreaders.readers.readers.RAMCSVReader(data_type, subject, localization, experiment=None, file_path=None, rootdir='/', **kwargs)[source]¶: CSV reader type for RAM data.

class cmlreaders.readers.readers.RamulatorEventLogReader(data_type, subject, experiment, session, file_path=None, rootdir='/', **kwargs)[source]¶

Reader for Ramulator event log

as_dataframe()[source]¶: Return data as dataframe

as_dict()[source]¶: Return data as a list of dictionaries

class cmlreaders.readers.readers.TextReader(data_type: str, subject: str, **kwargs)[source]¶

Generic reader class for reading RAM text files

as_dataframe()[source]¶: Return data as dataframe

class cmlreaders.readers.eeg.BaseEEGReader(filename: str, dtype: Type[numpy.dtype], epochs: List[Tuple[int, Union[int, NoneType]]], scheme: Union[pandas.core.frame.DataFrame, NoneType])[source]¶

Base class for reading EEG data. EEGReader uses its subclasses to read format-specific EEG data.

Parameters:
	filename – Base name for EEG file(s) including absolute path.
	dtype – numpy dtype to use for reading data.
	epochs – Epochs to include. Epochs are defined with start and stop sample counts.
	scheme – Scheme data to use for rereferencing/channel filtering. This should be loaded/manipulated from `pairs.json` data.

Notes

The read() method must be implemented by subclasses to return a tuple containing a 3-D array with dimensions (epochs x channels x time) and a list of contact numbers.

include_contact(contact_num: int)[source]¶: Filter to determine if we need to include a contact number when reading data.

read() → Tuple[numpy.ndarray, List[int]][source]¶: Read the data.

rereference(data: numpy.ndarray, contacts: List[int]) → Tuple[numpy.ndarray, List[str]][source]¶

Rereference and/or select a subset of raw channels.

Parameters:

data – Input timeseries data shaped as (epochs, channels, time).
contacts – List of contact numbers (1-based) that index the data.

Returns:

reref – Rereferenced timeseries.
labels – List of channel labels used (included in case some don’t get used).

Notes

This method is meant to be used when loading data and so returns a raw Numpy array. If used externally, an EEGContainer will need to be constructed manually.
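To make the rereferencing step concrete, here is a minimal sketch of a bipolar rereference, in which each output channel is the difference between two monopolar contacts. It is a generic illustration, not the library's method: it uses nested lists instead of a Numpy array to stay dependency-free, takes the scheme as a list of hypothetical `(contact_1, contact_2)` tuples, and builds labels in an invented `"a-b"` format.

```python
def bipolar_rereference(data, contacts, pairs):
    """Subtract contact_2 from contact_1 for every pair in every epoch.

    data     -- nested lists indexed as data[epoch][channel][time]
    contacts -- 1-based contact numbers labeling the channel axis
    pairs    -- list of (contact_1, contact_2) tuples (assumed scheme form)
    """
    # Map each contact number to its position along the channel axis.
    index = {contact: i for i, contact in enumerate(contacts)}
    reref = []
    for epoch in data:
        channels = []
        for c1, c2 in pairs:
            a, b = epoch[index[c1]], epoch[index[c2]]
            channels.append([x - y for x, y in zip(a, b)])
        reref.append(channels)
    labels = ["{}-{}".format(c1, c2) for c1, c2 in pairs]
    return reref, labels


# One epoch, three monopolar contacts, two time points each.
data = [[[1.0, 2.0], [0.5, 0.5], [3.0, 1.0]]]
reref, labels = bipolar_rereference(data, [1, 2, 3], [(1, 2), (2, 3)])
print(reref, labels)
```

Contacts that appear in no pair are simply dropped from the output, which is how a scheme can double as a channel filter.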

scheme_type¶

Returns “contacts” when the input scheme is in the form of monopolar contacts and “pairs” when bipolar.

Returns:
Return type:	The scheme type or `None` if no scheme was specified.
Raises:	`KeyError` – When the passed scheme doesn’t include any of the following keys: `contact_1`, `contact_2`, `contact`
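The decision described for scheme_type can be sketched as a check on which key columns are present. This standalone function is an assumption for illustration only: it accepts a list of column names (or `None`) rather than a DataFrame, and the precedence of the checks is a guess.

```python
def scheme_type(columns):
    """Classify a scheme by its column names.

    columns -- iterable of column names, or None when no scheme was given
    """
    if columns is None:
        return None
    if "contact_1" in columns and "contact_2" in columns:
        return "pairs"      # bipolar scheme
    if "contact" in columns:
        return "contacts"   # monopolar scheme
    raise KeyError("scheme needs contact_1/contact_2 or contact columns")


print(scheme_type(["contact_1", "contact_2", "label"]))
print(scheme_type(["contact", "label"]))
print(scheme_type(None))
```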

class cmlreaders.readers.eeg.EDFReader(filename: str, dtype: Type[numpy.dtype], epochs: List[Tuple[int, Union[int, NoneType]]], scheme: Union[pandas.core.frame.DataFrame, NoneType])[source]¶

read() → Tuple[numpy.ndarray, List[int]][source]¶: Read the data.

class cmlreaders.readers.eeg.EEGMetaReader(data_type: str, subject: Union[str, NoneType] = None, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = 0, montage: Union[int, NoneType] = 0, file_path: Union[str, NoneType] = None, eeg_basename: Union[str, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

Reads the sources.json or params.txt files, which describe metadata about EEG data.

EEGMetaReader uses the following logic to combine entries in sources.json:

If all recordings in sources.json have the same value for a field, then the dictionary returned by EEGMetaReader has that value for the field
Otherwise, that field should be populated by a list of the values present in sources.json
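The two combination rules above can be sketched as follows. The layout of sources.json assumed here (one dictionary of fields per recording, keyed by recording name) and the function name `combine_sources` are illustrative assumptions, as is the choice to keep one list entry per recording when values differ.

```python
def combine_sources(sources):
    """Collapse per-recording metadata into a single dictionary.

    sources -- dict mapping a recording name to a dict of metadata fields
    """
    recordings = list(sources.values())

    # Collect field names in first-seen order.
    fields = []
    for rec in recordings:
        for key in rec:
            if key not in fields:
                fields.append(key)

    combined = {}
    for field in fields:
        values = [rec.get(field) for rec in recordings]
        if all(value == values[0] for value in values):
            combined[field] = values[0]   # same everywhere: keep the scalar
        else:
            combined[field] = values      # differs: keep every value
    return combined


sources = {
    "rec_a": {"sample_rate": 1000, "n_samples": 5000},
    "rec_b": {"sample_rate": 1000, "n_samples": 7200},
}
print(combine_sources(sources))
```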

as_dict() → dict[source]¶: Return data as a list of dictionaries

class cmlreaders.readers.eeg.EEGReader(data_type: str, subject: Union[str, NoneType] = None, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = 0, montage: Union[int, NoneType] = 0, file_path: Union[str, NoneType] = None, eeg_basename: Union[str, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

Reads EEG data.

Returns an EEGContainer.

Examples

All examples start by defining a reader:

>>> from cmlreaders import CMLReader
>>> reader = CMLReader("R1111M", experiment="FR1", session=0)

Loading a subset of EEG based on brain region (this automatically re-references):

>>> pairs = reader.load("pairs")
>>> filtered = pairs[pairs["avg.region"] == "middletemporal"]
>>> eeg = reader.load_eeg(scheme=filtered)

Loading EEG from -100 ms to +100 ms relative to a set of events:

>>> events = reader.load("events")
>>> eeg = reader.load_eeg(events, rel_start=-100, rel_stop=100)

Loading an entire session:

>>> eeg = reader.load_eeg()

Loading multiple sessions from the same subject:

>>> events = CMLReader.load_events(["R1111M"], ["FR1"])
>>> words = events[events["type"] == "WORD"]
>>> reader = CMLReader("R1111M")
>>> eeg = reader.load_eeg(events=words, rel_start=-100, rel_stop=100)

as_dataframe()[source]¶: Return data as dataframe

as_dict()[source]¶: Return data as a list of dictionaries

as_recarray()[source]¶: Return data as a numpy recarray. By default, this calls as_dataframe() and converts to a recarray with pd.DataFrame.to_records().

as_timeseries(events: pandas.core.frame.DataFrame, rel_start: Union[float, int], rel_stop: Union[float, int]) → cmlreaders.eeg_container.EEGContainer[source]¶

Read the timeseries.

Parameters:

events – Events to read EEG data from
rel_start – Relative start times in ms
rel_stop – Relative stop times in ms

Returns:

A time series with shape (channels, epochs, time). By default, this
returns data as it was physically recorded (e.g., if recorded with a
common reference, each channel will be a contact’s reading referenced to
the common reference, a.k.a. “monopolar channels”).

Raises:

RereferencingNotPossibleError – When rereferencing is not possible.

load(**kwargs)[source]¶: Overrides the generic load method so as to accept keyword arguments to pass along to as_timeseries().

class cmlreaders.readers.eeg.NumpyEEGReader(filename: str, dtype: Type[numpy.dtype], epochs: List[Tuple[int, Union[int, NoneType]]], scheme: Union[pandas.core.frame.DataFrame, NoneType])[source]¶

Read EEG data stored in Numpy’s .npy format.

Notes

This reader is currently used only for testing, so it lacks some features, such as determining which contact numbers it is actually using. Instead, it reports contacts as a sequential list of ints.

read() → Tuple[numpy.ndarray, List[int]][source]¶: Read the data.

class cmlreaders.readers.eeg.RamulatorHDF5Reader(filename: str, dtype: Type[numpy.dtype], epochs: List[Tuple[int, Union[int, NoneType]]], scheme: Union[pandas.core.frame.DataFrame, NoneType])[source]¶

Reads Ramulator HDF5 EEG files.

read() → Tuple[numpy.ndarray, List[int]][source]¶: Read the data.

rereference(data: numpy.ndarray, contacts: List[int]) → Tuple[numpy.ndarray, List[str]][source]¶: Overrides the default rereferencing to first check validity of the passed scheme or if rereferencing is even possible in the first place.

class cmlreaders.readers.eeg.SplitEEGReader(filename: str, dtype: Type[numpy.dtype], epochs: List[Tuple[int, Union[int, NoneType]]], scheme: Union[pandas.core.frame.DataFrame, NoneType])[source]¶

Read so-called split EEG data (that is, raw binary data stored as one channel per file).

read() → Tuple[numpy.ndarray, List[int]][source]¶: Read the data.

PathFinder¶

The cmlreaders.PathFinder class can be used to identify the location of various file types on RHINO. In an ideal world, all historic data would be processed to have consistent file names, locations, and types. However, because this has not been the case and individuals analyzing the data have come to expect and deal with these inconsistencies, the safer approach is to leave the data in its original form and attempt to abstract away these underlying inconsistencies for future users.

class cmlreaders.PathFinder(subject: Union[str, NoneType] = None, experiment: Union[str, NoneType] = None, session: Union[int, NoneType] = None, localization: Union[int, NoneType] = 0, montage: Union[int, NoneType] = 0, eeg_basename: Union[str, NoneType] = None, rootdir: Union[str, NoneType] = None)[source]¶

find(data_type)[source]¶

Given a specific file type, find the corresponding file on RHINO and return the full path

Parameters:	data_type – The type of file to load. The given name should match one of the keys from rhino_paths.
Returns:	path – The path of the file found based on the request
Return type:	str
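One way to picture how a lookup like find() can hide inconsistent file locations is a table of candidate path templates per data type, tried in order until one exists. The sketch below is only an illustration of that idea under stated assumptions: the `toy_paths` table, its templates, the standalone `find` function, and the subject ID `R0000X` are all invented for the example and do not reflect the real contents of rhino_paths.

```python
import os
import tempfile

# Hypothetical templates: each data type lists candidate locations,
# preferred location first.
toy_paths = {
    "pairs": [
        "{subject}/montage/pairs.json",
        "{subject}/legacy/{subject}_pairs.json",
    ],
}


def find(data_type, rootdir, **fields):
    """Return the first existing path for data_type under rootdir."""
    if data_type not in toy_paths:
        raise KeyError("unknown data type: {!r}".format(data_type))
    for template in toy_paths[data_type]:
        candidate = os.path.join(rootdir, template.format(**fields))
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(
        "no file found for {!r} under {}".format(data_type, rootdir))


with tempfile.TemporaryDirectory() as root:
    # Only the legacy location exists for this toy subject.
    legacy = os.path.join(root, "R0000X", "legacy", "R0000X_pairs.json")
    os.makedirs(os.path.dirname(legacy))
    open(legacy, "w").close()
    found = find("pairs", root, subject="R0000X")
    print(os.path.relpath(found, root))
```

Callers ask for a data type by name and never see which naming convention matched, which is the abstraction the PathFinder description above is aiming for.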

localization_files¶: All localization related files

montage_files¶: All files that vary by montage number

requestable_files¶: All files that can be requested with PathFinder.find()

session_files¶: All files that vary by session

subject_files¶: All files that vary only by subject

Path and File Constants¶

cmlreaders.PathFinder internally uses the cmlreaders.constants module. The usefulness of cmlreaders.PathFinder relies on these constants being well-maintained.