Serializable data structures

Defining data classes

Data can be defined in a serializable manner using the traitschema.Schema base class, which adds serialization methods to data classes defined with the traits package. To ensure serializability, use the Array type whenever possible.

Experiment parameters

Experiment parameters (e.g., timing windows) are defined as traitschema.Schema subclasses so that the parameters used when training a classifier can be easily saved.

Common experimental/model parameters.

class ramutils.parameters.FilePaths(**kwargs)[source]

Paths to files that frequently get passed around to many tasks.

All paths are given relative to the root path but are converted to absolute paths on creation.

Keyword Arguments:
 
  • root (str) – Rhino mount point.
  • dest (str) – Directory to write files to.
  • pairs (str) – Path to pairs.json.
  • excluded_pairs (str) – Path to excluded_pairs.json.
  • electrode_config_file (str) – Path to electrode config file.
  • area_file (str) – Path to surface area file. If this is not defined when generating Odin configuration files, the default behavior is to look in the same directory as the jacksheet for a file named area.txt.
  • data_db (str) – Path to directory where permanently-cached underlying data for reports should be stored. In general, this should only be specified when testing; otherwise, the default location should be used.
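
The relative-to-absolute conversion described above can be sketched as follows. This is a minimal illustration and not the FilePaths implementation: resolve_paths is a hypothetical helper, and the example paths are made up.

```python
import os.path


def resolve_paths(root, **relative_paths):
    """Join each relative path onto root and return absolute paths.

    Hypothetical helper for illustration; not part of ramutils.
    """
    resolved = {"root": os.path.abspath(root)}
    for name, rel in relative_paths.items():
        # Entries that were not specified (None) are left untouched.
        resolved[name] = None if rel is None else os.path.join(resolved["root"], rel)
    return resolved


paths = resolve_paths(
    "/mnt/rhino",
    dest="scratch/out",
    pairs="data/pairs.json",
    area_file=None,
)
print(paths["pairs"])
```

Resolving every path once, at creation time, means downstream tasks never need to know the mount point.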
class ramutils.parameters.ExperimentParameters(**kwargs)[source]

Common parameters used in an experiment. Default values apply to the FR class of experiments.

class ramutils.parameters.FRParameters(**kwargs)[source]

Free recall experiment parameters relevant for classification.

class ramutils.parameters.PALParameters(**kwargs)[source]

Paired associates experiment parameters relevant for classification. It inherits all of the same parameters as FR experiments and adds a few more.

class ramutils.parameters.PS5Parameters(**kwargs)[source]

PS5 experiment parameters

class ramutils.parameters.StimParameters(**kwargs)[source]

Single-channel stimulation parameters.

Underlying Data

All data necessary to rebuild a report is saved in a binary format as part of generating the report. All data is dumped into a single directory, and subjects, sessions, and data types are distinguished by a strict naming convention: {subject}_{experiment}_{session}_{data_type}.{file_type}.

Most saved objects are unique to a particular subject/experiment/session. When an object spans several sessions, {session} will be an underscore-separated list of the sessions used to generate the data. For example, if a target selection table was generated using sessions 1, 2, and 3 for subject R1XXX and experiment XYZ, then the file would be saved as R1XXX_XYZ_1_2_3_target_selection_table.csv. To see how this is done, see ramutils.tasks.misc.save_all_output() and ramutils.tasks.misc.load_existing_results().

Listed below are the types of data stored, along with their corresponding objects where applicable. The properties and methods defined for each of these objects can be found in the documentation below.

  • target_selection_table – A csv file containing metadata for each electrode
  • classifier_summary – Metadata related to classifier performance (see ClassifierSummary)
  • math_summary – Math events and useful helper methods for assessing performance on the distractor task (see MathSummary)
  • session_summary – Events and helper methods for conducting behavioral analyses and generating plots. In many cases, there are summary objects specific to the type of session, i.e. stim vs. nonstim, FR vs. CatFR vs. PS, etc.
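
The naming convention described above can be sketched as a small function. The function name output_filename is hypothetical and only illustrates the pattern; the actual logic lives in ramutils.tasks.misc.save_all_output().

```python
def output_filename(subject, experiment, sessions, data_type, file_type):
    """Build {subject}_{experiment}_{session}_{data_type}.{file_type}.

    Hypothetical helper for illustration; not part of ramutils.
    """
    # Multiple sessions collapse into an underscore-separated list.
    session_str = "_".join(str(s) for s in sessions)
    return "{}_{}_{}_{}.{}".format(
        subject, experiment, session_str, data_type, file_type
    )


name = output_filename("R1XXX", "XYZ", [1, 2, 3], "target_selection_table", "csv")
print(name)  # R1XXX_XYZ_1_2_3_target_selection_table.csv
```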
class ramutils.reports.summary.ClassifierSummary(**kwargs)[source]

Classifier Summary Object

auc

Classifier AUC

classifier_activation

Forward model of classifier activation from Haufe et al. (2014)

confidence_interval_median_classifier_output

95% confidence interval for the median of the classifier output. Used as a sanity check: the interval should be centered around 0.5, so a large deviation suggests that something is amiss.

false_positive_rate

False positive rate used for the ROC curve

high_tercile_diff_from_mean

% change in recall rate from overall recall when classifier output was in highest tercile

low_tercile_diff_from_mean

% change in recall rate from overall recall when classifier output was in lowest tercile

median_classifier_output

Median of the classifier outputs

mid_tercile_diff_from_mean

% change in recall rate from overall recall when classifier output was in middle tercile
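
One plausible reading of the three tercile attributes is sketched below. It is an assumption and not the ClassifierSummary code: events are sorted by classifier output and split into three equal-sized groups, and each group's recall rate is expressed as a percent change from the overall recall rate. The function name and toy data are made up.

```python
def tercile_diffs_from_mean(outputs, recalled):
    """Percent change in recall rate per classifier-output tercile.

    Returns (low, mid, high). Illustrative sketch only.
    """
    # Indices of events ordered from lowest to highest classifier output.
    order = sorted(range(len(outputs)), key=lambda i: outputs[i])
    n = len(order)
    cuts = [0, n // 3, 2 * n // 3, n]
    overall = sum(recalled) / n
    diffs = []
    for start, stop in zip(cuts[:-1], cuts[1:]):
        idx = order[start:stop]
        rate = sum(recalled[i] for i in idx) / len(idx)
        diffs.append(100.0 * (rate - overall) / overall)
    return tuple(diffs)


outputs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
recalled = [0, 0, 1, 0, 1, 1, 1, 1, 1]
low, mid, high = tercile_diffs_from_mean(outputs, recalled)
print(low, mid, high)
```

In this toy data, recall is 50% below the overall rate in the lowest tercile and 50% above it in the highest, which is the pattern expected from a classifier that tracks recall.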

permuted_auc_values

Array of AUC values from performing permutation test

populate(subject, experiment, session, true_outcomes, predicted_probabilities, permuted_auc_values, frequencies, pairs, features, coefficients, tag='', reloaded=False)[source]

Populate classifier performance metrics

Parameters:
  • subject (string) – Subject identifier
  • experiment (string) – Name of the experiment
  • session (string) – Session number
  • true_outcomes (array_like) – Boolean array indicating whether each word was recalled
  • predicted_probabilities (array_like) – Outputs from the trained classifier for each word event
  • permuted_auc_values (array_like) – AUC values from performing a permutation test on classifier
  • frequencies (array_like) – Frequencies used to train the classifier
  • pairs (pd.DataFrame) – Metadata for each bipolar pair recorded from
  • features (np.ndarray) – Feature matrix used to train the classifier, of shape [len(predicted_probabilities), len(pairs) * len(frequencies)].
  • coefficients (np.array) – Array of classifier weights
  • tag (str) – Name given to the classifier, used to differentiate between multiple classifiers
  • reloaded (bool) – Indicates whether the classifier was reloaded from disk, i.e. it is the actual classifier used. If False, the classifier was created from scratch.
predicted_probabilities

Classifier output for each word encoding event

pvalue

p-value of classifier AUC based on permuted AUCs
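
A common way to turn permuted AUCs into a p-value is to take the fraction of permuted values that reach or exceed the observed AUC. The sketch below assumes that definition; the source does not state the exact formula used by pvalue, and the function name and numbers are illustrative.

```python
def permutation_pvalue(observed_auc, permuted_aucs):
    """Fraction of permuted AUCs at least as large as the observed AUC.

    Assumed definition for illustration; not taken from ramutils.
    """
    n_at_least = sum(1 for a in permuted_aucs if a >= observed_auc)
    return n_at_least / len(permuted_aucs)


permuted = [0.48, 0.52, 0.55, 0.61, 0.50, 0.47, 0.53, 0.49, 0.66, 0.51]
p = permutation_pvalue(0.60, permuted)
print(p)
```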

regions

List of unique electrode regions

thresholds

Thresholds used for the ROC curve

true_outcomes

Behavioral response (recalled/not-recalled) to each word encoding event

true_positive_rate

True positive rate used for the ROC curve
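
The thresholds, false_positive_rate, and true_positive_rate attributes together describe a ROC curve, and auc is the area under it. The following pure-Python sketch shows how such quantities relate to true_outcomes and predicted_probabilities. It is a textbook construction and not the code used by ClassifierSummary; the function names are made up.

```python
def roc_curve(true_outcomes, probs):
    """Return (fpr, tpr, thresholds) for binary outcomes.

    fpr and tpr start with an extra (0, 0) point, so they are one
    element longer than thresholds.
    """
    thresholds = sorted(set(probs), reverse=True)
    n_pos = sum(1 for y in true_outcomes if y)
    n_neg = len(true_outcomes) - n_pos
    fpr, tpr = [0.0], [0.0]
    for t in thresholds:
        # An event is predicted "recalled" when its output is >= t.
        tp = sum(1 for y, p in zip(true_outcomes, probs) if p >= t and y)
        fp = sum(1 for y, p in zip(true_outcomes, probs) if p >= t and not y)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr, thresholds


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
        for i in range(1, len(fpr))
    )


fpr, tpr, thresholds = roc_curve([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
roc_auc = area_under_curve(fpr, tpr)
print(roc_auc)  # 0.75
```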

class ramutils.reports.summary.MathSummary(**kwargs)[source]

Summarizes data from math distractor periods. Input events must either be all events (which include math events) or just math events.

events

For Math events, returns original events after excluding practice lists

num_correct

Returns the number of problems solved correctly.

num_lists

Number of lists at least partially completed in the session

num_problems

Returns the total number of problems solved by the subject.

percent_correct

Returns the percentage of problems solved correctly.

populate(events)[source]

Populate the summary object with the given events

problems_per_list

Returns the mean number of problems per list.

session_number

Session number

to_dataframe(recreate=False)[source]

Convert the summary to a pd.DataFrame for easier manipulation. This amounts to converting the events to a dataframe

Keyword Arguments:
 recreate (bool) – Force re-creating the dataframe. Otherwise, it will only be created the first time this method is called and stored as an instance attribute.
Returns:
Return type:pd.DataFrame
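
The caching behavior controlled by recreate can be sketched as below. To keep the example free of third-party dependencies, a list of dicts stands in for the pd.DataFrame; the class and attribute names are hypothetical and do not come from ramutils.

```python
class CachedTable:
    """Illustrates build-once caching with an opt-in rebuild."""

    def __init__(self, events):
        self.events = events
        self._table = None  # cached result, created lazily
        self.builds = 0     # counts how often the table was built

    def to_table(self, recreate=False):
        # Build on first use, or whenever the caller forces it.
        if self._table is None or recreate:
            self._table = [dict(event) for event in self.events]
            self.builds += 1
        return self._table


summary = CachedTable([{"list": 1, "iscorrect": 1}, {"list": 1, "iscorrect": 0}])
first = summary.to_table()
second = summary.to_table()               # served from the cache
third = summary.to_table(recreate=True)   # rebuilt on request
print(summary.builds)  # 2
```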
static total_num_correct(summaries)[source]

Get the total number of correctly answered problems for multiple sessions.

Parameters:summaries (List[MathSummary]) –
Returns:
Return type:int
static total_num_problems(summaries)[source]

Get total number of problems for multiple sessions.

Parameters:summaries (List[MathSummary]) –
Returns:
Return type:int
static total_percent_correct(summaries)[source]

Get the percentage of correctly answered problems for multiple sessions.

Parameters:summaries (List[MathSummary]) –
Returns:
Return type:float
static total_problems_per_list(summaries)[source]

Get the mean number of problems per list for multiple sessions.

Parameters:summaries (List[MathSummary]) –
Returns:
Return type:float
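
The multi-session static methods above can be read as pooled aggregates over per-session counts. The sketch below makes that assumption explicit: percentages are on a 0 to 100 scale and are computed from pooled totals instead of averaging per-session values, neither of which is confirmed by the source. MathStats is a hypothetical stand-in for MathSummary.

```python
from collections import namedtuple

# Hypothetical stand-in exposing only the counts used below.
MathStats = namedtuple("MathStats", ["num_problems", "num_correct", "num_lists"])


def total_num_problems(summaries):
    return sum(s.num_problems for s in summaries)


def total_num_correct(summaries):
    return sum(s.num_correct for s in summaries)


def total_percent_correct(summaries):
    # Pooled across sessions (assumption).
    return 100.0 * total_num_correct(summaries) / total_num_problems(summaries)


def total_problems_per_list(summaries):
    # Pooled across sessions (assumption).
    n_lists = sum(s.num_lists for s in summaries)
    return total_num_problems(summaries) / n_lists


sessions = [MathStats(40, 30, 10), MathStats(60, 54, 12)]
print(total_percent_correct(sessions))  # 84.0
```

Pooling weights each problem equally, so a long session counts for more than a short one. Averaging per-session percentages would weight each session equally instead.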
class ramutils.reports.summary.Summary(**kwargs)[source]

Base class for all session summary objects

classmethod create(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]

Create a new summary object from events

Parameters:
  • events (np.recarray) –
  • raw_events (np.recarray) –
  • bipolar_pairs (dict) – Dictionary containing data on bipolar pairs in a montage
  • excluded_pairs (dict) – Dictionary containing data on pairs excluded from analysis
  • normalized_powers (np.ndarray) – 2D array of normalized powers of shape n_events x (n_frequencies * n_bipolar_pairs)
events

Numpy recarray of task events, i.e. the events used to train a classifier

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]

Abstract method to be overridden by child classes

raw_events

np.rec.array of all events (math and task) from the session

class ramutils.reports.summary.SessionSummary(**kwargs)[source]

Base class for single-session objects.

bipolar_pairs

Returns a dictionary of bipolar pairs

events

np.recarray of events

excluded_pairs

Returns a dictionary of bipolar pairs to be excluded in classifier training

experiment

Experiment name

n_pairs

Returns the number of bipolar pairs in the recording

normalized_powers

Powers normalized to 0 mean and unit variance
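
Normalizing to zero mean and unit variance is a z-score transform. The sketch below applies it to a single feature (one column of the powers matrix) using only the standard library. Whether ramutils uses the population or sample standard deviation, and along which axis it normalizes, is not stated here, so the choice of pstdev is an assumption.

```python
from statistics import mean, pstdev


def zscore(values):
    """Rescale values to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


powers = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
normalized = zscore(powers)
print(normalized)
```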

normalized_powers_plot

Plots the matrix of normalized powers for the session to the specified filename or file-like object, and returns the plot as a base64-encoded string

num_lists

Number of lists completed in the session

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]

Populate attributes and store events.

session_datetime

Returns a timezone-aware datetime object of the end time of the session in UTC.

session_length

Computes the total amount of time the session lasted in seconds.
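
The session_datetime and session_length properties can be illustrated as follows, assuming each event carries a timestamp in milliseconds since the Unix epoch. That assumption, the function names, and the sample values are all illustrative and not taken from ramutils.

```python
from datetime import datetime, timezone


def session_length_seconds(timestamps_ms):
    """Elapsed time between the first and last event, in seconds."""
    return (max(timestamps_ms) - min(timestamps_ms)) / 1000.0


def session_end_datetime(timestamps_ms):
    """Timezone-aware UTC datetime of the last event."""
    return datetime.fromtimestamp(max(timestamps_ms) / 1000.0, tz=timezone.utc)


timestamps_ms = [1500000000000, 1500000900000, 1500002700000]
length = session_length_seconds(timestamps_ms)
end = session_end_datetime(timestamps_ms)
print(length, end.isoformat())
```

Passing tz=timezone.utc yields an aware datetime, which avoids the ambiguity of naive local times when sessions are recorded at different sites.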

session_number

Session number

subject

Subject ID associated with the session

to_dataframe(recreate=False)[source]

Convert the summary to a pd.DataFrame for easier manipulation. This amounts to converting the events to a dataframe

Keyword Arguments:
 recreate (bool) – Force re-creating the dataframe. Otherwise, it will only be created the first time this method is called and stored as an instance attribute.
Returns:
Return type:pd.DataFrame
class ramutils.reports.summary.FRSessionSummary(**kwargs)[source]

Free recall session summary data.

intrusion_events

Recall events that were either extra-list or prior-list intrusions

num_correct

Number of correctly-recalled words

num_extra_list_intrusions

Calculates the number of extra-list intrusions

num_lists

Returns the total number of lists.

num_prior_list_intrusions

Calculates the number of prior list intrusions

num_words

Number of words in the session

percent_recalled

Calculates the percentage of correctly recalled words.

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]

Populate data from events.

Parameters:
  • events (np.recarray) –
  • raw_events (np.recarray) –
  • recall_probs (np.ndarray) – Predicted probabilities of recall per item. If not given, assumed there is no relevant classifier and values of -999 are used to indicate this.
static serialpos_probabilities(summaries, first=False)[source]

Computes the mean recall probability by word serial position.

Parameters:
  • summaries (List[Summary]) – Summaries of sessions.
  • first (bool) – When True, return probabilities that each serial position is the first recalled word. Otherwise, return the probability of recall for each word by serial position.
Returns:

Return type:

List[float]
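
The first=False case of serialpos_probabilities amounts to averaging a recalled flag within each serial position. A minimal sketch, assuming events are available as (serial position, recalled) pairs; the function name and input format are hypothetical.

```python
from collections import defaultdict


def recall_prob_by_serialpos(events):
    """Mean recall probability for each serial position, in order.

    events is an iterable of (serialpos, recalled) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # serialpos -> [recalled, presented]
    for serialpos, recalled in events:
        counts[serialpos][0] += int(recalled)
        counts[serialpos][1] += 1
    return [counts[pos][0] / counts[pos][1] for pos in sorted(counts)]


# Two lists of three words each.
events = [(1, 1), (2, 0), (3, 1), (1, 1), (2, 0), (3, 0)]
probabilities = recall_prob_by_serialpos(events)
print(probabilities)  # [1.0, 0.0, 0.5]
```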

class ramutils.reports.summary.CatFRSessionSummary(**kwargs)[source]

Extends standard FR session summaries for categorized free recall experiments.

irt_between_category

Between category item response time

irt_within_category

Within-category item response time

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None, repetition_ratio_dict={})[source]

Populates the CatFRSessionSummary object

raw_repetition_ratios

Dictionary where keys are subject identifiers for subjects completing at least one CatFR session and values are the repetition ratio for that subject by list

repetition_ratios

Dictionary where keys are subject identifiers for subjects completing at least one CatFR session and values are the repetition ratio for that subject averaged over the session

subject_ratio

Repetition ratio for the current subject

class ramutils.reports.summary.StimSessionSummary(**kwargs)[source]

SessionSummary data specific to sessions with stimulation.

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, post_stim_prob_recall=None, raw_events=None, model_metadata={}, post_stim_eeg=None, stim_tstats=None)[source]

Populate stim data from events

post_stim_prob_recall

Classifier output in the post-stim period

subject

Subject ID associated with the session

class ramutils.reports.summary.FRStimSessionSummary(**kwargs)[source]

SessionSummary for FR sessions with stim

static combine_sessions(summaries)[source]

Combine information from multiple stim sessions

static delta_recall(summaries, post_stim_items=False)[source]

Percent change in item recall for stimulated items versus non-stimulated low-biomarker items. Optionally returns the same comparison for post-stim items.

static lists(summaries, stim=None)[source]

Get a list of either stim lists or non-stim lists

static num_nonstim_lists(summaries)[source]

Returns the number of non-stim lists.

static num_stim_lists(summaries)[source]

Returns the number of stim lists.

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, post_stim_prob_recall=None, raw_events=None, model_metadata={}, post_stim_eeg=None, stim_tstats=None)[source]

Populate data from events.

Parameters:
  • events (np.recarray) –
  • raw_events (np.recarray) –
  • recall_probs (np.ndarray) – Predicted probabilities of recall per item. If not given, assumed there is no relevant classifier and values of -999 are used to indicate this.
static pre_stim_prob_recall(summaries, phase=None)[source]

Classifier output in the pre-stim period for items that were eventually stimulated

static prob_first_recall_by_serialpos(summaries, stim=False)[source]

Probability of recalling a word first by serial position. Optionally returns results for only stim items

static prob_recall_by_serialpos(summaries, stim_items_only=False)[source]

Probability of recall by serial position. Optionally returns results for only stim items

static prob_stim_by_serialpos(summaries)[source]

Array containing the probability of stimulation (mean of the classifier output) by serial position

static recall_test_results(summaries, experiment)[source]

Returns a dictionary containing the results of chi-squared tests for the behavioral effects of stimulation. Comparisons include stim lists vs. non-stim lists, stim items vs. low-biomarker non-stim items, and post-stim items vs. low-biomarker non-stim items. All comparisons are done for each unique set of stimulation parameters.

static recalls_by_list(summaries, stim_list_only=False)[source]

Number of recalls by list. Optionally returns results for only stim lists

stim_columns

Fields associated with stimulation parameters

static stim_events_by_list(summaries)[source]

Array containing the number of stim events by list

static stim_parameters(summaries)[source]

Returns a list of unique stimulation parameters used during the experiment

static stim_params_by_list(summaries)[source]

Returns a dataframe of stimulation parameters used within each session/list

class ramutils.reports.summary.PSSessionSummary(**kwargs)[source]

Parameter Search experiment summary

decision

Return a dictionary containing decision information from the Bayesian optimization algorithm

location_summary

Return a dictionary whose keys are the locations stimulated in the experiment and values are a dictionary containing additional metadata about the results from stimulating at that location

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]

Populate attributes and store events.