Serializable data structures¶

Defining data classes¶

Data classes are made serializable by subclassing traitschema.Schema, which adds serialization methods to classes defined with the traits package. To ensure serializability, use the Array type whenever possible.

Experiment parameters¶

Experiment parameters (e.g., timing windows) are defined as traitschema.Schema subclasses so that the parameters used when training a classifier can be easily saved.
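
The idea of saving the parameters used for training can be sketched with the standard library alone. The example below deliberately does not use traitschema or traits; it swaps in dataclasses and json as an analogue. The class TimingParameters, its field names, and its default values are hypothetical and are not taken from ramutils.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TimingParameters:
    """Hypothetical stand-in for an experiment parameter class."""
    start_time: float = 0.0   # window start in seconds (made-up default)
    end_time: float = 1.5     # window end in seconds (made-up default)
    buffer_time: float = 1.0  # padding around the window (made-up default)


def dump_parameters(params: TimingParameters) -> str:
    """Serialize the parameters to a JSON string."""
    return json.dumps(asdict(params), sort_keys=True)


def load_parameters(text: str) -> TimingParameters:
    """Rebuild a parameter object from a JSON string."""
    return TimingParameters(**json.loads(text))


# Round trip: the restored object carries the same values as the original.
params = TimingParameters(end_time=1.366)
restored = load_parameters(dump_parameters(params))
```

Storing the serialized parameters next to a trained classifier makes it possible to check later which timing windows were used.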

Common experimental/model parameters.

class ramutils.parameters.FilePaths(**kwargs)[source]¶

Paths to files that frequently get passed around to many tasks.

All paths are given relative to the root path but are converted to absolute paths on creation.

Keyword Arguments:

root (str) – Rhino mount point.
dest (str) – Directory to write files to.
pairs (str) – Path to pairs.json.
excluded_pairs (str) – Path to excluded_pairs.json.
electrode_config_file (str) – Path to electrode config file.
area_file (str) – Path to surface area file. If this is not defined when generating Odin configuration files, the default behavior is to look in the same directory as the jacksheet for a file named area.txt.
data_db (str) – Path to directory where permanently-cached underlying data for reports should be stored. In general, this should only be specified when testing; otherwise the default location should be used.
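
The "relative on input, absolute on creation" behavior described above can be illustrated with a small stand-in. This is a minimal sketch, not the FilePaths implementation: the class RelativePaths is hypothetical, it covers only two of the keyword arguments, and the example paths are made up.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class RelativePaths:
    """Hypothetical stand-in: joins each given path onto the root."""
    root: str
    dest: Optional[str] = None
    pairs: Optional[str] = None

    def __post_init__(self):
        # Convert every path that was supplied into a path under root.
        # Paths left as None are not touched.
        for name in ("dest", "pairs"):
            value = getattr(self, name)
            if value is not None:
                setattr(self, name, os.path.join(self.root, value))


paths = RelativePaths(root="/mnt/rhino", dest="scratch/output", pairs="pairs.json")
```

After construction, paths.pairs and paths.dest point below the root, so tasks receiving the object do not need to know the mount point.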

class ramutils.parameters.ExperimentParameters(**kwargs)[source]¶: Common parameters used in an experiment. Default values apply to the FR class of experiments.

class ramutils.parameters.FRParameters(**kwargs)[source]¶: Free recall experiment parameters relevant for classification.

class ramutils.parameters.PALParameters(**kwargs)[source]¶: Paired associates experiment parameters relevant for classification. It inherits all of the same parameters as FR experiments and adds a few more.

class ramutils.parameters.PS5Parameters(**kwargs)[source]¶: PS5 experiment parameters

class ramutils.parameters.StimParameters(**kwargs)[source]¶: Single-channel stimulation parameters.

Underlying Data¶

All data necessary to rebuild a report is saved in a binary format as part of generating the report. All data is written to a single directory, and subjects, sessions, and data types are distinguished by a strict naming convention: {subject}_{experiment}_{session}_{data_type}.{file_type}.

Most saved objects are unique to a particular subject/experiment/session. When this is not the case, {session} will be an underscore-separated list of the sessions used to generate the data. For example, if a target selection table was generated using sessions 1, 2, and 3 for subject R1XXX and experiment XYZ, the file would be saved as R1XXX_XYZ_1_2_3_target_selection_table.csv. To see how this is done, see ramutils.tasks.misc.save_all_output() and ramutils.tasks.misc.load_existing_results().

Listed below are the types of data stored, along with their corresponding objects. The properties and methods defined for each of these objects can be found in the documentation below.

target_selection_table – A csv file containing metadata for each electrode
classifier_summary – Metadata related to classifier performance (ClassifierSummary)
math_summary – Math events and useful helper methods for assessing performance on the distractor task (MathSummary)
session_summary – Events and helper methods for conducting behavioral analyses and generating plots. In many cases, there are summary objects specific to the type of session, i.e. stim vs. nonstim, FR vs. CatFR vs. PS, etc.
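
The naming convention described above can be sketched as a small helper. The function build_filename is hypothetical and is not part of ramutils; it only illustrates how the pieces of the file name fit together, including the underscore-separated session list.

```python
def build_filename(subject, experiment, sessions, data_type, file_type):
    """Build {subject}_{experiment}_{session}_{data_type}.{file_type}.

    `sessions` may be a single session or an iterable of sessions; multiple
    sessions are joined with underscores.
    """
    if isinstance(sessions, (int, str)):
        sessions = [sessions]
    session_part = "_".join(str(s) for s in sessions)
    return f"{subject}_{experiment}_{session_part}_{data_type}.{file_type}"


multi = build_filename("R1XXX", "XYZ", [1, 2, 3], "target_selection_table", "csv")
single = build_filename("R1XXX", "XYZ", 1, "classifier_summary", "h5")
```

The file type "h5" in the second call is only a placeholder for illustration.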

class ramutils.reports.summary.ClassifierSummary(**kwargs)[source]¶

Classifier Summary Object

auc¶: Classifier AUC

classifier_activation¶: Forward model of classifier activation from Haufe et al. (2014)

confidence_interval_median_classifier_output¶: 95% confidence interval for the median of the classifier output. Used as a sanity check that nothing is amiss; it should be centered around 0.5

false_positive_rate¶: False positive rate used for AUC curve

high_tercile_diff_from_mean¶: % change in recall rate from overall recall when classifier output was in highest tercile

low_tercile_diff_from_mean¶: % change in recall rate from overall recall when classifier output was in lowest tercile

median_classifier_output¶: Median of the classifier outputs

mid_tercile_diff_from_mean¶: % change in recall rate from overall recall when classifier output was in middle tercile

permuted_auc_values¶: Array of AUC values from performing permutation test

populate(subject, experiment, session, true_outcomes, predicted_probabilities, permuted_auc_values, frequencies, pairs, features, coefficients, tag='', reloaded=False)[source]¶

Populate classifier performance metrics

Parameters:

subject (string) – Subject identifier
experiment (string) – Name of the experiment
session (string) – Session number
true_outcomes (array_like) – Boolean array indicating whether each word was recalled
predicted_probabilities (array_like) – Outputs from the trained classifier for each word event
permuted_auc_values (array_like) – AUC values from performing a permutation test on classifier
frequencies (array_like) – Frequencies used to train the classifier
pairs (pd.DataFrame) – Metadata for each bipolar pair recorded from
features (np.ndarray) – Feature matrix used to train the classifier, of shape (len(predicted_probabilities), len(pairs) * len(frequencies)).
coefficients (np.array) – Array of classifier weights
tag (str) – Name given to the classifier, used to differentiate between multiple classifiers
reloaded (bool) – Indicates whether the classifier was reloaded from disk, i.e., is the actual classifier used. If False, the classifier was created from scratch.

predicted_probabilities¶: Classifier output for each word encoding event

pvalue¶: p-value of classifier AUC based on permuted AUCs

regions¶: List of unique electrode regions

thresholds¶: Thresholds used for AUC curve

true_outcomes¶: Behavioral response (recalled/not-recalled) to each word encoding event

true_positive_rate¶: True positive rate used for AUC curve
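
As an illustration of how a p-value can be derived from permuted AUCs, the sketch below uses one common convention: the fraction of permuted AUC values that are at least as large as the observed AUC. This is an assumption about the general technique; the exact computation behind the pvalue attribute is not stated here, and the function permutation_pvalue is hypothetical.

```python
def permutation_pvalue(observed_auc, permuted_aucs):
    """Fraction of permuted AUCs that are >= the observed AUC."""
    permuted = list(permuted_aucs)
    if not permuted:
        raise ValueError("at least one permuted AUC value is required")
    n_at_least = sum(1 for auc in permuted if auc >= observed_auc)
    return n_at_least / len(permuted)


# Made-up numbers: one of four permuted AUCs exceeds the observed value.
example_pvalue = permutation_pvalue(0.70, [0.45, 0.50, 0.55, 0.72])
```

A small p-value indicates that the observed AUC is rarely matched when the outcome labels are shuffled.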

class ramutils.reports.summary.MathSummary(**kwargs)[source]¶

Summarizes data from math distractor periods. Input events must either be all events (which include math events) or just math events.

events¶: For Math events, returns original events after excluding practice lists

num_correct¶: Returns the number of problems solved correctly.

num_lists¶: Number of lists at least partially completed in the session

num_problems¶: Returns the total number of problems solved by the subject.

percent_correct¶: Returns the percentage of problems solved correctly.

populate(events)[source]¶: Populate the summary object with the given events

problems_per_list¶: Returns the mean number of problems per list.

session_number¶: Session number

to_dataframe(recreate=False)[source]¶

Convert the summary to a pd.DataFrame for easier manipulation. This amounts to converting the events to a dataframe

Keyword Arguments:
	recreate (bool) – Force re-creating the dataframe. Otherwise, it will only be created the first time this method is called and stored as an instance attribute.
Returns:
Return type:	pd.DataFrame
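
The caching behavior controlled by recreate can be sketched as follows. This is a minimal illustration rather than the MathSummary code: the class CachedTable is hypothetical, and a list of dictionaries stands in for the pd.DataFrame so that the example has no third-party dependencies.

```python
class CachedTable:
    """Hypothetical stand-in showing build-once-then-reuse conversion."""

    def __init__(self, events):
        self.events = events
        self._table = None   # cached result of the conversion
        self.n_builds = 0    # counts how often the conversion actually ran

    def to_table(self, recreate=False):
        # Build on first use, or again when the caller forces it.
        if self._table is None or recreate:
            self._table = [dict(event) for event in self.events]
            self.n_builds += 1
        return self._table


summary = CachedTable([{"list": 1, "correct": True}, {"list": 1, "correct": False}])
first = summary.to_table()
second = summary.to_table()               # served from the cache
third = summary.to_table(recreate=True)   # rebuilt on request
```

Forcing a rebuild is useful when the underlying events have changed since the first conversion.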

static total_num_correct(summaries)[source]¶

Get the total number of correctly answered problems for multiple sessions.

Parameters:	summaries (List[MathSummary]) –
Returns:
Return type:	int

static total_num_problems(summaries)[source]¶

Get total number of problems for multiple sessions.

Parameters:	summaries (List[MathSummary]) –
Returns:
Return type:	int

static total_percent_correct(summaries)[source]¶

Get the percent correct problems for multiple sessions.

Parameters:	summaries (List[MathSummary]) –
Returns:
Return type:	float

static total_problems_per_list(summaries)[source]¶

Get the mean number of problems per list for multiple sessions.

Parameters:	summaries (List[MathSummary]) –
Returns:
Return type:	float
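
The multi-session helpers above can be illustrated with a small sketch. It assumes that the totals are pooled across sessions (sum of correct answers divided by sum of problems) rather than averaged per session; that pooling choice is an assumption, and MathSession is a hypothetical stand-in for MathSummary.

```python
from dataclasses import dataclass


@dataclass
class MathSession:
    """Hypothetical per-session counts."""
    num_problems: int
    num_correct: int
    num_lists: int


def total_percent_correct(sessions):
    """Pooled percentage of correctly solved problems."""
    total = sum(s.num_problems for s in sessions)
    correct = sum(s.num_correct for s in sessions)
    return 100.0 * correct / total if total else 0.0


def total_problems_per_list(sessions):
    """Pooled mean number of problems per list."""
    problems = sum(s.num_problems for s in sessions)
    lists = sum(s.num_lists for s in sessions)
    return problems / lists if lists else 0.0


sessions = [MathSession(40, 30, 10), MathSession(60, 54, 15)]
pooled_percent = total_percent_correct(sessions)
pooled_per_list = total_problems_per_list(sessions)
```

Pooling weights each problem equally, so longer sessions contribute more than they would under a per-session average.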

class ramutils.reports.summary.Summary(**kwargs)[source]¶

Base class for all session summary objects

classmethod create(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶

Create a new summary object from events

Parameters:

events (np.recarray) –
raw_events (np.recarray) –
bipolar_pairs (dict) – Dictionary containing data on bipolar pairs in a montage
excluded_pairs (dict) – Dictionary containing data on pairs excluded from analysis
normalized_powers (np.ndarray) – 2D array of normalized powers of shape n_events x (n_frequencies * n_bipolar_pairs)

events¶: Numpy recarray of task events, i.e. the events used to train a classifier

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶: Abstract method to be overridden by child classes

raw_events¶: np.rec.array of all events (math and task) from the session

class ramutils.reports.summary.SessionSummary(**kwargs)[source]¶

Base class for single-session objects.

bipolar_pairs¶: Returns a dictionary of bipolar pairs

events¶: np.recarray of events

excluded_pairs¶: Returns a dictionary of bipolar pairs to be excluded in classifier training

experiment¶: Experiment name

n_pairs¶: Returns the number of bipolar pairs in the recording

normalized_powers¶: Powers normalized to 0 mean and unit variance

normalized_powers_plot¶: Plots the matrix of normalized powers for the session to the specified filename or file-like object, and returns the plot as a base64-encoded string

num_lists¶: Number of lists completed in the session

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶: Populate attributes and store events.

session_datetime¶: Returns a timezone-aware datetime object of the end time of the session in UTC.

session_length¶: Computes the total amount of time the session lasted in seconds.

session_number¶: Session number

subject¶: Subject ID associated with the session

to_dataframe(recreate=False)[source]¶

Convert the summary to a pd.DataFrame for easier manipulation. This amounts to converting the events to a dataframe

Keyword Arguments:
	recreate (bool) – Force re-creating the dataframe. Otherwise, it will only be created the first time this method is called and stored as an instance attribute.
Returns:
Return type:	pd.DataFrame

class ramutils.reports.summary.FRSessionSummary(**kwargs)[source]¶

Free recall session summary data.

intrusion_events¶: Recall events that were either extra-list or prior-list intrusions

num_correct¶: Number of correctly-recalled words

num_extra_list_intrusions¶: Calculates the number of extra-list intrusions

num_lists¶: Returns the total number of lists.

num_prior_list_intrusions¶: Calculates the number of prior list intrusions

num_words¶: Number of words in the session

percent_recalled¶: Calculates the percentage correctly recalled words.

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶

Populate data from events.

Parameters:

events (np.recarray) –
raw_events (np.recarray) –
recall_probs (np.ndarray) – Predicted probabilities of recall per item. If not given, it is assumed that there is no relevant classifier, and values of -999 are used to indicate this.

static serialpos_probabilities(summaries, first=False)[source]¶

Computes the mean recall probability by word serial position.

Parameters:

summaries (List[Summary]) – Summaries of sessions.
first (bool) – When True, return probabilities that each serial position is the first recalled word. Otherwise, return the probability of recall for each word by serial position.
Returns:
Return type:	List[float]
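
The default case (first=False) amounts to averaging a recalled/not-recalled flag within each serial position. The sketch below shows that calculation on plain tuples. The function recall_by_serialpos and its input layout are hypothetical, and the real method operates on summary objects rather than on (serial position, recalled) pairs.

```python
from collections import defaultdict


def recall_by_serialpos(words):
    """Mean recall probability per serial position.

    `words` is an iterable of (serial_position, recalled) pairs pooled
    over lists and sessions. Returns one probability per position, in
    ascending order of serial position.
    """
    counts = defaultdict(lambda: [0, 0])  # position -> [n_recalled, n_words]
    for position, recalled in words:
        counts[position][0] += int(recalled)
        counts[position][1] += 1
    return [counts[p][0] / counts[p][1] for p in sorted(counts)]


# Two made-up lists of three words each.
words = [(1, True), (2, False), (3, True), (1, True), (2, True), (3, False)]
probabilities = recall_by_serialpos(words)
```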

class ramutils.reports.summary.CatFRSessionSummary(**kwargs)[source]¶

Extends standard FR session summaries for categorized free recall experiments.

irt_between_category¶: Between category item response time

irt_within_category¶: Within-category item response time

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None, repetition_ratio_dict={})[source]¶: Populates the CatFRSessionSummary object

raw_repetition_ratios¶: Dictionary where keys are subject identifiers for subjects completing at least one CatFR session and values are the repetition ratio for that subject by list

repetition_ratios¶: Dictionary where keys are subject identifiers for subjects completing at least one CatFR session and values are the repetition ratio for that subject averaged over the session

subject_ratio¶: Repetition ratio for the current subject

class ramutils.reports.summary.StimSessionSummary(**kwargs)[source]¶

SessionSummary data specific to sessions with stimulation.

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, post_stim_prob_recall=None, raw_events=None, model_metadata={}, post_stim_eeg=None, stim_tstats=None)[source]¶: Populate stim data from events

post_stim_prob_recall¶: Classifier output in the post-stim period

subject¶: Subject ID associated with the session

class ramutils.reports.summary.FRStimSessionSummary(**kwargs)[source]¶

SessionSummary for FR sessions with stim

static combine_sessions(summaries)[source]¶: Combine information from multiple stim sessions

static delta_recall(summaries, post_stim_items=False)[source]¶: % change in item recall for stimulated items versus non-stimulated low-biomarker items. Optionally return the same comparison, but for post-stim items

static lists(summaries, stim=None)[source]¶: Get a list of either stim lists or non-stim lists

static num_nonstim_lists(summaries)[source]¶: Returns the number of non-stim lists.

static num_stim_lists(summaries)[source]¶: Returns the number of stim lists.

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, post_stim_prob_recall=None, raw_events=None, model_metadata={}, post_stim_eeg=None, stim_tstats=None)[source]¶

Populate data from events.

Parameters:

events (np.recarray) –
raw_events (np.recarray) –
recall_probs (np.ndarray) – Predicted probabilities of recall per item. If not given, it is assumed that there is no relevant classifier, and values of -999 are used to indicate this.

static pre_stim_prob_recall(summaries, phase=None)[source]¶: Classifier output in the pre-stim period for items that were eventually stimulated

static prob_first_recall_by_serialpos(summaries, stim=False)[source]¶: Probability of recalling a word first by serial position. Optionally returns results for only stim items

static prob_recall_by_serialpos(summaries, stim_items_only=False)[source]¶: Probability of recall by serial position. Optionally returns results for only stim items

static prob_stim_by_serialpos(summaries)[source]¶: Array containing the probability of stimulation (mean of the classifier output) by serial position

static recall_test_results(summaries, experiment)[source]¶: Returns a dictionary containing the results of chi-squared tests for the behavioral effects of stimulation. Comparisons include stim lists vs. non-stim lists, stim items vs. low-biomarker non-stim items, and post-stim items vs. low-biomarker non-stim items. All comparisons are done for each unique set of stimulation parameters
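
For readers unfamiliar with the test, the sketch below computes a Pearson chi-squared statistic and p-value for a single 2x2 comparison, such as recalled vs. not recalled for stim vs. non-stim items. It is a generic illustration: it applies no continuity correction, the function chi_squared_2x2 is hypothetical, the counts are made up, and whether recall_test_results uses this exact variant is not stated here.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have a nonzero total")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For one degree of freedom the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Rows: stim items, non-stim items. Columns: recalled, not recalled.
statistic, p_value = chi_squared_2x2(30, 30, 20, 40)
```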

static recalls_by_list(summaries, stim_list_only=False)[source]¶: Number of recalls by list. Optionally returns results for only stim lists

stim_columns¶: Fields associated with stimulation parameters

static stim_events_by_list(summaries)[source]¶: Array containing the number of stim events by list

static stim_parameters(summaries)[source]¶: Returns a list of unique stimulation parameters used during the experiment

static stim_params_by_list(summaries)[source]¶: Returns a dataframe of stimulation parameters used within each session/list

class ramutils.reports.summary.PSSessionSummary(**kwargs)[source]¶

Parameter Search experiment summary

decision¶: Return a dictionary containing decision information from the Bayesian optimization algorithm

location_summary¶: Return a dictionary whose keys are the locations stimulated in the experiment and values are a dictionary containing additional metadata about the results from stimulating at that location

populate(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶: Populate attributes and store events.