Serializable data structures
Defining data classes
Data can be defined in a serializable manner using the traitschema.Schema base class, which adds serialization methods to data classes that are defined using the traits package. To ensure serializability, use the Array type whenever possible.
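The idea of a parameter class that can serialize itself can be illustrated with the standard library alone. The sketch below is only an analogy: it uses dataclasses and json rather than traitschema.Schema or traits, and the class name TimingWindow, its fields, and its to_json/from_json methods are hypothetical and not part of ramutils or traitschema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TimingWindow:
    """Hypothetical parameter set (stand-in for a Schema subclass)."""
    start_time: float = 0.0
    end_time: float = 1.0
    # Lists serialize cleanly to JSON; array-like fields are kept as lists here.
    freqs: List[float] = field(default_factory=lambda: [1.0, 2.0])

    def to_json(self) -> str:
        # Dump every field to a JSON string.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "TimingWindow":
        # Rebuild an instance from a JSON string produced by to_json.
        return cls(**json.loads(text))


params = TimingWindow(start_time=0.0, end_time=1.5)
restored = TimingWindow.from_json(params.to_json())
```

The round trip returns an object equal to the original, which is the property that makes saved parameters reusable later.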
Experiment parameters
Experiment parameters (e.g., timing windows) are defined as traitschema.Schema subclasses so that the parameters used when training a classifier can be easily saved.
Common experimental/model parameters.
-
class
ramutils.parameters.
FilePaths
(**kwargs)[source]¶ Paths to files that frequently get passed around to many tasks.
All paths are given relative to the root path but are converted to absolute paths on creation.
Keyword Arguments: - root (str) – Rhino mount point.
- dest (str) – Directory to write files to.
- pairs (str) – Path to pairs.json.
- excluded_pairs (str) – Path to excluded_pairs.json.
- electrode_config_file (str) – Path to electrode config file.
- area_file (str) – Path to surface area file. When generating Odin configuration files and not defined, the default behavior is to look in the same directory as the jacksheet for a file named area.txt.
- data_db (str) – Path to directory where permanently-cached underlying data for reports should be stored. In general, this should only be specified when testing, otherwise the default location should be used.
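The conversion from root-relative to absolute paths described for FilePaths can be sketched as follows. This is a minimal illustration and not the FilePaths implementation: resolve_paths is a hypothetical helper, the example paths are made up, and posixpath is used so the result is the same on every platform.

```python
import posixpath


def resolve_paths(root, **relative_paths):
    """Join each relative path onto root (hypothetical helper, not FilePaths)."""
    return {
        name: posixpath.normpath(posixpath.join(root, path))
        for name, path in relative_paths.items()
    }


# Example values are illustrative only.
paths = resolve_paths(
    "/mnt/rhino",
    dest="scratch/output",
    pairs="montage/pairs.json",
)
```

Note that posixpath.join discards root when a given path is already absolute, so absolute inputs pass through unchanged in this sketch.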
-
class
ramutils.parameters.
ExperimentParameters
(**kwargs)[source]¶ Common parameters used in an experiment. Default values apply to the FR class of experiments.
-
class
ramutils.parameters.
FRParameters
(**kwargs)[source]¶ Free recall experiment parameters relevant for classification.
Underlying Data
All data necessary to rebuild a report is saved in a binary format as part of generating the report. All data is written to a single directory, and subjects, sessions, and data types are distinguished by a strict naming convention:
{subject}_{experiment}_{session}_{data_type}.{file_type}. Most saved objects are unique to a particular subject/experiment/session.
When they are not, {session} will be an underscore-separated list of the sessions used to generate the data.
To see how this is done, see ramutils.tasks.misc.save_all_output() and ramutils.tasks.misc.load_existing_results().
For example, if a target selection table was generated using sessions 1, 2, and 3 for subject R1XXX and experiment
XYZ, then the file would be saved as R1XXX_XYZ_1_2_3_target_selection_table.csv. Listed below are the types of data stored. Their
corresponding objects are also noted. The properties and methods defined for each of these objects can be found in the
documentation below.
- target_selection_table – A csv file containing metadata for each electrode
- classifier_summary – Metadata related to classifier performance (ClassifierSummary)
- math_summary – Math events and useful helper methods for assessing performance on the distractor task (MathSummary)
- session_summary – Events and helper methods for conducting behavioral analyses and generating plots. In many cases, there are summary objects specific to the type of session, i.e. stim vs. nonstim, FR vs. CatFR vs. PS, etc.
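The naming convention described above can be sketched as a small helper. The function build_filename is hypothetical and is not the code used by save_all_output(); it only shows how the pieces of the file name fit together, including the underscore-separated session list.

```python
def build_filename(subject, experiment, sessions, data_type, file_type):
    """Build {subject}_{experiment}_{session}_{data_type}.{file_type}.

    Hypothetical helper: `sessions` may be a single session or a list of them.
    """
    if isinstance(sessions, (str, int)):
        sessions = [sessions]
    # Multiple sessions are joined with underscores.
    session_part = "_".join(str(session) for session in sessions)
    return "{}_{}_{}_{}.{}".format(
        subject, experiment, session_part, data_type, file_type
    )


name = build_filename("R1XXX", "XYZ", [1, 2, 3], "target_selection_table", "csv")
# name == "R1XXX_XYZ_1_2_3_target_selection_table.csv"
```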
-
class
ramutils.reports.summary.
ClassifierSummary
(**kwargs)[source]¶ Classifier Summary Object
-
auc
¶ Classifier AUC
-
classifier_activation
¶ Forward model of classifier activation from Haufe et al. 2014
-
confidence_interval_median_classifier_output
¶ 95% confidence interval for the median of the classifier output. Used as a sanity check that nothing is amiss: the interval should be centered around 0.5.
-
false_positive_rate
¶ False positive rates of the ROC curve used to compute the AUC
-
high_tercile_diff_from_mean
¶ % change in recall rate from overall recall when classifier output was in highest tercile
-
low_tercile_diff_from_mean
¶ % change in recall rate from overall recall when classifier output was in lowest tercile
-
median_classifier_output
¶ Median of the classifier outputs
-
mid_tercile_diff_from_mean
¶ % change in recall rate from overall recall when classifier output was in middle tercile
-
permuted_auc_values
¶ Array of AUC values from performing permutation test
-
populate
(subject, experiment, session, true_outcomes, predicted_probabilities, permuted_auc_values, frequencies, pairs, features, coefficients, tag='', reloaded=False)[source]¶ Populate classifier performance metrics
Parameters: - subject (string) – Subject identifier
- experiment (string) – Name of the experiment
- session (string) – Session number
- true_outcomes (array_like) – Boolean array for if a word was recalled or not
- predicted_probabilities (array_like) – Outputs from the trained classifier for each word event
- permuted_auc_values (array_like) – AUC values from performing a permutation test on classifier
- frequencies (array_like) – Frequencies used to train the classifier
- pairs (pd.DataFrame) – Metadata for each bipolar pair recorded from
- features (np.ndarray) – Feature matrix used to train the classifier, of shape [len(predicted_probabilities), len(pairs) * len(frequencies)].
- coefficients (np.array) – Array of classifier weights
- tag (str) – Name given to the classifier, used to differentiate between multiple classifiers
- reloaded (bool) – Indicates whether the classifier was reloaded from disk, i.e. is the actual classifier used. If False, the classifier was created from scratch.
-
predicted_probabilities
¶ Classifier output for each word encoding event
-
pvalue
¶ p-value of classifier AUC based on permuted AUCs
-
regions
¶ List of unique electrode regions
-
thresholds
¶ Thresholds of the ROC curve used to compute the AUC
-
true_outcomes
¶ Behavioral response (recalled/not-recalled) to each word encoding event
-
true_positive_rate
¶ True positive rates of the ROC curve used to compute the AUC
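Two of the metrics listed for ClassifierSummary, the permutation-based p-value and the tercile differences from the mean, can be illustrated with a short sketch. These are assumptions about how such quantities are commonly computed, not the ramutils implementation: the function names are hypothetical, the p-value is taken as the fraction of permuted AUCs at least as large as the observed AUC, and terciles are formed by splitting the events, sorted by classifier output, into three equal groups.

```python
def permutation_pvalue(auc, permuted_aucs):
    """Fraction of permuted AUCs that are at least as large as the observed AUC."""
    n_as_large = sum(1 for value in permuted_aucs if value >= auc)
    return n_as_large / len(permuted_aucs)


def tercile_diff_from_mean(true_outcomes, predicted_probabilities):
    """Percent change in recall rate versus overall recall for the
    (low, middle, high) terciles of classifier output.

    Assumes at least three events and a non-zero overall recall rate.
    """
    # Order the outcomes by classifier output, lowest output first.
    pairs = sorted(zip(predicted_probabilities, true_outcomes), key=lambda p: p[0])
    ranked = [outcome for _, outcome in pairs]
    n = len(ranked)
    overall = sum(ranked) / n
    bounds = [0, n // 3, 2 * n // 3, n]
    changes = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        rate = sum(ranked[start:stop]) / (stop - start)
        changes.append(100.0 * (rate - overall) / overall)
    return tuple(changes)


# Made-up data for illustration.
pvalue = permutation_pvalue(0.7, [0.45, 0.5, 0.55, 0.72])
probabilities = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
outcomes = [0, 0, 1, 0, 1, 1, 1, 1, 1]
low, mid, high = tercile_diff_from_mean(outcomes, probabilities)
```

With this toy data, recall is lower than average in the lowest tercile and higher than average in the highest tercile, which is the pattern a useful classifier should produce.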
-
-
class
ramutils.reports.summary.
MathSummary
(**kwargs)[source]¶ Summarizes data from math distractor periods. Input events must either be all events (which include math events) or just math events.
-
events
¶ For Math events, returns original events after excluding practice lists
-
num_correct
¶ Returns the number of problems solved correctly.
-
num_lists
¶ Number of lists at least partially completed in the session
-
num_problems
¶ Returns the total number of problems solved by the subject.
-
percent_correct
¶ Returns the percentage of problems solved correctly.
-
problems_per_list
¶ Returns the mean number of problems per list.
-
session_number
¶ Session number
-
to_dataframe
(recreate=False)[source]¶ Convert the summary to a pd.DataFrame for easier manipulation. This amounts to converting the events to a dataframe.
Keyword Arguments: recreate (bool) – Force re-creating the dataframe. Otherwise, it will only be created the first time this method is called and stored as an instance attribute.
Returns: Return type: pd.DataFrame
-
static
total_num_correct
(summaries)[source]¶ Get the total number of correctly answered problems for multiple sessions.
Parameters: summaries (List[MathSummary]) – Returns: Return type: int
-
static
total_num_problems
(summaries)[source]¶ Get total number of problems for multiple sessions.
Parameters: summaries (List[MathSummary]) – Returns: Return type: int
-
static
total_percent_correct
(summaries)[source]¶ Get the percent correct problems for multiple sessions.
Parameters: summaries (List[MathSummary]) – Returns: Return type: float
-
static
total_problems_per_list
(summaries)[source]¶ Get the mean number of problems per list for multiple sessions.
Parameters: summaries (List[MathSummary]) – Returns: Return type: float
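The relationship between the per-session properties and the multi-session static methods of MathSummary can be sketched with a stand-in class. MathSessionCounts is hypothetical and only holds the counts; whether ramutils pools counts across sessions (as done here) or averages per-session values is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MathSessionCounts:
    """Hypothetical stand-in for the counts a MathSummary exposes."""
    num_problems: int
    num_correct: int
    num_lists: int

    @property
    def percent_correct(self) -> float:
        return 100.0 * self.num_correct / self.num_problems

    @property
    def problems_per_list(self) -> float:
        return self.num_problems / self.num_lists


def total_percent_correct(summaries: List[MathSessionCounts]) -> float:
    # Pool the counts so that sessions with more problems carry more weight.
    correct = sum(s.num_correct for s in summaries)
    problems = sum(s.num_problems for s in summaries)
    return 100.0 * correct / problems


def total_problems_per_list(summaries: List[MathSessionCounts]) -> float:
    problems = sum(s.num_problems for s in summaries)
    lists = sum(s.num_lists for s in summaries)
    return problems / lists


# Made-up counts for two sessions.
sessions = [MathSessionCounts(40, 30, 10), MathSessionCounts(60, 54, 15)]
overall_percent = total_percent_correct(sessions)
overall_per_list = total_problems_per_list(sessions)
```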
-
-
class
ramutils.reports.summary.
Summary
(**kwargs)[source]¶ Base class for all session summary objects
-
classmethod
create
(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶ Create a new summary object from events
Parameters: - events (np.recarray) –
- raw_events (np.recarray) –
- bipolar_pairs (dict) – Dictionary containing data on bipolar pairs in a montage
- excluded_pairs (dict) – Dictionary containing data on pairs excluded from analysis
- normalized_powers (np.ndarray) – 2D array of normalized powers of shape n_events x (n_frequencies * n_bipolar_pairs)
-
events
¶ Numpy recarray of task events, i.e. the events used to train a classifier
-
populate
(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶ Abstract method to be overridden by child classes
-
raw_events
¶ np.rec.array
of all events (math and task) from the session
-
-
class
ramutils.reports.summary.
SessionSummary
(**kwargs)[source]¶ Base class for single-session objects.
-
bipolar_pairs
¶ Returns a dictionary of bipolar pairs
-
events
¶ np.recarray
of events
-
excluded_pairs
¶ Returns a dictionary of bipolar pairs to be excluded in classifier training
-
experiment
¶ Experiment name
-
n_pairs
¶ Returns the number of bipolar pairs in the recording
-
normalized_powers
¶ Powers normalized to 0 mean and unit variance
-
normalized_powers_plot
¶ Plots the matrix of normalized powers for the session to the specified filename or file-like object, and returns the plot as a base64-encoded string
-
num_lists
¶ Number of lists completed in the session
-
populate
(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶ Populate attributes and store events.
-
session_datetime
¶ Returns a timezone-aware datetime object of the end time of the session in UTC.
-
session_length
¶ Computes the total amount of time the session lasted in seconds.
-
session_number
¶ Session number
-
subject
¶ Subject ID associated with the session
-
to_dataframe
(recreate=False)[source]¶ Convert the summary to a pd.DataFrame for easier manipulation. This amounts to converting the events to a dataframe.
Keyword Arguments: recreate (bool) – Force re-creating the dataframe. Otherwise, it will only be created the first time this method is called and stored as an instance attribute.
Returns: Return type: pd.DataFrame
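The normalized_powers attribute above is described as powers normalized to zero mean and unit variance. A minimal sketch of that normalization follows, assuming it is applied to each feature column across events and uses the population standard deviation; both choices are assumptions, and zscore_columns is a hypothetical helper rather than a ramutils function.

```python
from statistics import mean, pstdev


def zscore_columns(powers):
    """Normalize each column of an events x features table to mean 0, variance 1."""
    columns = list(zip(*powers))
    normalized_columns = []
    for column in columns:
        mu = mean(column)
        sigma = pstdev(column)  # population standard deviation (assumed)
        normalized_columns.append([(value - mu) / sigma for value in column])
    # Transpose back to events x features.
    return [list(row) for row in zip(*normalized_columns)]


# Made-up powers: three events, two features.
powers = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
normalized = zscore_columns(powers)
```

A constant column would give a standard deviation of zero and raise ZeroDivisionError in this sketch, so real data would need a guard for that case.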
-
-
class
ramutils.reports.summary.
FRSessionSummary
(**kwargs)[source]¶ Free recall session summary data.
-
intrusion_events
¶ Recall events that were either extra-list or prior-list intrusions
-
num_correct
¶ Number of correctly-recalled words
-
num_extra_list_intrusions
¶ Calculates the number of extra-list intrusions
-
num_lists
¶ Returns the total number of lists.
-
num_prior_list_intrusions
¶ Calculates the number of prior list intrusions
-
num_words
¶ Number of words in the session
-
percent_recalled
¶ Calculates the percentage of correctly recalled words.
-
populate
(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None)[source]¶ Populate data from events.
Parameters: - events (np.recarray) –
- raw_events (np.recarray) –
- recall_probs (np.ndarray) – Predicted probabilities of recall per item. If not given, assumed there is no relevant classifier and values of -999 are used to indicate this.
-
static
serialpos_probabilities
(summaries, first=False)[source]¶ Computes the mean recall probability by word serial position.
Parameters: - summaries (List[Summary]) – Summaries of sessions.
- first (bool) – When True, return probabilities that each serial position is the first recalled word. Otherwise, return the probability of recall for each word by serial position.
Returns: Return type: List[float]
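The two modes of serialpos_probabilities can be illustrated on simplified input. In this sketch each list is represented only by the serial positions that were recalled, in recall order; this input format and the function below are assumptions for illustration and differ from the real method, which takes summary objects.

```python
def serialpos_probabilities(recalls_by_list, list_length, first=False):
    """Probability of recall by serial position (1-based positions).

    recalls_by_list: one entry per studied list, giving the serial positions
    recalled from that list in the order they were recalled.
    When first is True, only the first recalled item of each list is counted.
    """
    counts = [0] * list_length
    for recalled in recalls_by_list:
        positions = recalled[:1] if first else set(recalled)
        for position in positions:
            counts[position - 1] += 1
    return [count / len(recalls_by_list) for count in counts]


# Made-up recall orders for four lists of three words each.
recalls = [[1, 3], [1], [3, 1, 2], []]
overall = serialpos_probabilities(recalls, list_length=3)
first_recall = serialpos_probabilities(recalls, list_length=3, first=True)
# overall == [0.75, 0.25, 0.5]
# first_recall == [0.5, 0.0, 0.25]
```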
-
-
class
ramutils.reports.summary.
CatFRSessionSummary
(**kwargs)[source]¶ Extends standard FR session summaries for categorized free recall experiments.
-
irt_between_category
¶ Between category item response time
-
irt_within_category
¶ Within-category item response time
-
populate
(events, bipolar_pairs, excluded_pairs, normalized_powers, raw_events=None, repetition_ratio_dict={})[source]¶ Populates the CatFRSessionSummary object
-
raw_repetition_ratios
¶ Dictionary where keys are subject identifiers for subjects completing at least one CatFR session and values are the repetition ratio for that subject by list
-
repetition_ratios
¶ Dictionary where keys are subject identifiers for subjects completing at least one CatFR session and values are the repetition ratio for that subject averaged over the session
-
subject_ratio
¶ Repetition ratio for the current subject
-
-
class
ramutils.reports.summary.
StimSessionSummary
(**kwargs)[source]¶ SessionSummary data specific to sessions with stimulation.
-
populate
(events, bipolar_pairs, excluded_pairs, normalized_powers, post_stim_prob_recall=None, raw_events=None, model_metadata={}, post_stim_eeg=None, stim_tstats=None)[source]¶ Populate stim data from events
-
post_stim_prob_recall
¶ Classifier output in the post-stim period
-
subject
¶ Subject ID associated with the session
-
-
class
ramutils.reports.summary.
FRStimSessionSummary
(**kwargs)[source]¶ SessionSummary for FR sessions with stim
-
static
delta_recall
(summaries, post_stim_items=False)[source]¶ Percent change in item recall for stimulated items versus non-stimulated low-biomarker items. Optionally return the same comparison, but for post-stim items.
-
populate
(events, bipolar_pairs, excluded_pairs, normalized_powers, post_stim_prob_recall=None, raw_events=None, model_metadata={}, post_stim_eeg=None, stim_tstats=None)[source]¶ Populate data from events.
Parameters: - events (np.recarray) –
- raw_events (np.recarray) –
- recall_probs (np.ndarray) – Predicted probabilities of recall per item. If not given, assumed there is no relevant classifier and values of -999 are used to indicate this.
-
static
pre_stim_prob_recall
(summaries, phase=None)[source]¶ Classifier output in the pre-stim period for items that were eventually stimulated
-
static
prob_first_recall_by_serialpos
(summaries, stim=False)[source]¶ Probability of recalling a word first by serial position. Optionally returns results for only stim items
-
static
prob_recall_by_serialpos
(summaries, stim_items_only=False)[source]¶ Probability of recall by serial position. Optionally returns results for only stim items
-
static
prob_stim_by_serialpos
(summaries)[source]¶ Array containing the probability of stimulation (mean of the classifier output) by serial position
-
static
recall_test_results
(summaries, experiment)[source]¶ Returns a dictionary containing the results of chi-squared tests for the behavioral effects of stimulation. Comparisons include stim lists vs. non-stim lists, stim items vs. low-biomarker non-stim items, and post-stim items vs. low-biomarker non-stim items. All comparisons are done for each unique set of stimulation parameters.
-
static
recalls_by_list
(summaries, stim_list_only=False)[source]¶ Number of recalls by list. Optionally returns results for only stim lists
-
stim_columns
¶ Fields associated with stimulation parameters
-
-
class
ramutils.reports.summary.
PSSessionSummary
(**kwargs)[source]¶ Parameter Search experiment summary
-
decision
¶ Return a dictionary containing decision information from the Bayesian optimization algorithm
-
location_summary
¶ Return a dictionary whose keys are the locations stimulated in the experiment and values are a dictionary containing additional metadata about the results from stimulating at that location
-