{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting Started with CML Readers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import pandas as pd\n",
    "import cmlreaders as cml"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Finding Files on RHINO\n",
    "\n",
    "The PathFinder helper class locates files on RHINO. Its sole responsibility is to find a file and return its path. In many cases, a file could be stored in more than one location. In these situations, PathFinder searches the list of possible locations in order and returns the first path where the file is found. This implicitly assumes that the list is ordered by priority, so that the preferred location comes before any fall-back location."
   ]
  },
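  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The search order described above can be illustrated with a minimal sketch. This is not PathFinder's actual implementation: the function `find_first_existing` and the candidate file names are hypothetical, and a temporary directory stands in for RHINO. The sketch only shows the idea of checking an ordered list of candidate locations and returning the first one that exists."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tempfile\n",
    "from pathlib import Path\n",
    "\n",
    "\n",
    "def find_first_existing(rootdir, candidates):\n",
    "    # Hypothetical helper for illustration only; not part of cmlreaders.\n",
    "    # Candidates are checked in order, so earlier entries take priority.\n",
    "    for relative_path in candidates:\n",
    "        full_path = Path(rootdir) / relative_path\n",
    "        if full_path.exists():\n",
    "            return str(full_path)\n",
    "    raise FileNotFoundError('none of the candidate locations exist')\n",
    "\n",
    "\n",
    "# Build a throwaway directory tree in which only the fall-back file exists\n",
    "with tempfile.TemporaryDirectory() as tmp_root:\n",
    "    fallback = Path(tmp_root) / 'fallback' / 'pairs.json'\n",
    "    fallback.parent.mkdir(parents=True)\n",
    "    fallback.write_text('{}')\n",
    "\n",
    "    # Candidates are ordered from preferred to fall-back\n",
    "    candidates = ['preferred/pairs.json', 'fallback/pairs.json']\n",
    "    found = find_first_existing(tmp_root, candidates)\n",
    "    found_relative = Path(found).relative_to(tmp_root).as_posix()\n",
    "    print(found_relative)"
   ]
  },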
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If not working on RHINO, specify the mount point.\n",
    "# Alternatively, set the CML_ROOT environment variable and never\n",
    "# have to explicitly pass the rootdir keyword argument.\n",
    "rhino_root = \"/mnt/rhino/\"\n",
    "\n",
    "# Instantiate the finder object\n",
    "finder = cml.PathFinder(subject=\"R1389J\", experiment=\"catFR5\", session=1, \n",
    "                        localization=0, montage=0, rootdir=rhino_root)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What can you request?\n",
    "\n",
    "The PathFinder has a few built-in properties that show which data types are currently supported. Different file types require the finder to be instantiated with different fields. For example, if you only plan to request localization files, there is no need to specify an experiment, session, or montage. Specifying extra fields is not a problem: any field that a data type does not require is simply ignored. The following properties are defined:\n",
    "\n",
    "- requestable_files: All supported data types\n",
    "- localization_files: Files related to localization\n",
    "- montage_files: Files associated with a specific montage\n",
    "- session_files: Files that are specific to a session. These files could be processed events, Ramulator files, etc.\n",
    "\n",
    "For high-level information about each of these data types, see the [Data Guide](https://pennmem.github.io/cmlreaders/html/data_guide.html) section of the documentation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['r1_index',\n",
       " 'ltp_index',\n",
       " 'pyfr_index',\n",
       " 'pyfr_root',\n",
       " 'localization',\n",
       " 'voxel_coordinates',\n",
       " 'prior_stim_results',\n",
       " 'electrode_coordinates',\n",
       " 'jacksheet',\n",
       " 'area',\n",
       " 'electrode_categories',\n",
       " 'good_leads',\n",
       " 'leads',\n",
       " 'classifier_excluded_leads',\n",
       " 'matlab_bipolar_talstruct',\n",
       " 'matlab_monopolar_talstruct',\n",
       " 'pairs',\n",
       " 'contacts',\n",
       " 'session_summary',\n",
       " 'classifier_summary',\n",
       " 'math_summary',\n",
       " 'target_selection_table',\n",
       " 'baseline_classifier',\n",
       " 'all_events',\n",
       " 'task_events',\n",
       " 'math_events',\n",
       " 'ps4_events',\n",
       " 'sources',\n",
       " 'processed_eeg',\n",
       " 'experiment_log',\n",
       " 'session_log',\n",
       " 'ramulator_session_folder',\n",
       " 'event_log',\n",
       " 'experiment_config',\n",
       " 'raw_eeg',\n",
       " 'odin_config',\n",
       " 'used_classifier',\n",
       " 'excluded_pairs',\n",
       " 'all_pairs']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "finder.requestable_files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('localization',)"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "finder.localization_files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('pairs',\n",
       " 'contacts',\n",
       " 'voxel_coordinates',\n",
       " 'prior_stim_results',\n",
       " 'electrode_coordinates',\n",
       " 'jacksheet',\n",
       " 'good_leads',\n",
       " 'leads',\n",
       " 'area',\n",
       " 'classifier_excluded_leads',\n",
       " 'electrode_categories',\n",
       " 'target_selection_file',\n",
       " 'baseline_classifier')"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "finder.montage_files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('session_summary',\n",
       " 'classifier_summary',\n",
       " 'math_summary',\n",
       " 'used_classifier',\n",
       " 'excluded_pairs',\n",
       " 'all_pairs',\n",
       " 'experiment_log',\n",
       " 'session_log',\n",
       " 'event_log',\n",
       " 'experiment_config',\n",
       " 'raw_eeg',\n",
       " 'odin_config',\n",
       " 'all_events',\n",
       " 'task_events',\n",
       " 'math_events',\n",
       " 'ps4_events')"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "finder.session_files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Finding File Paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/mnt/rhino/protocols/r1/subjects/R1389J/localizations/0/montages/0/neuroradiology/current_processed/pairs.json\n",
      "/mnt/rhino/protocols/r1/subjects/R1389J/experiments/catFR5/sessions/1/behavioral/current_processed/task_events.json\n",
      "/mnt/rhino/data10/RAM/subjects/R1389J/tal/VOX_coords_mother.txt\n"
     ]
    }
   ],
   "source": [
    "# Find some example files\n",
    "example_data_types = ['pairs', 'task_events', 'voxel_coordinates']\n",
    "for data_type in example_data_types:\n",
    "    print(finder.find(data_type=data_type))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Identifying Available Sessions\n",
    "\n",
    "CMLReaders contains a utility function for loading the JSON-formatted index files located in the protocols/ directory on RHINO as a pandas DataFrame. Once loaded, the standard pandas selection idioms can be used to answer questions such as:\n",
    "\n",
    "1. What subjects completed FR1?\n",
    "2. What experiments did subject R1111M complete?\n",
    "3. How many sessions of PAL1 have been collected?\n",
    "\n",
    "For many analyses, this will be the first step in determining the sample of subjects to be used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "from cmlreaders import get_data_index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Recognition</th>\n",
       "      <th>all_events</th>\n",
       "      <th>contacts</th>\n",
       "      <th>experiment</th>\n",
       "      <th>import_type</th>\n",
       "      <th>localization</th>\n",
       "      <th>math_events</th>\n",
       "      <th>montage</th>\n",
       "      <th>original_experiment</th>\n",
       "      <th>original_session</th>\n",
       "      <th>pairs</th>\n",
       "      <th>ps4_events</th>\n",
       "      <th>session</th>\n",
       "      <th>subject</th>\n",
       "      <th>subject_alias</th>\n",
       "      <th>system_version</th>\n",
       "      <th>task_events</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR1/s...</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>FR1</td>\n",
       "      <td>build</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR1/s...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR1/s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR1/s...</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>FR1</td>\n",
       "      <td>build</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR1/s...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR1/s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR2/s...</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>FR2</td>\n",
       "      <td>build</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR2/s...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR2/s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR2/s...</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>FR2</td>\n",
       "      <td>build</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR2/s...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/FR2/s...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/PAL1/...</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>PAL1</td>\n",
       "      <td>build</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/PAL1/...</td>\n",
       "      <td>0</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>protocols/r1/subjects/R1001P/localizations/0/m...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>R1001P</td>\n",
       "      <td>NaN</td>\n",
       "      <td>protocols/r1/subjects/R1001P/experiments/PAL1/...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Recognition                                         all_events  \\\n",
       "0         NaN  protocols/r1/subjects/R1001P/experiments/FR1/s...   \n",
       "1         NaN  protocols/r1/subjects/R1001P/experiments/FR1/s...   \n",
       "2         NaN  protocols/r1/subjects/R1001P/experiments/FR2/s...   \n",
       "3         NaN  protocols/r1/subjects/R1001P/experiments/FR2/s...   \n",
       "4         NaN  protocols/r1/subjects/R1001P/experiments/PAL1/...   \n",
       "\n",
       "                                            contacts experiment import_type  \\\n",
       "0  protocols/r1/subjects/R1001P/localizations/0/m...        FR1       build   \n",
       "1  protocols/r1/subjects/R1001P/localizations/0/m...        FR1       build   \n",
       "2  protocols/r1/subjects/R1001P/localizations/0/m...        FR2       build   \n",
       "3  protocols/r1/subjects/R1001P/localizations/0/m...        FR2       build   \n",
       "4  protocols/r1/subjects/R1001P/localizations/0/m...       PAL1       build   \n",
       "\n",
       "   localization                                        math_events  montage  \\\n",
       "0             0  protocols/r1/subjects/R1001P/experiments/FR1/s...        0   \n",
       "1             0  protocols/r1/subjects/R1001P/experiments/FR1/s...        0   \n",
       "2             0  protocols/r1/subjects/R1001P/experiments/FR2/s...        0   \n",
       "3             0  protocols/r1/subjects/R1001P/experiments/FR2/s...        0   \n",
       "4             0  protocols/r1/subjects/R1001P/experiments/PAL1/...        0   \n",
       "\n",
       "  original_experiment original_session  \\\n",
       "0                 NaN                0   \n",
       "1                 NaN                1   \n",
       "2                 NaN                0   \n",
       "3                 NaN                1   \n",
       "4                 NaN                0   \n",
       "\n",
       "                                               pairs ps4_events  session  \\\n",
       "0  protocols/r1/subjects/R1001P/localizations/0/m...        NaN        0   \n",
       "1  protocols/r1/subjects/R1001P/localizations/0/m...        NaN        1   \n",
       "2  protocols/r1/subjects/R1001P/localizations/0/m...        NaN        0   \n",
       "3  protocols/r1/subjects/R1001P/localizations/0/m...        NaN        1   \n",
       "4  protocols/r1/subjects/R1001P/localizations/0/m...        NaN        0   \n",
       "\n",
       "  subject subject_alias  system_version  \\\n",
       "0  R1001P        R1001P             NaN   \n",
       "1  R1001P        R1001P             NaN   \n",
       "2  R1001P        R1001P             NaN   \n",
       "3  R1001P        R1001P             NaN   \n",
       "4  R1001P        R1001P             NaN   \n",
       "\n",
       "                                         task_events  \n",
       "0  protocols/r1/subjects/R1001P/experiments/FR1/s...  \n",
       "1  protocols/r1/subjects/R1001P/experiments/FR1/s...  \n",
       "2  protocols/r1/subjects/R1001P/experiments/FR2/s...  \n",
       "3  protocols/r1/subjects/R1001P/experiments/FR2/s...  \n",
       "4  protocols/r1/subjects/R1001P/experiments/PAL1/...  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "r1_data = get_data_index(kind='r1', rootdir=rhino_root)\n",
    "r1_data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['R1001P', 'R1002P', 'R1003P', 'R1006P', 'R1010J', 'R1015J',\n",
       "       'R1018P', 'R1020J', 'R1022J', 'R1023J', 'R1026D', 'R1027J',\n",
       "       'R1030J', 'R1031M', 'R1032D', 'R1033D', 'R1034D', 'R1035M',\n",
       "       'R1036M', 'R1039M', 'R1042M', 'R1044J', 'R1045E', 'R1048E',\n",
       "       'R1049J', 'R1050M', 'R1051J', 'R1052E', 'R1053M', 'R1054J',\n",
       "       'R1056M', 'R1057E', 'R1059J', 'R1060M', 'R1061T', 'R1062J',\n",
       "       'R1063C', 'R1065J', 'R1066P', 'R1067P', 'R1068J', 'R1069M',\n",
       "       'R1070T', 'R1074M', 'R1075J', 'R1076D', 'R1077T', 'R1080E',\n",
       "       'R1081J', 'R1083J', 'R1084T', 'R1086M', 'R1089P', 'R1092J',\n",
       "       'R1093J', 'R1094T', 'R1096E', 'R1098D', 'R1100D', 'R1101T',\n",
       "       'R1102P', 'R1104D', 'R1105E', 'R1106M', 'R1108J', 'R1111M',\n",
       "       'R1112M', 'R1113T', 'R1114C', 'R1115T', 'R1118N', 'R1120E',\n",
       "       'R1121M', 'R1122E', 'R1123C', 'R1124J', 'R1125T', 'R1127P',\n",
       "       'R1128E', 'R1129D', 'R1130M', 'R1131M', 'R1134T', 'R1135E',\n",
       "       'R1136N', 'R1137E', 'R1138T', 'R1142N', 'R1145J', 'R1146E',\n",
       "       'R1147P', 'R1148P', 'R1149N', 'R1150J', 'R1151E', 'R1153T',\n",
       "       'R1154D', 'R1155D', 'R1156D', 'R1158T', 'R1159P', 'R1161E',\n",
       "       'R1162N', 'R1163T', 'R1164E', 'R1166D', 'R1167M', 'R1168T',\n",
       "       'R1169P', 'R1170J', 'R1171M', 'R1172E', 'R1173J', 'R1174T',\n",
       "       'R1175N', 'R1176M', 'R1177M', 'R1178P', 'R1184M', 'R1185N',\n",
       "       'R1186P', 'R1187P', 'R1189M', 'R1191J', 'R1193T', 'R1195E',\n",
       "       'R1196N', 'R1198M', 'R1200T', 'R1201P', 'R1202M', 'R1203T',\n",
       "       'R1204T', 'R1207J', 'R1212P', 'R1214M', 'R1215M', 'R1216E',\n",
       "       'R1217T', 'R1221P', 'R1222M', 'R1223E', 'R1226D', 'R1228M',\n",
       "       'R1229M', 'R1230J', 'R1231M', 'R1232N', 'R1234D', 'R1236J',\n",
       "       'R1240T', 'R1241J', 'R1243T', 'R1247P', 'R1250N', 'R1251M',\n",
       "       'R1260D', 'R1264P', 'R1268T', 'R1274T', 'R1275D', 'R1277J',\n",
       "       'R1281E', 'R1283T', 'R1286J', 'R1288P', 'R1290M', 'R1291M',\n",
       "       'R1292E', 'R1293P', 'R1297T', 'R1298E', 'R1299T', 'R1302M',\n",
       "       'R1304N', 'R1306E', 'R1307N', 'R1308T', 'R1309M', 'R1310J',\n",
       "       'R1311T', 'R1313J', 'R1315T', 'R1316T', 'R1317D', 'R1318N',\n",
       "       'R1320D', 'R1321M', 'R1323T', 'R1324M', 'R1325C', 'R1328E',\n",
       "       'R1329T', 'R1330D', 'R1331T', 'R1332M', 'R1334T', 'R1336T',\n",
       "       'R1337E', 'R1338T', 'R1339D', 'R1341T', 'R1342M', 'R1345D',\n",
       "       'R1346T', 'R1347D', 'R1349T', 'R1350D', 'R1351M', 'R1354E',\n",
       "       'R1355T', 'R1358T', 'R1361C', 'R1363T', 'R1364C', 'R1367D',\n",
       "       'R1368T', 'R1373T', 'R1374T', 'R1375C', 'R1376D', 'R1377M',\n",
       "       'R1378T', 'R1379E', 'R1380D', 'R1381T', 'R1383J', 'R1384J',\n",
       "       'R1385E', 'R1386T', 'R1387E', 'R1390M', 'R1391T', 'R1393T',\n",
       "       'R1394E', 'R1395M', 'R1396T', 'R1397D', 'R1398J', 'R1401J',\n",
       "       'R1402E', 'R1404E', 'R1405E', 'R1406M', 'R1409D', 'R1412M',\n",
       "       'R1414E', 'R1415T', 'R1416T', 'R1420T', 'R1421M', 'R1422T',\n",
       "       'R1423E', 'R1425D', 'R1427T', 'R1431J', 'R1438M'], dtype=object)"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# What subjects completed FR1?\n",
    "fr1_subjects = r1_data[r1_data['experiment'] == 'FR1']['subject'].unique()\n",
    "fr1_subjects"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['FR1', 'FR2', 'PAL1', 'PAL2', 'PS2', 'catFR1'], dtype=object)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# What experiments did R1111M complete?\n",
    "r1111m_experiments = r1_data[r1_data['subject'] == 'R1111M']['experiment'].unique()\n",
    "r1111m_experiments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "151"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# How many sessions of PAL1 have been collected?\n",
    "pal_sessions = r1_data[r1_data['experiment'] == 'PAL1']\n",
    "len(pal_sessions)"
   ]
  },
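  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same selection idioms extend to summaries across the whole index. The sketch below uses a small made-up table with the same `subject`, `experiment`, and `session` columns as the real index. The rows and subject codes are invented for illustration and do not reflect data on RHINO. It counts sessions per experiment and finds the subjects who completed both of two experiments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "# Toy stand-in for the index; these rows are invented, not real RHINO data\n",
    "toy_index = pd.DataFrame({\n",
    "    'subject': ['SUBJ_A', 'SUBJ_A', 'SUBJ_A', 'SUBJ_B', 'SUBJ_B', 'SUBJ_C'],\n",
    "    'experiment': ['FR1', 'FR1', 'PAL1', 'FR1', 'PAL1', 'PAL1'],\n",
    "    'session': [0, 1, 0, 0, 0, 0],\n",
    "})\n",
    "\n",
    "# Number of sessions collected for each experiment\n",
    "sessions_per_experiment = toy_index.groupby('experiment').size()\n",
    "\n",
    "# Subjects who completed both FR1 and PAL1\n",
    "fr1 = set(toy_index[toy_index['experiment'] == 'FR1']['subject'])\n",
    "pal1 = set(toy_index[toy_index['experiment'] == 'PAL1']['subject'])\n",
    "both = sorted(fr1 & pal1)\n",
    "\n",
    "print(sessions_per_experiment.to_dict())\n",
    "print(both)"
   ]
  },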
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading Data\n",
    "\n",
    "In most cases, the end goal is to load the data into memory rather than just locating a file or determining what data has been collected. For this, CMLReaders provides the CMLReader class, which unifies the API for loading data. By default, the file location is determined automatically from the data type using the PathFinder class demonstrated earlier. However, a custom path can be given with the file_path keyword. This is useful if you have data stored locally in the same format as one of the data types supported by CMLReaders. See the \"Loading from a Custom Location\" section below for an example.\n",
    "\n",
    "Each data type has a default representation that is returned when you call the .load() method. Most users will want to use this default representation. However, if you would like to get the data in a different format, you have two options:\n",
    "\n",
    "1. Get the reader for the data type and load the data in a different supported format using one of its as_x methods\n",
    "2. Load the data as the default type and convert it manually"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = cml.CMLReader(subject=\"R1389J\", experiment=\"catFR5\", session=1, \n",
    "                       localization=0, montage=0, rootdir=rhino_root)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the Default Representation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>eegoffset</th>\n",
       "      <th>category</th>\n",
       "      <th>category_num</th>\n",
       "      <th>eegfile</th>\n",
       "      <th>exp_version</th>\n",
       "      <th>experiment</th>\n",
       "      <th>intrusion</th>\n",
       "      <th>is_stim</th>\n",
       "      <th>item_name</th>\n",
       "      <th>item_num</th>\n",
       "      <th>...</th>\n",
       "      <th>recog_rt</th>\n",
       "      <th>recognized</th>\n",
       "      <th>rectime</th>\n",
       "      <th>rejected</th>\n",
       "      <th>serialpos</th>\n",
       "      <th>session</th>\n",
       "      <th>stim_list</th>\n",
       "      <th>stim_params</th>\n",
       "      <th>subject</th>\n",
       "      <th>type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>-1</td>\n",
       "      <td>X</td>\n",
       "      <td>-999</td>\n",
       "      <td></td>\n",
       "      <td></td>\n",
       "      <td>catFR5</td>\n",
       "      <td>-999</td>\n",
       "      <td>False</td>\n",
       "      <td>X</td>\n",
       "      <td>-999</td>\n",
       "      <td>...</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>[]</td>\n",
       "      <td>R1389J</td>\n",
       "      <td>STIM_ARTIFACT_DETECTION_START</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5831</td>\n",
       "      <td>X</td>\n",
       "      <td>-999</td>\n",
       "      <td>R1389J_catFR5_1_28Feb18_1552.h5</td>\n",
       "      <td></td>\n",
       "      <td>catFR5</td>\n",
       "      <td>-999</td>\n",
       "      <td>False</td>\n",
       "      <td></td>\n",
       "      <td>-1</td>\n",
       "      <td>...</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>[{'amplitude': 500.0, 'anode_label': 'STG6', '...</td>\n",
       "      <td>R1389J</td>\n",
       "      <td>STIM_ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7790</td>\n",
       "      <td>X</td>\n",
       "      <td>-999</td>\n",
       "      <td>R1389J_catFR5_1_28Feb18_1552.h5</td>\n",
       "      <td></td>\n",
       "      <td>catFR5</td>\n",
       "      <td>-999</td>\n",
       "      <td>False</td>\n",
       "      <td></td>\n",
       "      <td>-1</td>\n",
       "      <td>...</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>[{'amplitude': 500.0, 'anode_label': 'STG6', '...</td>\n",
       "      <td>R1389J</td>\n",
       "      <td>STIM_ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>9786</td>\n",
       "      <td>X</td>\n",
       "      <td>-999</td>\n",
       "      <td>R1389J_catFR5_1_28Feb18_1552.h5</td>\n",
       "      <td></td>\n",
       "      <td>catFR5</td>\n",
       "      <td>-999</td>\n",
       "      <td>False</td>\n",
       "      <td></td>\n",
       "      <td>-1</td>\n",
       "      <td>...</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>[{'amplitude': 500.0, 'anode_label': 'STG6', '...</td>\n",
       "      <td>R1389J</td>\n",
       "      <td>STIM_ON</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>11782</td>\n",
       "      <td>X</td>\n",
       "      <td>-999</td>\n",
       "      <td>R1389J_catFR5_1_28Feb18_1552.h5</td>\n",
       "      <td></td>\n",
       "      <td>catFR5</td>\n",
       "      <td>-999</td>\n",
       "      <td>False</td>\n",
       "      <td></td>\n",
       "      <td>-1</td>\n",
       "      <td>...</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>-999</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "      <td>[{'amplitude': 500.0, 'anode_label': 'STG6', '...</td>\n",
       "      <td>R1389J</td>\n",
       "      <td>STIM_ON</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 28 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   eegoffset category  category_num                          eegfile  \\\n",
       "0         -1        X          -999                                    \n",
       "1       5831        X          -999  R1389J_catFR5_1_28Feb18_1552.h5   \n",
       "2       7790        X          -999  R1389J_catFR5_1_28Feb18_1552.h5   \n",
       "3       9786        X          -999  R1389J_catFR5_1_28Feb18_1552.h5   \n",
       "4      11782        X          -999  R1389J_catFR5_1_28Feb18_1552.h5   \n",
       "\n",
       "  exp_version experiment  intrusion  is_stim item_name  item_num  \\\n",
       "0                 catFR5       -999    False         X      -999   \n",
       "1                 catFR5       -999    False                  -1   \n",
       "2                 catFR5       -999    False                  -1   \n",
       "3                 catFR5       -999    False                  -1   \n",
       "4                 catFR5       -999    False                  -1   \n",
       "\n",
       "               ...                recog_rt  recognized  rectime  rejected  \\\n",
       "0              ...                    -999        -999     -999      -999   \n",
       "1              ...                    -999        -999     -999      -999   \n",
       "2              ...                    -999        -999     -999      -999   \n",
       "3              ...                    -999        -999     -999      -999   \n",
       "4              ...                    -999        -999     -999      -999   \n",
       "\n",
       "  serialpos session  stim_list  \\\n",
       "0      -999       1      False   \n",
       "1      -999       1      False   \n",
       "2      -999       1      False   \n",
       "3      -999       1      False   \n",
       "4      -999       1      False   \n",
       "\n",
       "                                         stim_params  subject  \\\n",
       "0                                                 []   R1389J   \n",
       "1  [{'amplitude': 500.0, 'anode_label': 'STG6', '...   R1389J   \n",
       "2  [{'amplitude': 500.0, 'anode_label': 'STG6', '...   R1389J   \n",
       "3  [{'amplitude': 500.0, 'anode_label': 'STG6', '...   R1389J   \n",
       "4  [{'amplitude': 500.0, 'anode_label': 'STG6', '...   R1389J   \n",
       "\n",
       "                            type  \n",
       "0  STIM_ARTIFACT_DETECTION_START  \n",
       "1                        STIM_ON  \n",
       "2                        STIM_ON  \n",
       "3                        STIM_ON  \n",
       "4                        STIM_ON  \n",
       "\n",
       "[5 rows x 28 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Pandas dataframe\n",
    "events_df = reader.load('task_events')\n",
    "events_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'interictal': ['ONEMC 9',\n",
       "  'ONEMC10',\n",
       "  'ONEMC8',\n",
       "  'SMC3',\n",
       "  'SMC4',\n",
       "  'STG 7 ',\n",
       "  'STG8',\n",
       "  'TWOSTG 4'],\n",
       " 'brain_lesion': ['FOURSC', 'ONESC', 'ONNEMC', 'THREESC', 'TWOMC', 'TWOSC'],\n",
       " 'bad_channel': ['NONE'],\n",
       " 'soz': []}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Python dictionary\n",
    "electrode_categories_dict = reader.load('electrode_categories')\n",
    "electrode_categories_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using the Underlying Reader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "cmlreaders.readers.readers.EventReader"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Ask CMLReader to give back the reader instead of the data\n",
    "event_reader = reader.get_reader('task_events')\n",
    "type(event_reader)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[{'eegoffset': -1,\n",
       "  'category': 'X',\n",
       "  'category_num': -999,\n",
       "  'eegfile': '',\n",
       "  'exp_version': '',\n",
       "  'experiment': 'catFR5',\n",
       "  'intrusion': -999,\n",
       "  'is_stim': False,\n",
       "  'item_name': 'X',\n",
       "  'item_num': -999,\n",
       "  'list': -999,\n",
       "  'montage': 0,\n",
       "  'msoffset': -1,\n",
       "  'mstime': -1,\n",
       "  'phase': '',\n",
       "  'protocol': 'r1',\n",
       "  'recalled': False,\n",
       "  'recog_resp': -999,\n",
       "  'recog_rt': -999,\n",
       "  'recognized': -999,\n",
       "  'rectime': -999,\n",
       "  'rejected': -999,\n",
       "  'serialpos': -999,\n",
       "  'session': 1,\n",
       "  'stim_list': False,\n",
       "  'stim_params': [],\n",
       "  'subject': 'R1389J',\n",
       "  'type': 'STIM_ARTIFACT_DETECTION_START'}]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load the task events as a dictionary instead of the default representation\n",
    "event_dict = event_reader.as_dict()\n",
    "event_dict[:1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0, -1, 'X', -999, '', '', 'catFR5', -999, False, 'X', -999, -999, 0, -1, -1, '', 'r1', False, -999, -999, -999, -999, -999, -999, 1, False, [], 'R1389J', 'STIM_ARTIFACT_DETECTION_START')"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Load the task event as a recarray (not recommended)\n",
    "event_recarray = event_reader.as_recarray()\n",
    "event_recarray[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Examples"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading Multiple Sessions\n",
    "\n",
    "CMLReaders is currently designed for loading events one session at a time. This may change in the future, but in the interim, it is straightforward to load multiple sessions-worth of events."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([0, 2, 3, 9])"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Find all sessions of FR6 that subject R1409D completed\n",
    "sessions_completed = r1_data[(r1_data['subject'] == 'R1409D') & \n",
    "                             (r1_data['experiment'] == 'FR6')]['session'].unique()\n",
    "sessions_completed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4527"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Verbose method\n",
    "all_events = []\n",
    "for session in sessions_completed:\n",
    "    sess_events = cml.CMLReader(subject=\"R1409D\", experiment=\"FR6\", session=session, \n",
    "                                localization=0, montage=0, rootdir=rhino_root).load('task_events')\n",
    "    all_events.append(sess_events)\n",
    "\n",
    "all_sessions_df = pd.concat(all_events)\n",
    "len(all_sessions_df)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4527"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Same operation, but with list comprehension\n",
    "all_session_df = pd.concat([cml.CMLReader(subject='R1409D', experiment='FR6', session=session, rootdir=rhino_root).load('events')\n",
    "                            for session in sessions_completed])\n",
    "len(all_sessions_df)"
   ]
  },
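  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The concatenation step can be tried without access to RHINO. The sketch below is illustrative only: it builds two small synthetic per-session frames (the values are made up, and `fake_session_events` is a hypothetical helper, not part of CMLReaders) and shows that `pd.concat` keeps each frame's original row labels unless `ignore_index=True` is passed.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "\n",
    "def fake_session_events(session, n_events):\n",
    "    # Hypothetical stand-in for CMLReader(...).load('task_events')\n",
    "    return pd.DataFrame({\n",
    "        'session': [session] * n_events,\n",
    "        'type': ['WORD'] * n_events,\n",
    "    })\n",
    "\n",
    "\n",
    "per_session = [fake_session_events(0, 3), fake_session_events(2, 2)]\n",
    "\n",
    "# Default behavior: row labels restart at 0 for every session\n",
    "stacked = pd.concat(per_session)\n",
    "\n",
    "# ignore_index=True relabels the rows 0..n-1 across all sessions\n",
    "renumbered = pd.concat(per_session, ignore_index=True)\n",
    "\n",
    "print(len(stacked), stacked.index.is_unique, renumbered.index.is_unique)\n",
    "```\n",
    "\n",
    "Duplicate row labels are harmless for counting rows, but they can surprise label-based selection with `.loc`, so renumbering is worth considering when combining sessions."
   ]
  },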
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use the special `load_events` classmethod to load events from multiple subjects and/or experiments."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['R1111M', 'R1409D'], dtype=object)"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "aggregate_events = cml.CMLReader.load_events(subjects=[\"R1111M\", \"R1409D\"],\n",
    "                                             experiments=[\"FR1\"],\n",
    "                                             rootdir=rhino_root)\n",
    "aggregate_events.subject.unique()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading EEG"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Loading EEG data is a bit more complicated than loading other data types. For this reason, rather than using the general `load` method, we instead use `load_eeg` which takes special keyword arguments. As always, start by instantiating a `CMLReader`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [],
   "source": [
    "reader = cml.CMLReader(subject='R1111M', experiment='FR1', session=0, rootdir=rhino_root)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading a full session"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If we give no parameters to `reader.load_eeg`, then by default all data for an entire session will be loaded as the data were recorded:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/depalati/src/cmlreaders/cmlreaders/path_finder.py:225: MultiplePathsFoundWarning: Multiple files found. Returning the first file found\n",
      "  'file found', MultiplePathsFoundWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(1, 100, 1623160)"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "full_session_eeg = reader.load_eeg()\n",
    "full_session_eeg.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading from events"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To select EEG epochs based on events, first load the events and use standard pandas idioms to select events of interest:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "events = reader.load('events')\n",
    "word_events = events[events['type'] == 'WORD']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to passing the events, we also need to specify relative start and stop times in milliseconds. Below, we will load data for each word onset starting at the time the word appeared and ending 100 ms later:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/depalati/src/cmlreaders/cmlreaders/path_finder.py:225: MultiplePathsFoundWarning: Multiple files found. Returning the first file found\n",
      "  'file found', MultiplePathsFoundWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(288, 100, 50)"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word_event_eeg = reader.load_eeg(events=word_events, rel_start=0, rel_stop=100)\n",
    "word_event_eeg.shape"
   ]
  },
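  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The last dimension of the shape above is the number of samples per epoch, which follows from the window length and the sampling rate. As a rough check, the sketch below assumes a 500 Hz sampling rate (a value inferred from the shape above, not read from the recording) and uses a hypothetical helper, `samples_per_epoch`, that is not part of CMLReaders. The library itself may handle rounding at the window edges differently.\n",
    "\n",
    "```python\n",
    "def samples_per_epoch(rel_start_ms, rel_stop_ms, sample_rate_hz):\n",
    "    # Window length in seconds multiplied by samples per second\n",
    "    duration_s = (rel_stop_ms - rel_start_ms) / 1000.0\n",
    "    return int(round(duration_s * sample_rate_hz))\n",
    "\n",
    "\n",
    "# Assumed sampling rate: 500 Hz (inferred, not read from the data)\n",
    "window_0_to_100 = samples_per_epoch(0, 100, 500)\n",
    "window_minus100_to_100 = samples_per_epoch(-100, 100, 500)\n",
    "print(window_0_to_100, window_minus100_to_100)\n",
    "```\n",
    "\n",
    "Under that assumption, a 0 to 100 ms window yields 50 samples, matching the last dimension of `word_event_eeg.shape`."
   ]
  },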
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reading from multiple sessions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Building upon aggregate event loading, we can also load EEG data from multiple sessions. The caveat here is that we are restricted to a single subject:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/Users/depalati/src/cmlreaders/cmlreaders/path_finder.py:225: MultiplePathsFoundWarning: Multiple files found. Returning the first file found\n",
      "  'file found', MultiplePathsFoundWarning)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "(1020, 100, 100)"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "all_fr1_events = reader.load_events(subjects=[\"R1111M\"], experiments=[\"FR1\"], rootdir=rhino_root)\n",
    "fr1_words = all_fr1_events[all_fr1_events[\"type\"] == \"WORD\"]\n",
    "fr1_eeg = reader.load_eeg(events=fr1_words, rel_start=-100, rel_stop=100)\n",
    "fr1_eeg.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Converting EEG Representation\n",
    "\n",
    "By default, CMLReaders uses a simple container for time series data. However, we provide two utility methods for converting this representation to a format that is understood by PTSA and MNE, since these are common libraries for interacting with EEG data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "ptsa.data.timeseries.TimeSeries"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ptsa_eeg = word_event_eeg.to_ptsa()\n",
    "type(ptsa_eeg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "mne.epochs.EpochsArray"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mne_eeg = word_event_eeg.to_mne()\n",
    "type(mne_eeg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:cmlreaders]",
   "language": "python",
   "name": "conda-env-cmlreaders-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}