Getting started with timeseries

This notebook walks you through some of the basic methods of PTSA’s TimeSeries class. We’ll see how to construct a TimeSeries instance, how to save and load data, and how to resample and filter a TimeSeries.

[1]:
# relevant imports
import numpy as np
from matplotlib import pyplot as plt
%matplotlib notebook
from ptsa.data.timeseries import TimeSeries

Let’s start by creating some data

In real applications, you will most likely have your own timeseries data to analyze. To illustrate the functionality of the TimeSeries class, we will construct sinusoids as our timeseries data. Let’s create an array of 5000 data points, or samples. If the sampling rate is 10 Hz, our timeseries is 5000/10 = 500 seconds long.
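The arithmetic above can be written out as a small sketch. The names `duration` and `sample_spacing` are introduced here for illustration only and are not used elsewhere in the notebook.

```python
num_points = 5000      # number of samples
sample_rate = 10.0     # samples per second (Hz)

# Total length of the recording in seconds
duration = num_points / sample_rate

# Time between two consecutive samples in seconds
sample_spacing = 1.0 / sample_rate

print(duration)        # 500.0
print(sample_spacing)  # 0.1
```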

[2]:
num_points = 5000
sample_rate = 10.

# We can specify the timestamps for each data point, from 0.1s to 500s.
t = np.linspace(1, num_points, num_points) / sample_rate

# Let's create two noisy sinusoids with different frequencies.
frequency1 = .5 # 1 cycle every 2 seconds
frequency2 = .1 # 1 cycle every 10 seconds
data1 = np.sin(2*np.pi*frequency1*t) + np.random.uniform(-0.5, 0.5, num_points)
data2 = np.sin(2*np.pi*frequency2*t) + np.random.uniform(-0.5, 0.5, num_points)

Let’s check our timepoints.

[3]:
print('First 5 timestamps: ', t[:5])
print('Last 5 timestamps: ', t[-5:])
First 5 timestamps:  [ 0.1  0.2  0.3  0.4  0.5]
Last 5 timestamps:  [ 499.6  499.7  499.8  499.9  500. ]
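Beyond printing the first and last few values, one way to check the whole timestamp array is to confirm that consecutive samples are spaced 1/sample_rate apart. This is a minimal sketch that rebuilds `t` exactly as above; the names `steps` and `evenly_spaced` are illustrative only.

```python
import numpy as np

num_points = 5000
sample_rate = 10.0
t = np.linspace(1, num_points, num_points) / sample_rate

# Differences between consecutive timestamps
steps = np.diff(t)

# Compare with a tolerance rather than exactly, since these are floats
evenly_spaced = np.allclose(steps, 1.0 / sample_rate)

print(evenly_spaced)
print(len(t))
```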

We can also visualize the timeseries using matplotlib.

[4]:
plt.figure(figsize=[10,2])
plt.plot(t, data1, label='%sHz'%str(frequency1))
plt.plot(t, data2, label='%sHz'%str(frequency2))

plt.legend()
[4]:
<matplotlib.legend.Legend at 0x2b7884cc0518>

As we zoom in to the data array, the random noise we added to the sinusoids becomes clear.

[5]:
plt.figure(figsize=[10, 2])
plt.plot(t[500:1000], data1[500:1000], label='%sHz'%str(frequency1))
plt.plot(t[500:1000], data2[500:1000], label='%sHz'%str(frequency2))
plt.legend()
[5]:
<matplotlib.legend.Legend at 0x2b78be4cc908>
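The noise can also be inspected numerically by subtracting the clean sinusoid from the noisy one. The sketch below assumes a seeded `np.random.default_rng` generator for reproducibility, which differs from the unseeded `np.random.uniform` call used earlier; `clean`, `noisy` and `residual` are illustrative names.

```python
import numpy as np

# Seeded generator: an assumption made here so the sketch is reproducible
rng = np.random.default_rng(0)

num_points = 5000
sample_rate = 10.0
frequency1 = 0.5
t = np.linspace(1, num_points, num_points) / sample_rate

clean = np.sin(2 * np.pi * frequency1 * t)
noisy = clean + rng.uniform(-0.5, 0.5, num_points)

# What remains after removing the sinusoid is the added noise
residual = noisy - clean

print(residual.min(), residual.max())
print(residual.std())
```

The residual should stay within the interval [-0.5, 0.5], and its standard deviation should be close to 1/sqrt(12) ≈ 0.289, the theoretical value for a uniform distribution of width 1.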

Create a TimeSeries object

The TimeSeries class is a convenient wrapper around xarray that offers basic functionality for timeseries analysis. Although we focus here on timeseries data, many of the following examples extend easily to non-timeseries, multidimensional data. To create a TimeSeries object, we pass the data together with the names of its dimensions and the coordinates for each dimension.

[6]:
# Let's stack the two time-series data arrays.
data = np.vstack((data1, data2))

# and construct the TimeSeries object
ts = TimeSeries(data,
                dims=('frequency', 'time'),
                coords={'frequency':[frequency1, frequency2],
                        'time':t,
                        'samplerate':sample_rate})
print(ts)
<xarray.TimeSeries (frequency: 2, time: 5000)>
array([[-0.02586 ,  0.497876,  0.336389, ..., -0.680535, -0.378594,  0.029344],
       [-0.016682, -0.123327, -0.285444, ..., -0.406314,  0.222235, -0.272452]])
Coordinates:
  * frequency   (frequency) float64 0.5 0.1
  * time        (time) float64 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1.1 ...
    samplerate  float64 10.0
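In the output above, frequency and time are dimension coordinates (marked with `*`), while samplerate is a scalar coordinate that is not tied to any axis. Each dimension coordinate needs one entry per element along its axis. The following numpy-only sketch checks that correspondence before building the object; it uses noise-free stand-ins for data1 and data2, and the `frequencies` list is an illustrative name.

```python
import numpy as np

num_points = 5000
sample_rate = 10.0
t = np.linspace(1, num_points, num_points) / sample_rate
frequencies = [0.5, 0.1]

# Noise-free stand-ins for data1 and data2, stacked row-wise
data = np.vstack([np.sin(2 * np.pi * f * t) for f in frequencies])

dims = ('frequency', 'time')
coords = {'frequency': frequencies, 'time': t}

# Each axis length must equal the length of its coordinate array
for axis, name in enumerate(dims):
    assert data.shape[axis] == len(coords[name])

print(data.shape)  # (2, 5000)
```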

TimeSeries also has a convenient plotting method, inherited from xarray.

Note: since the frequency dimension has float coordinates, we need to be careful with exact float comparisons. Thus, instead of using ts.sel(frequency=frequency1), we use ts.sel(frequency=ts.frequency[0]), which selects with the stored coordinate value itself. See more about the .sel() method in later sections.

[7]:
plt.figure(figsize=[10,2])
ts.sel(frequency=ts.frequency[0]).plot()
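To see why exact float comparisons are fragile, consider a frequency value that is computed rather than typed in. The sketch below uses plain numpy rather than TimeSeries; `target`, `exact_match`, `close_match` and `index` are illustrative names.

```python
import numpy as np

frequencies = np.array([0.5, 0.1])

# Mathematically 0.1, but the floating-point result is not bit-identical to 0.1
target = 0.3 - 0.2

# Exact comparison: sensitive to rounding in the last bits
exact_match = frequencies == target

# Tolerant comparison: treats values within a small tolerance as equal
close_match = np.isclose(frequencies, target)

# Position of the matching coordinate under the tolerant comparison
index = np.flatnonzero(close_match)[0]

print(exact_match)
print(close_match)
print(index)
```

Here the exact comparison matches neither entry, while the tolerant comparison picks out the second one. Selecting with a value taken from the coordinate array itself, as in the cell above, avoids the problem altogether.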