To download the SUSY dataset from the UCI Machine Learning Repository, run:
download_dataset("susy.csv.gz")
A compressed version in Feather format is also available for faster loading in class:
download_dataset("susy.feather")
To get the training (first 4,500,000 rows) and test (last 500,000 rows) sets, run:
download_dataset("susy_train.feather")
download_dataset("susy_test.feather")
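For reference, a positional split like the one described above (first rows for training, last rows for testing) can be sketched with pandas. This is a minimal illustration on a small synthetic frame, not the code used to build the files; the helper `split_head_tail` and the column names are hypothetical.

```python
import numpy as np
import pandas as pd


def split_head_tail(df, n_test):
    """Return (train, test), where test holds the last n_test rows of df."""
    if not 0 < n_test < len(df):
        raise ValueError("n_test must be between 1 and len(df) - 1")
    train = df.iloc[:-n_test].reset_index(drop=True)
    test = df.iloc[-n_test:].reset_index(drop=True)
    return train, test


# Synthetic stand-in: 50 rows instead of the 4,500,000 + 500,000 rows in the text.
rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {"label": rng.integers(0, 2, size=50), "feature": rng.normal(size=50)}
)

train, test = split_head_tail(toy, n_test=5)
print(len(train), len(test))
```

Because the split is positional rather than random, it is reproducible without a seed, but it assumes the row order of the source file carries no systematic trend.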
To get a random sample of 100,000 rows from susy_train, run:
download_dataset("susy_sample.feather")
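A random subsample of this kind can be drawn with `DataFrame.sample`. The sketch below uses a synthetic frame and a smaller sample size; the seed value and column names are assumptions for illustration, not the settings used to produce the provided file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the training set.
rng = np.random.default_rng(0)
toy_train = pd.DataFrame(
    {"label": rng.integers(0, 2, size=1000), "feature": rng.normal(size=1000)}
)

# Draw 100 rows without replacement; random_state makes the draw repeatable.
sample = toy_train.sample(n=100, random_state=42).reset_index(drop=True)
print(sample.shape)  # (100, 2)
```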
For the 3D shape classification task in lesson 6, you can download the ZIP file of real-world objects as follows:
download_dataset("shapes.zip")
We also provide precomputed persistence diagrams so you can save time when running on Binder / Colab:
# circles, spheres, tori
download_dataset("diagrams_basic.pkl")
# real-world objects
download_dataset("diagrams.pkl")
For the computer vision experiments, you can download the images as follows:
download_dataset("Cells.jpg")
download_dataset("BlackHole.jpg")
Gravitational waves
The following function generates noisy time series, some of which contain an embedded gravitational wave signal. We thank C. Bresten for providing the code and data from his article with J.-H. Jung, "Detection of gravitational waves using topological data analysis and convolutional neural network: An improved approach".
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

download_dataset('gravitational-wave-signals.npy')
DATA = Path('../data')
noisy_signals, gw_signals, labels = make_gravitational_waves(path_to_data=DATA)
# get the index corresponding to the first pure noise time series
background_idx = np.argmin(labels)
# get the index corresponding to the first noise + gravitational wave time series
signal_idx = np.argmax(labels)
fig, (ax0, ax1) = plt.subplots(ncols=2, figsize=(12, 4), sharey=True)
ax0.plot(noisy_signals[background_idx])
ax0.set_ylabel("Amplitude")
ax0.set_xlabel("Time step")
ax0.set_title("Pure noise")
ax1.plot(noisy_signals[signal_idx])
ax1.plot(gw_signals[signal_idx])
ax1.set_xlabel("Time step")
ax1.set_title("Noise with gravitational wave signal")
plt.tight_layout()
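The index lookups above work because `np.argmin` and `np.argmax` return the position of the first occurrence of the minimum and maximum. A minimal sketch, assuming `labels` is a binary array with 0 for pure noise and 1 for noise plus signal (the toy array below is made up for illustration):

```python
import numpy as np

# Hypothetical labels: 0 = pure noise, 1 = noise + gravitational wave.
toy_labels = np.array([1, 1, 0, 1, 0, 0])

first_noise_idx = int(np.argmin(toy_labels))   # first position holding a 0
first_signal_idx = int(np.argmax(toy_labels))  # first position holding a 1
print(first_noise_idx, first_signal_idx)  # 2 0
```

If every label had the same value, both calls would return 0, so this shortcut is only meaningful when both classes are present.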