Using a Deep Neural Network for Quark Tagging
If you’re running this in Colab, save a copy of the notebook to Google Drive first so that your changes are kept.
In this notebook, we’re going to explore the use of deep neural networks (DNNs) in particle physics, specifically for the task of top-quark tagging. The goal is to build a simple model that classifies whether a jet represents a top-quark signal or a quark-gluon background.
The dataset used in this notebook is the open-source Top Quark Tagging dataset produced from simulations of proton-proton collisions.
[1]:
# If you're running this notebook, uncomment the code in this cell to install the required packages.
# ! pip install fastai
# ! pip install scikit-learn
# ! pip install scipy
# ! pip install datasets
[1]:
from datasets import load_dataset
from fastai.tabular.all import *
from scipy.interpolate import interp1d
from sklearn.metrics import accuracy_score, auc, roc_curve
Downloading the dataset
First, let’s download and inspect the dataset.
[ ]:
# If you're running this in Colab or locally, run the next line to download the dataset.
top_tagging_ds = load_dataset("dl4phys/top_tagging")
This dataset contains simulated proton-proton collision events in which the outgoing particles have been clustered into jets. Each row is one jet, described by up to 200 of its constituent particles. The integer suffix on each column name indexes the constituent, and each constituent has 4 values labelled E, PX, PY, and PZ. These are the components of its four-momentum \(p = (E, p_x, p_y, p_z)\), where \(E\) is the energy and \(p_x, p_y, p_z\) are the components of the momentum in three (orthogonal) spatial directions.
The dataset also contains the true four-momentum of the top quark, labelled truthE, truthPX, truthPY, and truthPZ. Lastly, it contains the ttv flag, which distinguishes the train, test, and validation sets (discussed in the primer on machine learning), and an is_signal_new flag, which indicates whether a particular jet represents a top-quark signal or a quark-gluon background.
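To make the column layout concrete, here’s a small sketch on made-up numbers (not the real dataset). It assumes the integer suffix indexes the particles inside a jet and that the columns come in the interleaved order E, PX, PY, PZ, as in the feature list. The toy frame, the choice of three constituents per jet, and all variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: 2 jets with 3 constituents each
# (hypothetical sizes; the real columns run from index 0 to 199).
n_jets, n_constituents = 2, 3
columns = [
    f"{name}_{i}"
    for i in range(n_constituents)
    for name in ("E", "PX", "PY", "PZ")
]

rng = np.random.default_rng(0)
toy_df = pd.DataFrame(rng.normal(size=(n_jets, len(columns))), columns=columns)

# Regroup the flat columns into shape (jets, constituents, 4),
# where the last axis holds (E, px, py, pz).
four_vectors = toy_df[columns].to_numpy().reshape(n_jets, n_constituents, 4)

# Summing the constituent four-vectors gives one four-momentum per jet.
jet_p4 = four_vectors.sum(axis=1)

print(four_vectors.shape)  # (2, 3, 4)
print(jet_p4.shape)        # (2, 4)
```

In this notebook the network is fed the flat columns directly, so this regrouping is only meant as a way to picture what each row contains.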
[3]:
top_tagging_ds
[3]:
DatasetDict({
train: Dataset({
features: ['E_0', 'PX_0', 'PY_0', 'PZ_0', 'E_1', 'PX_1', 'PY_1', 'PZ_1', 'E_2', 'PX_2', 'PY_2', 'PZ_2', 'E_3', 'PX_3', 'PY_3', 'PZ_3', 'E_4', 'PX_4', 'PY_4', 'PZ_4', 'E_5', 'PX_5', 'PY_5', 'PZ_5', 'E_6', 'PX_6', 'PY_6', 'PZ_6', 'E_7', 'PX_7', 'PY_7', 'PZ_7', 'E_8', 'PX_8', 'PY_8', 'PZ_8', 'E_9', 'PX_9', 'PY_9', 'PZ_9', 'E_10', 'PX_10', 'PY_10', 'PZ_10', 'E_11', 'PX_11', 'PY_11', 'PZ_11', 'E_12', 'PX_12', 'PY_12', 'PZ_12', 'E_13', 'PX_13', 'PY_13', 'PZ_13', 'E_14', 'PX_14', 'PY_14', 'PZ_14', 'E_15', 'PX_15', 'PY_15', 'PZ_15', 'E_16', 'PX_16', 'PY_16', 'PZ_16', 'E_17', 'PX_17', 'PY_17', 'PZ_17', 'E_18', 'PX_18', 'PY_18', 'PZ_18', 'E_19', 'PX_19', 'PY_19', 'PZ_19', 'E_20', 'PX_20', 'PY_20', 'PZ_20', 'E_21', 'PX_21', 'PY_21', 'PZ_21', 'E_22', 'PX_22', 'PY_22', 'PZ_22', 'E_23', 'PX_23', 'PY_23', 'PZ_23', 'E_24', 'PX_24', 'PY_24', 'PZ_24', 'E_25', 'PX_25', 'PY_25', 'PZ_25', 'E_26', 'PX_26', 'PY_26', 'PZ_26', 'E_27', 'PX_27', 'PY_27', 'PZ_27', 'E_28', 'PX_28', 'PY_28', 'PZ_28', 'E_29', 'PX_29', 'PY_29', 'PZ_29', 'E_30', 'PX_30', 'PY_30', 'PZ_30', 'E_31', 'PX_31', 'PY_31', 'PZ_31', 'E_32', 'PX_32', 'PY_32', 'PZ_32', 'E_33', 'PX_33', 'PY_33', 'PZ_33', 'E_34', 'PX_34', 'PY_34', 'PZ_34', 'E_35', 'PX_35', 'PY_35', 'PZ_35', 'E_36', 'PX_36', 'PY_36', 'PZ_36', 'E_37', 'PX_37', 'PY_37', 'PZ_37', 'E_38', 'PX_38', 'PY_38', 'PZ_38', 'E_39', 'PX_39', 'PY_39', 'PZ_39', 'E_40', 'PX_40', 'PY_40', 'PZ_40', 'E_41', 'PX_41', 'PY_41', 'PZ_41', 'E_42', 'PX_42', 'PY_42', 'PZ_42', 'E_43', 'PX_43', 'PY_43', 'PZ_43', 'E_44', 'PX_44', 'PY_44', 'PZ_44', 'E_45', 'PX_45', 'PY_45', 'PZ_45', 'E_46', 'PX_46', 'PY_46', 'PZ_46', 'E_47', 'PX_47', 'PY_47', 'PZ_47', 'E_48', 'PX_48', 'PY_48', 'PZ_48', 'E_49', 'PX_49', 'PY_49', 'PZ_49', 'E_50', 'PX_50', 'PY_50', 'PZ_50', 'E_51', 'PX_51', 'PY_51', 'PZ_51', 'E_52', 'PX_52', 'PY_52', 'PZ_52', 'E_53', 'PX_53', 'PY_53', 'PZ_53', 'E_54', 'PX_54', 'PY_54', 'PZ_54', 'E_55', 'PX_55', 'PY_55', 'PZ_55', 'E_56', 'PX_56', 'PY_56', 'PZ_56', 'E_57', 'PX_57', 'PY_57', 
'PZ_57', 'E_58', 'PX_58', 'PY_58', 'PZ_58', 'E_59', 'PX_59', 'PY_59', 'PZ_59', 'E_60', 'PX_60', 'PY_60', 'PZ_60', 'E_61', 'PX_61', 'PY_61', 'PZ_61', 'E_62', 'PX_62', 'PY_62', 'PZ_62', 'E_63', 'PX_63', 'PY_63', 'PZ_63', 'E_64', 'PX_64', 'PY_64', 'PZ_64', 'E_65', 'PX_65', 'PY_65', 'PZ_65', 'E_66', 'PX_66', 'PY_66', 'PZ_66', 'E_67', 'PX_67', 'PY_67', 'PZ_67', 'E_68', 'PX_68', 'PY_68', 'PZ_68', 'E_69', 'PX_69', 'PY_69', 'PZ_69', 'E_70', 'PX_70', 'PY_70', 'PZ_70', 'E_71', 'PX_71', 'PY_71', 'PZ_71', 'E_72', 'PX_72', 'PY_72', 'PZ_72', 'E_73', 'PX_73', 'PY_73', 'PZ_73', 'E_74', 'PX_74', 'PY_74', 'PZ_74', 'E_75', 'PX_75', 'PY_75', 'PZ_75', 'E_76', 'PX_76', 'PY_76', 'PZ_76', 'E_77', 'PX_77', 'PY_77', 'PZ_77', 'E_78', 'PX_78', 'PY_78', 'PZ_78', 'E_79', 'PX_79', 'PY_79', 'PZ_79', 'E_80', 'PX_80', 'PY_80', 'PZ_80', 'E_81', 'PX_81', 'PY_81', 'PZ_81', 'E_82', 'PX_82', 'PY_82', 'PZ_82', 'E_83', 'PX_83', 'PY_83', 'PZ_83', 'E_84', 'PX_84', 'PY_84', 'PZ_84', 'E_85', 'PX_85', 'PY_85', 'PZ_85', 'E_86', 'PX_86', 'PY_86', 'PZ_86', 'E_87', 'PX_87', 'PY_87', 'PZ_87', 'E_88', 'PX_88', 'PY_88', 'PZ_88', 'E_89', 'PX_89', 'PY_89', 'PZ_89', 'E_90', 'PX_90', 'PY_90', 'PZ_90', 'E_91', 'PX_91', 'PY_91', 'PZ_91', 'E_92', 'PX_92', 'PY_92', 'PZ_92', 'E_93', 'PX_93', 'PY_93', 'PZ_93', 'E_94', 'PX_94', 'PY_94', 'PZ_94', 'E_95', 'PX_95', 'PY_95', 'PZ_95', 'E_96', 'PX_96', 'PY_96', 'PZ_96', 'E_97', 'PX_97', 'PY_97', 'PZ_97', 'E_98', 'PX_98', 'PY_98', 'PZ_98', 'E_99', 'PX_99', 'PY_99', 'PZ_99', 'E_100', 'PX_100', 'PY_100', 'PZ_100', 'E_101', 'PX_101', 'PY_101', 'PZ_101', 'E_102', 'PX_102', 'PY_102', 'PZ_102', 'E_103', 'PX_103', 'PY_103', 'PZ_103', 'E_104', 'PX_104', 'PY_104', 'PZ_104', 'E_105', 'PX_105', 'PY_105', 'PZ_105', 'E_106', 'PX_106', 'PY_106', 'PZ_106', 'E_107', 'PX_107', 'PY_107', 'PZ_107', 'E_108', 'PX_108', 'PY_108', 'PZ_108', 'E_109', 'PX_109', 'PY_109', 'PZ_109', 'E_110', 'PX_110', 'PY_110', 'PZ_110', 'E_111', 'PX_111', 'PY_111', 'PZ_111', 'E_112', 'PX_112', 'PY_112', 'PZ_112', 'E_113', 
'PX_113', 'PY_113', 'PZ_113', 'E_114', 'PX_114', 'PY_114', 'PZ_114', 'E_115', 'PX_115', 'PY_115', 'PZ_115', 'E_116', 'PX_116', 'PY_116', 'PZ_116', 'E_117', 'PX_117', 'PY_117', 'PZ_117', 'E_118', 'PX_118', 'PY_118', 'PZ_118', 'E_119', 'PX_119', 'PY_119', 'PZ_119', 'E_120', 'PX_120', 'PY_120', 'PZ_120', 'E_121', 'PX_121', 'PY_121', 'PZ_121', 'E_122', 'PX_122', 'PY_122', 'PZ_122', 'E_123', 'PX_123', 'PY_123', 'PZ_123', 'E_124', 'PX_124', 'PY_124', 'PZ_124', 'E_125', 'PX_125', 'PY_125', 'PZ_125', 'E_126', 'PX_126', 'PY_126', 'PZ_126', 'E_127', 'PX_127', 'PY_127', 'PZ_127', 'E_128', 'PX_128', 'PY_128', 'PZ_128', 'E_129', 'PX_129', 'PY_129', 'PZ_129', 'E_130', 'PX_130', 'PY_130', 'PZ_130', 'E_131', 'PX_131', 'PY_131', 'PZ_131', 'E_132', 'PX_132', 'PY_132', 'PZ_132', 'E_133', 'PX_133', 'PY_133', 'PZ_133', 'E_134', 'PX_134', 'PY_134', 'PZ_134', 'E_135', 'PX_135', 'PY_135', 'PZ_135', 'E_136', 'PX_136', 'PY_136', 'PZ_136', 'E_137', 'PX_137', 'PY_137', 'PZ_137', 'E_138', 'PX_138', 'PY_138', 'PZ_138', 'E_139', 'PX_139', 'PY_139', 'PZ_139', 'E_140', 'PX_140', 'PY_140', 'PZ_140', 'E_141', 'PX_141', 'PY_141', 'PZ_141', 'E_142', 'PX_142', 'PY_142', 'PZ_142', 'E_143', 'PX_143', 'PY_143', 'PZ_143', 'E_144', 'PX_144', 'PY_144', 'PZ_144', 'E_145', 'PX_145', 'PY_145', 'PZ_145', 'E_146', 'PX_146', 'PY_146', 'PZ_146', 'E_147', 'PX_147', 'PY_147', 'PZ_147', 'E_148', 'PX_148', 'PY_148', 'PZ_148', 'E_149', 'PX_149', 'PY_149', 'PZ_149', 'E_150', 'PX_150', 'PY_150', 'PZ_150', 'E_151', 'PX_151', 'PY_151', 'PZ_151', 'E_152', 'PX_152', 'PY_152', 'PZ_152', 'E_153', 'PX_153', 'PY_153', 'PZ_153', 'E_154', 'PX_154', 'PY_154', 'PZ_154', 'E_155', 'PX_155', 'PY_155', 'PZ_155', 'E_156', 'PX_156', 'PY_156', 'PZ_156', 'E_157', 'PX_157', 'PY_157', 'PZ_157', 'E_158', 'PX_158', 'PY_158', 'PZ_158', 'E_159', 'PX_159', 'PY_159', 'PZ_159', 'E_160', 'PX_160', 'PY_160', 'PZ_160', 'E_161', 'PX_161', 'PY_161', 'PZ_161', 'E_162', 'PX_162', 'PY_162', 'PZ_162', 'E_163', 'PX_163', 'PY_163', 'PZ_163', 'E_164', 'PX_164', 
'PY_164', 'PZ_164', 'E_165', 'PX_165', 'PY_165', 'PZ_165', 'E_166', 'PX_166', 'PY_166', 'PZ_166', 'E_167', 'PX_167', 'PY_167', 'PZ_167', 'E_168', 'PX_168', 'PY_168', 'PZ_168', 'E_169', 'PX_169', 'PY_169', 'PZ_169', 'E_170', 'PX_170', 'PY_170', 'PZ_170', 'E_171', 'PX_171', 'PY_171', 'PZ_171', 'E_172', 'PX_172', 'PY_172', 'PZ_172', 'E_173', 'PX_173', 'PY_173', 'PZ_173', 'E_174', 'PX_174', 'PY_174', 'PZ_174', 'E_175', 'PX_175', 'PY_175', 'PZ_175', 'E_176', 'PX_176', 'PY_176', 'PZ_176', 'E_177', 'PX_177', 'PY_177', 'PZ_177', 'E_178', 'PX_178', 'PY_178', 'PZ_178', 'E_179', 'PX_179', 'PY_179', 'PZ_179', 'E_180', 'PX_180', 'PY_180', 'PZ_180', 'E_181', 'PX_181', 'PY_181', 'PZ_181', 'E_182', 'PX_182', 'PY_182', 'PZ_182', 'E_183', 'PX_183', 'PY_183', 'PZ_183', 'E_184', 'PX_184', 'PY_184', 'PZ_184', 'E_185', 'PX_185', 'PY_185', 'PZ_185', 'E_186', 'PX_186', 'PY_186', 'PZ_186', 'E_187', 'PX_187', 'PY_187', 'PZ_187', 'E_188', 'PX_188', 'PY_188', 'PZ_188', 'E_189', 'PX_189', 'PY_189', 'PZ_189', 'E_190', 'PX_190', 'PY_190', 'PZ_190', 'E_191', 'PX_191', 'PY_191', 'PZ_191', 'E_192', 'PX_192', 'PY_192', 'PZ_192', 'E_193', 'PX_193', 'PY_193', 'PZ_193', 'E_194', 'PX_194', 'PY_194', 'PZ_194', 'E_195', 'PX_195', 'PY_195', 'PZ_195', 'E_196', 'PX_196', 'PY_196', 'PZ_196', 'E_197', 'PX_197', 'PY_197', 'PZ_197', 'E_198', 'PX_198', 'PY_198', 'PZ_198', 'E_199', 'PX_199', 'PY_199', 'PZ_199', 'truthE', 'truthPX', 'truthPY', 'truthPZ', 'ttv', 'is_signal_new'],
num_rows: 1211000
})
test: Dataset({
features: ['E_0', 'PX_0', 'PY_0', 'PZ_0', 'E_1', 'PX_1', 'PY_1', 'PZ_1', 'E_2', 'PX_2', 'PY_2', 'PZ_2', 'E_3', 'PX_3', 'PY_3', 'PZ_3', 'E_4', 'PX_4', 'PY_4', 'PZ_4', 'E_5', 'PX_5', 'PY_5', 'PZ_5', 'E_6', 'PX_6', 'PY_6', 'PZ_6', 'E_7', 'PX_7', 'PY_7', 'PZ_7', 'E_8', 'PX_8', 'PY_8', 'PZ_8', 'E_9', 'PX_9', 'PY_9', 'PZ_9', 'E_10', 'PX_10', 'PY_10', 'PZ_10', 'E_11', 'PX_11', 'PY_11', 'PZ_11', 'E_12', 'PX_12', 'PY_12', 'PZ_12', 'E_13', 'PX_13', 'PY_13', 'PZ_13', 'E_14', 'PX_14', 'PY_14', 'PZ_14', 'E_15', 'PX_15', 'PY_15', 'PZ_15', 'E_16', 'PX_16', 'PY_16', 'PZ_16', 'E_17', 'PX_17', 'PY_17', 'PZ_17', 'E_18', 'PX_18', 'PY_18', 'PZ_18', 'E_19', 'PX_19', 'PY_19', 'PZ_19', 'E_20', 'PX_20', 'PY_20', 'PZ_20', 'E_21', 'PX_21', 'PY_21', 'PZ_21', 'E_22', 'PX_22', 'PY_22', 'PZ_22', 'E_23', 'PX_23', 'PY_23', 'PZ_23', 'E_24', 'PX_24', 'PY_24', 'PZ_24', 'E_25', 'PX_25', 'PY_25', 'PZ_25', 'E_26', 'PX_26', 'PY_26', 'PZ_26', 'E_27', 'PX_27', 'PY_27', 'PZ_27', 'E_28', 'PX_28', 'PY_28', 'PZ_28', 'E_29', 'PX_29', 'PY_29', 'PZ_29', 'E_30', 'PX_30', 'PY_30', 'PZ_30', 'E_31', 'PX_31', 'PY_31', 'PZ_31', 'E_32', 'PX_32', 'PY_32', 'PZ_32', 'E_33', 'PX_33', 'PY_33', 'PZ_33', 'E_34', 'PX_34', 'PY_34', 'PZ_34', 'E_35', 'PX_35', 'PY_35', 'PZ_35', 'E_36', 'PX_36', 'PY_36', 'PZ_36', 'E_37', 'PX_37', 'PY_37', 'PZ_37', 'E_38', 'PX_38', 'PY_38', 'PZ_38', 'E_39', 'PX_39', 'PY_39', 'PZ_39', 'E_40', 'PX_40', 'PY_40', 'PZ_40', 'E_41', 'PX_41', 'PY_41', 'PZ_41', 'E_42', 'PX_42', 'PY_42', 'PZ_42', 'E_43', 'PX_43', 'PY_43', 'PZ_43', 'E_44', 'PX_44', 'PY_44', 'PZ_44', 'E_45', 'PX_45', 'PY_45', 'PZ_45', 'E_46', 'PX_46', 'PY_46', 'PZ_46', 'E_47', 'PX_47', 'PY_47', 'PZ_47', 'E_48', 'PX_48', 'PY_48', 'PZ_48', 'E_49', 'PX_49', 'PY_49', 'PZ_49', 'E_50', 'PX_50', 'PY_50', 'PZ_50', 'E_51', 'PX_51', 'PY_51', 'PZ_51', 'E_52', 'PX_52', 'PY_52', 'PZ_52', 'E_53', 'PX_53', 'PY_53', 'PZ_53', 'E_54', 'PX_54', 'PY_54', 'PZ_54', 'E_55', 'PX_55', 'PY_55', 'PZ_55', 'E_56', 'PX_56', 'PY_56', 'PZ_56', 'E_57', 'PX_57', 'PY_57', 
'PZ_57', 'E_58', 'PX_58', 'PY_58', 'PZ_58', 'E_59', 'PX_59', 'PY_59', 'PZ_59', 'E_60', 'PX_60', 'PY_60', 'PZ_60', 'E_61', 'PX_61', 'PY_61', 'PZ_61', 'E_62', 'PX_62', 'PY_62', 'PZ_62', 'E_63', 'PX_63', 'PY_63', 'PZ_63', 'E_64', 'PX_64', 'PY_64', 'PZ_64', 'E_65', 'PX_65', 'PY_65', 'PZ_65', 'E_66', 'PX_66', 'PY_66', 'PZ_66', 'E_67', 'PX_67', 'PY_67', 'PZ_67', 'E_68', 'PX_68', 'PY_68', 'PZ_68', 'E_69', 'PX_69', 'PY_69', 'PZ_69', 'E_70', 'PX_70', 'PY_70', 'PZ_70', 'E_71', 'PX_71', 'PY_71', 'PZ_71', 'E_72', 'PX_72', 'PY_72', 'PZ_72', 'E_73', 'PX_73', 'PY_73', 'PZ_73', 'E_74', 'PX_74', 'PY_74', 'PZ_74', 'E_75', 'PX_75', 'PY_75', 'PZ_75', 'E_76', 'PX_76', 'PY_76', 'PZ_76', 'E_77', 'PX_77', 'PY_77', 'PZ_77', 'E_78', 'PX_78', 'PY_78', 'PZ_78', 'E_79', 'PX_79', 'PY_79', 'PZ_79', 'E_80', 'PX_80', 'PY_80', 'PZ_80', 'E_81', 'PX_81', 'PY_81', 'PZ_81', 'E_82', 'PX_82', 'PY_82', 'PZ_82', 'E_83', 'PX_83', 'PY_83', 'PZ_83', 'E_84', 'PX_84', 'PY_84', 'PZ_84', 'E_85', 'PX_85', 'PY_85', 'PZ_85', 'E_86', 'PX_86', 'PY_86', 'PZ_86', 'E_87', 'PX_87', 'PY_87', 'PZ_87', 'E_88', 'PX_88', 'PY_88', 'PZ_88', 'E_89', 'PX_89', 'PY_89', 'PZ_89', 'E_90', 'PX_90', 'PY_90', 'PZ_90', 'E_91', 'PX_91', 'PY_91', 'PZ_91', 'E_92', 'PX_92', 'PY_92', 'PZ_92', 'E_93', 'PX_93', 'PY_93', 'PZ_93', 'E_94', 'PX_94', 'PY_94', 'PZ_94', 'E_95', 'PX_95', 'PY_95', 'PZ_95', 'E_96', 'PX_96', 'PY_96', 'PZ_96', 'E_97', 'PX_97', 'PY_97', 'PZ_97', 'E_98', 'PX_98', 'PY_98', 'PZ_98', 'E_99', 'PX_99', 'PY_99', 'PZ_99', 'E_100', 'PX_100', 'PY_100', 'PZ_100', 'E_101', 'PX_101', 'PY_101', 'PZ_101', 'E_102', 'PX_102', 'PY_102', 'PZ_102', 'E_103', 'PX_103', 'PY_103', 'PZ_103', 'E_104', 'PX_104', 'PY_104', 'PZ_104', 'E_105', 'PX_105', 'PY_105', 'PZ_105', 'E_106', 'PX_106', 'PY_106', 'PZ_106', 'E_107', 'PX_107', 'PY_107', 'PZ_107', 'E_108', 'PX_108', 'PY_108', 'PZ_108', 'E_109', 'PX_109', 'PY_109', 'PZ_109', 'E_110', 'PX_110', 'PY_110', 'PZ_110', 'E_111', 'PX_111', 'PY_111', 'PZ_111', 'E_112', 'PX_112', 'PY_112', 'PZ_112', 'E_113', 
'PX_113', 'PY_113', 'PZ_113', 'E_114', 'PX_114', 'PY_114', 'PZ_114', 'E_115', 'PX_115', 'PY_115', 'PZ_115', 'E_116', 'PX_116', 'PY_116', 'PZ_116', 'E_117', 'PX_117', 'PY_117', 'PZ_117', 'E_118', 'PX_118', 'PY_118', 'PZ_118', 'E_119', 'PX_119', 'PY_119', 'PZ_119', 'E_120', 'PX_120', 'PY_120', 'PZ_120', 'E_121', 'PX_121', 'PY_121', 'PZ_121', 'E_122', 'PX_122', 'PY_122', 'PZ_122', 'E_123', 'PX_123', 'PY_123', 'PZ_123', 'E_124', 'PX_124', 'PY_124', 'PZ_124', 'E_125', 'PX_125', 'PY_125', 'PZ_125', 'E_126', 'PX_126', 'PY_126', 'PZ_126', 'E_127', 'PX_127', 'PY_127', 'PZ_127', 'E_128', 'PX_128', 'PY_128', 'PZ_128', 'E_129', 'PX_129', 'PY_129', 'PZ_129', 'E_130', 'PX_130', 'PY_130', 'PZ_130', 'E_131', 'PX_131', 'PY_131', 'PZ_131', 'E_132', 'PX_132', 'PY_132', 'PZ_132', 'E_133', 'PX_133', 'PY_133', 'PZ_133', 'E_134', 'PX_134', 'PY_134', 'PZ_134', 'E_135', 'PX_135', 'PY_135', 'PZ_135', 'E_136', 'PX_136', 'PY_136', 'PZ_136', 'E_137', 'PX_137', 'PY_137', 'PZ_137', 'E_138', 'PX_138', 'PY_138', 'PZ_138', 'E_139', 'PX_139', 'PY_139', 'PZ_139', 'E_140', 'PX_140', 'PY_140', 'PZ_140', 'E_141', 'PX_141', 'PY_141', 'PZ_141', 'E_142', 'PX_142', 'PY_142', 'PZ_142', 'E_143', 'PX_143', 'PY_143', 'PZ_143', 'E_144', 'PX_144', 'PY_144', 'PZ_144', 'E_145', 'PX_145', 'PY_145', 'PZ_145', 'E_146', 'PX_146', 'PY_146', 'PZ_146', 'E_147', 'PX_147', 'PY_147', 'PZ_147', 'E_148', 'PX_148', 'PY_148', 'PZ_148', 'E_149', 'PX_149', 'PY_149', 'PZ_149', 'E_150', 'PX_150', 'PY_150', 'PZ_150', 'E_151', 'PX_151', 'PY_151', 'PZ_151', 'E_152', 'PX_152', 'PY_152', 'PZ_152', 'E_153', 'PX_153', 'PY_153', 'PZ_153', 'E_154', 'PX_154', 'PY_154', 'PZ_154', 'E_155', 'PX_155', 'PY_155', 'PZ_155', 'E_156', 'PX_156', 'PY_156', 'PZ_156', 'E_157', 'PX_157', 'PY_157', 'PZ_157', 'E_158', 'PX_158', 'PY_158', 'PZ_158', 'E_159', 'PX_159', 'PY_159', 'PZ_159', 'E_160', 'PX_160', 'PY_160', 'PZ_160', 'E_161', 'PX_161', 'PY_161', 'PZ_161', 'E_162', 'PX_162', 'PY_162', 'PZ_162', 'E_163', 'PX_163', 'PY_163', 'PZ_163', 'E_164', 'PX_164', 
'PY_164', 'PZ_164', 'E_165', 'PX_165', 'PY_165', 'PZ_165', 'E_166', 'PX_166', 'PY_166', 'PZ_166', 'E_167', 'PX_167', 'PY_167', 'PZ_167', 'E_168', 'PX_168', 'PY_168', 'PZ_168', 'E_169', 'PX_169', 'PY_169', 'PZ_169', 'E_170', 'PX_170', 'PY_170', 'PZ_170', 'E_171', 'PX_171', 'PY_171', 'PZ_171', 'E_172', 'PX_172', 'PY_172', 'PZ_172', 'E_173', 'PX_173', 'PY_173', 'PZ_173', 'E_174', 'PX_174', 'PY_174', 'PZ_174', 'E_175', 'PX_175', 'PY_175', 'PZ_175', 'E_176', 'PX_176', 'PY_176', 'PZ_176', 'E_177', 'PX_177', 'PY_177', 'PZ_177', 'E_178', 'PX_178', 'PY_178', 'PZ_178', 'E_179', 'PX_179', 'PY_179', 'PZ_179', 'E_180', 'PX_180', 'PY_180', 'PZ_180', 'E_181', 'PX_181', 'PY_181', 'PZ_181', 'E_182', 'PX_182', 'PY_182', 'PZ_182', 'E_183', 'PX_183', 'PY_183', 'PZ_183', 'E_184', 'PX_184', 'PY_184', 'PZ_184', 'E_185', 'PX_185', 'PY_185', 'PZ_185', 'E_186', 'PX_186', 'PY_186', 'PZ_186', 'E_187', 'PX_187', 'PY_187', 'PZ_187', 'E_188', 'PX_188', 'PY_188', 'PZ_188', 'E_189', 'PX_189', 'PY_189', 'PZ_189', 'E_190', 'PX_190', 'PY_190', 'PZ_190', 'E_191', 'PX_191', 'PY_191', 'PZ_191', 'E_192', 'PX_192', 'PY_192', 'PZ_192', 'E_193', 'PX_193', 'PY_193', 'PZ_193', 'E_194', 'PX_194', 'PY_194', 'PZ_194', 'E_195', 'PX_195', 'PY_195', 'PZ_195', 'E_196', 'PX_196', 'PY_196', 'PZ_196', 'E_197', 'PX_197', 'PY_197', 'PZ_197', 'E_198', 'PX_198', 'PY_198', 'PZ_198', 'E_199', 'PX_199', 'PY_199', 'PZ_199', 'truthE', 'truthPX', 'truthPY', 'truthPZ', 'ttv', 'is_signal_new'],
num_rows: 404000
})
validation: Dataset({
features: ['E_0', 'PX_0', 'PY_0', 'PZ_0', 'E_1', 'PX_1', 'PY_1', 'PZ_1', 'E_2', 'PX_2', 'PY_2', 'PZ_2', 'E_3', 'PX_3', 'PY_3', 'PZ_3', 'E_4', 'PX_4', 'PY_4', 'PZ_4', 'E_5', 'PX_5', 'PY_5', 'PZ_5', 'E_6', 'PX_6', 'PY_6', 'PZ_6', 'E_7', 'PX_7', 'PY_7', 'PZ_7', 'E_8', 'PX_8', 'PY_8', 'PZ_8', 'E_9', 'PX_9', 'PY_9', 'PZ_9', 'E_10', 'PX_10', 'PY_10', 'PZ_10', 'E_11', 'PX_11', 'PY_11', 'PZ_11', 'E_12', 'PX_12', 'PY_12', 'PZ_12', 'E_13', 'PX_13', 'PY_13', 'PZ_13', 'E_14', 'PX_14', 'PY_14', 'PZ_14', 'E_15', 'PX_15', 'PY_15', 'PZ_15', 'E_16', 'PX_16', 'PY_16', 'PZ_16', 'E_17', 'PX_17', 'PY_17', 'PZ_17', 'E_18', 'PX_18', 'PY_18', 'PZ_18', 'E_19', 'PX_19', 'PY_19', 'PZ_19', 'E_20', 'PX_20', 'PY_20', 'PZ_20', 'E_21', 'PX_21', 'PY_21', 'PZ_21', 'E_22', 'PX_22', 'PY_22', 'PZ_22', 'E_23', 'PX_23', 'PY_23', 'PZ_23', 'E_24', 'PX_24', 'PY_24', 'PZ_24', 'E_25', 'PX_25', 'PY_25', 'PZ_25', 'E_26', 'PX_26', 'PY_26', 'PZ_26', 'E_27', 'PX_27', 'PY_27', 'PZ_27', 'E_28', 'PX_28', 'PY_28', 'PZ_28', 'E_29', 'PX_29', 'PY_29', 'PZ_29', 'E_30', 'PX_30', 'PY_30', 'PZ_30', 'E_31', 'PX_31', 'PY_31', 'PZ_31', 'E_32', 'PX_32', 'PY_32', 'PZ_32', 'E_33', 'PX_33', 'PY_33', 'PZ_33', 'E_34', 'PX_34', 'PY_34', 'PZ_34', 'E_35', 'PX_35', 'PY_35', 'PZ_35', 'E_36', 'PX_36', 'PY_36', 'PZ_36', 'E_37', 'PX_37', 'PY_37', 'PZ_37', 'E_38', 'PX_38', 'PY_38', 'PZ_38', 'E_39', 'PX_39', 'PY_39', 'PZ_39', 'E_40', 'PX_40', 'PY_40', 'PZ_40', 'E_41', 'PX_41', 'PY_41', 'PZ_41', 'E_42', 'PX_42', 'PY_42', 'PZ_42', 'E_43', 'PX_43', 'PY_43', 'PZ_43', 'E_44', 'PX_44', 'PY_44', 'PZ_44', 'E_45', 'PX_45', 'PY_45', 'PZ_45', 'E_46', 'PX_46', 'PY_46', 'PZ_46', 'E_47', 'PX_47', 'PY_47', 'PZ_47', 'E_48', 'PX_48', 'PY_48', 'PZ_48', 'E_49', 'PX_49', 'PY_49', 'PZ_49', 'E_50', 'PX_50', 'PY_50', 'PZ_50', 'E_51', 'PX_51', 'PY_51', 'PZ_51', 'E_52', 'PX_52', 'PY_52', 'PZ_52', 'E_53', 'PX_53', 'PY_53', 'PZ_53', 'E_54', 'PX_54', 'PY_54', 'PZ_54', 'E_55', 'PX_55', 'PY_55', 'PZ_55', 'E_56', 'PX_56', 'PY_56', 'PZ_56', 'E_57', 'PX_57', 'PY_57', 
'PZ_57', 'E_58', 'PX_58', 'PY_58', 'PZ_58', 'E_59', 'PX_59', 'PY_59', 'PZ_59', 'E_60', 'PX_60', 'PY_60', 'PZ_60', 'E_61', 'PX_61', 'PY_61', 'PZ_61', 'E_62', 'PX_62', 'PY_62', 'PZ_62', 'E_63', 'PX_63', 'PY_63', 'PZ_63', 'E_64', 'PX_64', 'PY_64', 'PZ_64', 'E_65', 'PX_65', 'PY_65', 'PZ_65', 'E_66', 'PX_66', 'PY_66', 'PZ_66', 'E_67', 'PX_67', 'PY_67', 'PZ_67', 'E_68', 'PX_68', 'PY_68', 'PZ_68', 'E_69', 'PX_69', 'PY_69', 'PZ_69', 'E_70', 'PX_70', 'PY_70', 'PZ_70', 'E_71', 'PX_71', 'PY_71', 'PZ_71', 'E_72', 'PX_72', 'PY_72', 'PZ_72', 'E_73', 'PX_73', 'PY_73', 'PZ_73', 'E_74', 'PX_74', 'PY_74', 'PZ_74', 'E_75', 'PX_75', 'PY_75', 'PZ_75', 'E_76', 'PX_76', 'PY_76', 'PZ_76', 'E_77', 'PX_77', 'PY_77', 'PZ_77', 'E_78', 'PX_78', 'PY_78', 'PZ_78', 'E_79', 'PX_79', 'PY_79', 'PZ_79', 'E_80', 'PX_80', 'PY_80', 'PZ_80', 'E_81', 'PX_81', 'PY_81', 'PZ_81', 'E_82', 'PX_82', 'PY_82', 'PZ_82', 'E_83', 'PX_83', 'PY_83', 'PZ_83', 'E_84', 'PX_84', 'PY_84', 'PZ_84', 'E_85', 'PX_85', 'PY_85', 'PZ_85', 'E_86', 'PX_86', 'PY_86', 'PZ_86', 'E_87', 'PX_87', 'PY_87', 'PZ_87', 'E_88', 'PX_88', 'PY_88', 'PZ_88', 'E_89', 'PX_89', 'PY_89', 'PZ_89', 'E_90', 'PX_90', 'PY_90', 'PZ_90', 'E_91', 'PX_91', 'PY_91', 'PZ_91', 'E_92', 'PX_92', 'PY_92', 'PZ_92', 'E_93', 'PX_93', 'PY_93', 'PZ_93', 'E_94', 'PX_94', 'PY_94', 'PZ_94', 'E_95', 'PX_95', 'PY_95', 'PZ_95', 'E_96', 'PX_96', 'PY_96', 'PZ_96', 'E_97', 'PX_97', 'PY_97', 'PZ_97', 'E_98', 'PX_98', 'PY_98', 'PZ_98', 'E_99', 'PX_99', 'PY_99', 'PZ_99', 'E_100', 'PX_100', 'PY_100', 'PZ_100', 'E_101', 'PX_101', 'PY_101', 'PZ_101', 'E_102', 'PX_102', 'PY_102', 'PZ_102', 'E_103', 'PX_103', 'PY_103', 'PZ_103', 'E_104', 'PX_104', 'PY_104', 'PZ_104', 'E_105', 'PX_105', 'PY_105', 'PZ_105', 'E_106', 'PX_106', 'PY_106', 'PZ_106', 'E_107', 'PX_107', 'PY_107', 'PZ_107', 'E_108', 'PX_108', 'PY_108', 'PZ_108', 'E_109', 'PX_109', 'PY_109', 'PZ_109', 'E_110', 'PX_110', 'PY_110', 'PZ_110', 'E_111', 'PX_111', 'PY_111', 'PZ_111', 'E_112', 'PX_112', 'PY_112', 'PZ_112', 'E_113', 
'PX_113', 'PY_113', 'PZ_113', 'E_114', 'PX_114', 'PY_114', 'PZ_114', 'E_115', 'PX_115', 'PY_115', 'PZ_115', 'E_116', 'PX_116', 'PY_116', 'PZ_116', 'E_117', 'PX_117', 'PY_117', 'PZ_117', 'E_118', 'PX_118', 'PY_118', 'PZ_118', 'E_119', 'PX_119', 'PY_119', 'PZ_119', 'E_120', 'PX_120', 'PY_120', 'PZ_120', 'E_121', 'PX_121', 'PY_121', 'PZ_121', 'E_122', 'PX_122', 'PY_122', 'PZ_122', 'E_123', 'PX_123', 'PY_123', 'PZ_123', 'E_124', 'PX_124', 'PY_124', 'PZ_124', 'E_125', 'PX_125', 'PY_125', 'PZ_125', 'E_126', 'PX_126', 'PY_126', 'PZ_126', 'E_127', 'PX_127', 'PY_127', 'PZ_127', 'E_128', 'PX_128', 'PY_128', 'PZ_128', 'E_129', 'PX_129', 'PY_129', 'PZ_129', 'E_130', 'PX_130', 'PY_130', 'PZ_130', 'E_131', 'PX_131', 'PY_131', 'PZ_131', 'E_132', 'PX_132', 'PY_132', 'PZ_132', 'E_133', 'PX_133', 'PY_133', 'PZ_133', 'E_134', 'PX_134', 'PY_134', 'PZ_134', 'E_135', 'PX_135', 'PY_135', 'PZ_135', 'E_136', 'PX_136', 'PY_136', 'PZ_136', 'E_137', 'PX_137', 'PY_137', 'PZ_137', 'E_138', 'PX_138', 'PY_138', 'PZ_138', 'E_139', 'PX_139', 'PY_139', 'PZ_139', 'E_140', 'PX_140', 'PY_140', 'PZ_140', 'E_141', 'PX_141', 'PY_141', 'PZ_141', 'E_142', 'PX_142', 'PY_142', 'PZ_142', 'E_143', 'PX_143', 'PY_143', 'PZ_143', 'E_144', 'PX_144', 'PY_144', 'PZ_144', 'E_145', 'PX_145', 'PY_145', 'PZ_145', 'E_146', 'PX_146', 'PY_146', 'PZ_146', 'E_147', 'PX_147', 'PY_147', 'PZ_147', 'E_148', 'PX_148', 'PY_148', 'PZ_148', 'E_149', 'PX_149', 'PY_149', 'PZ_149', 'E_150', 'PX_150', 'PY_150', 'PZ_150', 'E_151', 'PX_151', 'PY_151', 'PZ_151', 'E_152', 'PX_152', 'PY_152', 'PZ_152', 'E_153', 'PX_153', 'PY_153', 'PZ_153', 'E_154', 'PX_154', 'PY_154', 'PZ_154', 'E_155', 'PX_155', 'PY_155', 'PZ_155', 'E_156', 'PX_156', 'PY_156', 'PZ_156', 'E_157', 'PX_157', 'PY_157', 'PZ_157', 'E_158', 'PX_158', 'PY_158', 'PZ_158', 'E_159', 'PX_159', 'PY_159', 'PZ_159', 'E_160', 'PX_160', 'PY_160', 'PZ_160', 'E_161', 'PX_161', 'PY_161', 'PZ_161', 'E_162', 'PX_162', 'PY_162', 'PZ_162', 'E_163', 'PX_163', 'PY_163', 'PZ_163', 'E_164', 'PX_164', 
'PY_164', 'PZ_164', 'E_165', 'PX_165', 'PY_165', 'PZ_165', 'E_166', 'PX_166', 'PY_166', 'PZ_166', 'E_167', 'PX_167', 'PY_167', 'PZ_167', 'E_168', 'PX_168', 'PY_168', 'PZ_168', 'E_169', 'PX_169', 'PY_169', 'PZ_169', 'E_170', 'PX_170', 'PY_170', 'PZ_170', 'E_171', 'PX_171', 'PY_171', 'PZ_171', 'E_172', 'PX_172', 'PY_172', 'PZ_172', 'E_173', 'PX_173', 'PY_173', 'PZ_173', 'E_174', 'PX_174', 'PY_174', 'PZ_174', 'E_175', 'PX_175', 'PY_175', 'PZ_175', 'E_176', 'PX_176', 'PY_176', 'PZ_176', 'E_177', 'PX_177', 'PY_177', 'PZ_177', 'E_178', 'PX_178', 'PY_178', 'PZ_178', 'E_179', 'PX_179', 'PY_179', 'PZ_179', 'E_180', 'PX_180', 'PY_180', 'PZ_180', 'E_181', 'PX_181', 'PY_181', 'PZ_181', 'E_182', 'PX_182', 'PY_182', 'PZ_182', 'E_183', 'PX_183', 'PY_183', 'PZ_183', 'E_184', 'PX_184', 'PY_184', 'PZ_184', 'E_185', 'PX_185', 'PY_185', 'PZ_185', 'E_186', 'PX_186', 'PY_186', 'PZ_186', 'E_187', 'PX_187', 'PY_187', 'PZ_187', 'E_188', 'PX_188', 'PY_188', 'PZ_188', 'E_189', 'PX_189', 'PY_189', 'PZ_189', 'E_190', 'PX_190', 'PY_190', 'PZ_190', 'E_191', 'PX_191', 'PY_191', 'PZ_191', 'E_192', 'PX_192', 'PY_192', 'PZ_192', 'E_193', 'PX_193', 'PY_193', 'PZ_193', 'E_194', 'PX_194', 'PY_194', 'PZ_194', 'E_195', 'PX_195', 'PY_195', 'PZ_195', 'E_196', 'PX_196', 'PY_196', 'PZ_196', 'E_197', 'PX_197', 'PY_197', 'PZ_197', 'E_198', 'PX_198', 'PY_198', 'PZ_198', 'E_199', 'PX_199', 'PY_199', 'PZ_199', 'truthE', 'truthPX', 'truthPY', 'truthPZ', 'ttv', 'is_signal_new'],
num_rows: 403000
})
})
[4]:
top_tagging_ds["train"]
[4]:
Dataset({
features: ['E_0', 'PX_0', 'PY_0', 'PZ_0', 'E_1', 'PX_1', 'PY_1', 'PZ_1', 'E_2', 'PX_2', 'PY_2', 'PZ_2', 'E_3', 'PX_3', 'PY_3', 'PZ_3', 'E_4', 'PX_4', 'PY_4', 'PZ_4', 'E_5', 'PX_5', 'PY_5', 'PZ_5', 'E_6', 'PX_6', 'PY_6', 'PZ_6', 'E_7', 'PX_7', 'PY_7', 'PZ_7', 'E_8', 'PX_8', 'PY_8', 'PZ_8', 'E_9', 'PX_9', 'PY_9', 'PZ_9', 'E_10', 'PX_10', 'PY_10', 'PZ_10', 'E_11', 'PX_11', 'PY_11', 'PZ_11', 'E_12', 'PX_12', 'PY_12', 'PZ_12', 'E_13', 'PX_13', 'PY_13', 'PZ_13', 'E_14', 'PX_14', 'PY_14', 'PZ_14', 'E_15', 'PX_15', 'PY_15', 'PZ_15', 'E_16', 'PX_16', 'PY_16', 'PZ_16', 'E_17', 'PX_17', 'PY_17', 'PZ_17', 'E_18', 'PX_18', 'PY_18', 'PZ_18', 'E_19', 'PX_19', 'PY_19', 'PZ_19', 'E_20', 'PX_20', 'PY_20', 'PZ_20', 'E_21', 'PX_21', 'PY_21', 'PZ_21', 'E_22', 'PX_22', 'PY_22', 'PZ_22', 'E_23', 'PX_23', 'PY_23', 'PZ_23', 'E_24', 'PX_24', 'PY_24', 'PZ_24', 'E_25', 'PX_25', 'PY_25', 'PZ_25', 'E_26', 'PX_26', 'PY_26', 'PZ_26', 'E_27', 'PX_27', 'PY_27', 'PZ_27', 'E_28', 'PX_28', 'PY_28', 'PZ_28', 'E_29', 'PX_29', 'PY_29', 'PZ_29', 'E_30', 'PX_30', 'PY_30', 'PZ_30', 'E_31', 'PX_31', 'PY_31', 'PZ_31', 'E_32', 'PX_32', 'PY_32', 'PZ_32', 'E_33', 'PX_33', 'PY_33', 'PZ_33', 'E_34', 'PX_34', 'PY_34', 'PZ_34', 'E_35', 'PX_35', 'PY_35', 'PZ_35', 'E_36', 'PX_36', 'PY_36', 'PZ_36', 'E_37', 'PX_37', 'PY_37', 'PZ_37', 'E_38', 'PX_38', 'PY_38', 'PZ_38', 'E_39', 'PX_39', 'PY_39', 'PZ_39', 'E_40', 'PX_40', 'PY_40', 'PZ_40', 'E_41', 'PX_41', 'PY_41', 'PZ_41', 'E_42', 'PX_42', 'PY_42', 'PZ_42', 'E_43', 'PX_43', 'PY_43', 'PZ_43', 'E_44', 'PX_44', 'PY_44', 'PZ_44', 'E_45', 'PX_45', 'PY_45', 'PZ_45', 'E_46', 'PX_46', 'PY_46', 'PZ_46', 'E_47', 'PX_47', 'PY_47', 'PZ_47', 'E_48', 'PX_48', 'PY_48', 'PZ_48', 'E_49', 'PX_49', 'PY_49', 'PZ_49', 'E_50', 'PX_50', 'PY_50', 'PZ_50', 'E_51', 'PX_51', 'PY_51', 'PZ_51', 'E_52', 'PX_52', 'PY_52', 'PZ_52', 'E_53', 'PX_53', 'PY_53', 'PZ_53', 'E_54', 'PX_54', 'PY_54', 'PZ_54', 'E_55', 'PX_55', 'PY_55', 'PZ_55', 'E_56', 'PX_56', 'PY_56', 'PZ_56', 'E_57', 'PX_57', 'PY_57', 
'PZ_57', 'E_58', 'PX_58', 'PY_58', 'PZ_58', 'E_59', 'PX_59', 'PY_59', 'PZ_59', 'E_60', 'PX_60', 'PY_60', 'PZ_60', 'E_61', 'PX_61', 'PY_61', 'PZ_61', 'E_62', 'PX_62', 'PY_62', 'PZ_62', 'E_63', 'PX_63', 'PY_63', 'PZ_63', 'E_64', 'PX_64', 'PY_64', 'PZ_64', 'E_65', 'PX_65', 'PY_65', 'PZ_65', 'E_66', 'PX_66', 'PY_66', 'PZ_66', 'E_67', 'PX_67', 'PY_67', 'PZ_67', 'E_68', 'PX_68', 'PY_68', 'PZ_68', 'E_69', 'PX_69', 'PY_69', 'PZ_69', 'E_70', 'PX_70', 'PY_70', 'PZ_70', 'E_71', 'PX_71', 'PY_71', 'PZ_71', 'E_72', 'PX_72', 'PY_72', 'PZ_72', 'E_73', 'PX_73', 'PY_73', 'PZ_73', 'E_74', 'PX_74', 'PY_74', 'PZ_74', 'E_75', 'PX_75', 'PY_75', 'PZ_75', 'E_76', 'PX_76', 'PY_76', 'PZ_76', 'E_77', 'PX_77', 'PY_77', 'PZ_77', 'E_78', 'PX_78', 'PY_78', 'PZ_78', 'E_79', 'PX_79', 'PY_79', 'PZ_79', 'E_80', 'PX_80', 'PY_80', 'PZ_80', 'E_81', 'PX_81', 'PY_81', 'PZ_81', 'E_82', 'PX_82', 'PY_82', 'PZ_82', 'E_83', 'PX_83', 'PY_83', 'PZ_83', 'E_84', 'PX_84', 'PY_84', 'PZ_84', 'E_85', 'PX_85', 'PY_85', 'PZ_85', 'E_86', 'PX_86', 'PY_86', 'PZ_86', 'E_87', 'PX_87', 'PY_87', 'PZ_87', 'E_88', 'PX_88', 'PY_88', 'PZ_88', 'E_89', 'PX_89', 'PY_89', 'PZ_89', 'E_90', 'PX_90', 'PY_90', 'PZ_90', 'E_91', 'PX_91', 'PY_91', 'PZ_91', 'E_92', 'PX_92', 'PY_92', 'PZ_92', 'E_93', 'PX_93', 'PY_93', 'PZ_93', 'E_94', 'PX_94', 'PY_94', 'PZ_94', 'E_95', 'PX_95', 'PY_95', 'PZ_95', 'E_96', 'PX_96', 'PY_96', 'PZ_96', 'E_97', 'PX_97', 'PY_97', 'PZ_97', 'E_98', 'PX_98', 'PY_98', 'PZ_98', 'E_99', 'PX_99', 'PY_99', 'PZ_99', 'E_100', 'PX_100', 'PY_100', 'PZ_100', 'E_101', 'PX_101', 'PY_101', 'PZ_101', 'E_102', 'PX_102', 'PY_102', 'PZ_102', 'E_103', 'PX_103', 'PY_103', 'PZ_103', 'E_104', 'PX_104', 'PY_104', 'PZ_104', 'E_105', 'PX_105', 'PY_105', 'PZ_105', 'E_106', 'PX_106', 'PY_106', 'PZ_106', 'E_107', 'PX_107', 'PY_107', 'PZ_107', 'E_108', 'PX_108', 'PY_108', 'PZ_108', 'E_109', 'PX_109', 'PY_109', 'PZ_109', 'E_110', 'PX_110', 'PY_110', 'PZ_110', 'E_111', 'PX_111', 'PY_111', 'PZ_111', 'E_112', 'PX_112', 'PY_112', 'PZ_112', 'E_113', 
'PX_113', 'PY_113', 'PZ_113', 'E_114', 'PX_114', 'PY_114', 'PZ_114', 'E_115', 'PX_115', 'PY_115', 'PZ_115', 'E_116', 'PX_116', 'PY_116', 'PZ_116', 'E_117', 'PX_117', 'PY_117', 'PZ_117', 'E_118', 'PX_118', 'PY_118', 'PZ_118', 'E_119', 'PX_119', 'PY_119', 'PZ_119', 'E_120', 'PX_120', 'PY_120', 'PZ_120', 'E_121', 'PX_121', 'PY_121', 'PZ_121', 'E_122', 'PX_122', 'PY_122', 'PZ_122', 'E_123', 'PX_123', 'PY_123', 'PZ_123', 'E_124', 'PX_124', 'PY_124', 'PZ_124', 'E_125', 'PX_125', 'PY_125', 'PZ_125', 'E_126', 'PX_126', 'PY_126', 'PZ_126', 'E_127', 'PX_127', 'PY_127', 'PZ_127', 'E_128', 'PX_128', 'PY_128', 'PZ_128', 'E_129', 'PX_129', 'PY_129', 'PZ_129', 'E_130', 'PX_130', 'PY_130', 'PZ_130', 'E_131', 'PX_131', 'PY_131', 'PZ_131', 'E_132', 'PX_132', 'PY_132', 'PZ_132', 'E_133', 'PX_133', 'PY_133', 'PZ_133', 'E_134', 'PX_134', 'PY_134', 'PZ_134', 'E_135', 'PX_135', 'PY_135', 'PZ_135', 'E_136', 'PX_136', 'PY_136', 'PZ_136', 'E_137', 'PX_137', 'PY_137', 'PZ_137', 'E_138', 'PX_138', 'PY_138', 'PZ_138', 'E_139', 'PX_139', 'PY_139', 'PZ_139', 'E_140', 'PX_140', 'PY_140', 'PZ_140', 'E_141', 'PX_141', 'PY_141', 'PZ_141', 'E_142', 'PX_142', 'PY_142', 'PZ_142', 'E_143', 'PX_143', 'PY_143', 'PZ_143', 'E_144', 'PX_144', 'PY_144', 'PZ_144', 'E_145', 'PX_145', 'PY_145', 'PZ_145', 'E_146', 'PX_146', 'PY_146', 'PZ_146', 'E_147', 'PX_147', 'PY_147', 'PZ_147', 'E_148', 'PX_148', 'PY_148', 'PZ_148', 'E_149', 'PX_149', 'PY_149', 'PZ_149', 'E_150', 'PX_150', 'PY_150', 'PZ_150', 'E_151', 'PX_151', 'PY_151', 'PZ_151', 'E_152', 'PX_152', 'PY_152', 'PZ_152', 'E_153', 'PX_153', 'PY_153', 'PZ_153', 'E_154', 'PX_154', 'PY_154', 'PZ_154', 'E_155', 'PX_155', 'PY_155', 'PZ_155', 'E_156', 'PX_156', 'PY_156', 'PZ_156', 'E_157', 'PX_157', 'PY_157', 'PZ_157', 'E_158', 'PX_158', 'PY_158', 'PZ_158', 'E_159', 'PX_159', 'PY_159', 'PZ_159', 'E_160', 'PX_160', 'PY_160', 'PZ_160', 'E_161', 'PX_161', 'PY_161', 'PZ_161', 'E_162', 'PX_162', 'PY_162', 'PZ_162', 'E_163', 'PX_163', 'PY_163', 'PZ_163', 'E_164', 'PX_164', 
'PY_164', 'PZ_164', 'E_165', 'PX_165', 'PY_165', 'PZ_165', 'E_166', 'PX_166', 'PY_166', 'PZ_166', 'E_167', 'PX_167', 'PY_167', 'PZ_167', 'E_168', 'PX_168', 'PY_168', 'PZ_168', 'E_169', 'PX_169', 'PY_169', 'PZ_169', 'E_170', 'PX_170', 'PY_170', 'PZ_170', 'E_171', 'PX_171', 'PY_171', 'PZ_171', 'E_172', 'PX_172', 'PY_172', 'PZ_172', 'E_173', 'PX_173', 'PY_173', 'PZ_173', 'E_174', 'PX_174', 'PY_174', 'PZ_174', 'E_175', 'PX_175', 'PY_175', 'PZ_175', 'E_176', 'PX_176', 'PY_176', 'PZ_176', 'E_177', 'PX_177', 'PY_177', 'PZ_177', 'E_178', 'PX_178', 'PY_178', 'PZ_178', 'E_179', 'PX_179', 'PY_179', 'PZ_179', 'E_180', 'PX_180', 'PY_180', 'PZ_180', 'E_181', 'PX_181', 'PY_181', 'PZ_181', 'E_182', 'PX_182', 'PY_182', 'PZ_182', 'E_183', 'PX_183', 'PY_183', 'PZ_183', 'E_184', 'PX_184', 'PY_184', 'PZ_184', 'E_185', 'PX_185', 'PY_185', 'PZ_185', 'E_186', 'PX_186', 'PY_186', 'PZ_186', 'E_187', 'PX_187', 'PY_187', 'PZ_187', 'E_188', 'PX_188', 'PY_188', 'PZ_188', 'E_189', 'PX_189', 'PY_189', 'PZ_189', 'E_190', 'PX_190', 'PY_190', 'PZ_190', 'E_191', 'PX_191', 'PY_191', 'PZ_191', 'E_192', 'PX_192', 'PY_192', 'PZ_192', 'E_193', 'PX_193', 'PY_193', 'PZ_193', 'E_194', 'PX_194', 'PY_194', 'PZ_194', 'E_195', 'PX_195', 'PY_195', 'PZ_195', 'E_196', 'PX_196', 'PY_196', 'PZ_196', 'E_197', 'PX_197', 'PY_197', 'PZ_197', 'E_198', 'PX_198', 'PY_198', 'PZ_198', 'E_199', 'PX_199', 'PY_199', 'PZ_199', 'truthE', 'truthPX', 'truthPY', 'truthPZ', 'ttv', 'is_signal_new'],
num_rows: 1211000
})
[5]:
len(top_tagging_ds["train"])
[5]:
1211000
[6]:
# Remove data irrelevant to our classification task (we are only interested in the 4-vectors and their classification)
top_tagging_ds = top_tagging_ds.remove_columns(
["truthE", "truthPX", "truthPY", "truthPZ", "ttv"]
)
You may have noticed that inspecting the dataset in this form is quite cumbersome. To make life easier, we’re going to load our data into the DataFrame format provided by the pandas library, which is going to allow us to view this dataset in a more legible format.
[7]:
# Convert output format to DataFrames
top_tagging_ds.set_format("pandas")
# Create DataFrames for the training and test splits
train_df, test_df = top_tagging_ds["train"][:], top_tagging_ds["test"][:]
# Peek at first few rows
# train_df.head()
[8]:
train_df.shape
[8]:
(1211000, 801)
Let’s print out the first few rows and columns of this dataset.
[9]:
# Select the first five columns directly instead of transposing the full table twice
train_df_subset = train_df.iloc[:, :5]
test_df_subset = test_df.iloc[:, :5]
train_df_subset.head()
[9]:
| | E_0 | PX_0 | PY_0 | PZ_0 | E_1 |
|---|---|---|---|---|---|
| 0 | 474.071136 | -250.347031 | -223.651962 | -334.738098 | 103.236237 |
| 1 | 150.504532 | 120.062393 | 76.852005 | -48.274265 | 82.257057 |
| 2 | 251.645386 | 10.427651 | -147.573746 | 203.564880 | 104.147797 |
| 3 | 451.566132 | 129.885437 | -99.066292 | -420.984100 | 208.410919 |
| 4 | 399.093903 | -168.432083 | -47.205597 | -358.717438 | 273.691956 |
FastAI DataLoaders
FastAI is a library for manipulating data and building AI models. You don’t need a thorough understanding of this library to appreciate the use of AI for this particular task, but we highly encourage interested readers to explore further.
In the next cell, we’re going to use a DataLoader to load our dataset into the fastai format. This will do a number of important things for us, such as batch splitting, an important part of data preprocessing in machine learning tasks.
In essence, our model is going to receive “batches” of training data that are small enough to fit into our computer’s memory. It wouldn’t make sense to pass the entire dataset to our model all at once, simply because the dataset might be too large. By creating batches, we can take advantage of parallel processing units (such as those in GPUs) and shuffle our data so that the model learns from a different random subset of samples at each iteration.
[16]:
frac_of_samples = 1.0
train_df = train_df.sample(int(frac_of_samples * len(train_df)), random_state=40)
features = list(train_df.drop(columns=["is_signal_new"]).columns)
# features = list(train_df.columns)
splits = RandomSplitter(valid_pct=0.20, seed=42)(range_of(train_df))
dls = TabularDataLoaders.from_df(
df=train_df,
cont_names=features,
y_names="is_signal_new",
y_block=CategoryBlock,
splits=splits,
bs=1024,
)
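To make the idea of batching concrete, here is a minimal pure-Python sketch of shuffling and batch splitting. It is only an illustration of the concept, not how fastai implements its DataLoaders; the helper name make_batches and the toy sizes are our own assumptions.

```python
import random


def make_batches(n_samples, batch_size, seed=0):
    """Shuffle the sample indices and split them into consecutive batches."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # a new random order for this pass
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]


# Toy example: 10 samples in batches of 4 (the notebook uses bs=1024)
batches = make_batches(n_samples=10, batch_size=4)
for step, batch in enumerate(batches):
    print(f"step {step}: rows {batch}")
```

Every sample appears in exactly one batch, and in this sketch the last batch is simply smaller when the dataset size is not a multiple of the batch size.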
The Deep Neural Network Model
In this section, we define the neural network using the tabular_learner function from FastAI. You can see the individual layers and their sizes in the output of learn.summary() below.
[17]:
learn = tabular_learner(
dls, layers=[200, 200, 50, 50], metrics=[accuracy, RocAucBinary()]
)
The tabular_learner function creates a deep neural network (DNN) with the layers specified by its parameters. Here we build a network with 4 hidden layers of 200, 200, 50, and 50 nodes respectively. We also ask fastai to report two metrics during training: accuracy (the fraction of jets correctly classified as top-quark signal or quark-gluon background) and RocAucBinary. RocAucBinary is the area under the ROC curve for binary classification tasks such as this one; it summarises the trade-off between the true positive rate and the false positive rate across all decision thresholds. For more information, see the fastai documentation.
[18]:
learn.summary()
[18]:
TabularModel (Input shape: 1024 x 0)
============================================================================
Layer (type) Output Shape Param # Trainable
============================================================================
1024 x 800
BatchNorm1d 1600 True
____________________________________________________________________________
1024 x 200
Linear 160000 True
ReLU
BatchNorm1d 400 True
Linear 40000 True
ReLU
BatchNorm1d 400 True
____________________________________________________________________________
1024 x 50
Linear 10000 True
ReLU
BatchNorm1d 100 True
Linear 2500 True
ReLU
BatchNorm1d 100 True
____________________________________________________________________________
1024 x 2
Linear 102 True
____________________________________________________________________________
Total params: 215,202
Total trainable params: 215,202
Total non-trainable params: 0
Optimizer used: <function Adam at 0x28766cc10>
Loss function: FlattenedLoss of CrossEntropyLoss()
Callbacks:
- TrainEvalCallback
- Recorder
- ProgressCallback
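The parameter counts in the summary above can be reproduced by hand. The sketch below assumes, based on the counts shown (for example 160000 = 800 × 200), that the hidden Linear layers carry no bias term, that each BatchNorm1d layer has one scale and one shift parameter per feature, and that the final Linear layer has a bias. The helper count_params is our own illustration and not part of fastai.

```python
def count_params(n_inputs, hidden_sizes, n_outputs):
    """Count trainable parameters for the layer pattern shown in the summary."""
    total = 2 * n_inputs  # input BatchNorm1d: scale and shift per feature
    n_in = n_inputs
    for n_out in hidden_sizes:
        total += n_in * n_out  # Linear weights (assumed bias-free, see above)
        total += 2 * n_out     # BatchNorm1d scale and shift
        n_in = n_out
    total += n_in * n_outputs + n_outputs  # final Linear: weights plus biases
    return total


n_params = count_params(n_inputs=800, hidden_sizes=[200, 200, 50, 50], n_outputs=2)
print(n_params)  # 215202, matching "Total params" in the summary
```

The first hidden layer alone accounts for roughly three quarters of the parameters, because it connects all 800 input features to 200 nodes.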
Finding learning rate
The learning rate sets the size of the steps the model takes during training, i.e. how much the model’s internal parameters change each time it gets feedback on its predictions. The lr_find method suggests a reasonable value.
[19]:
learn.lr_find()
[19]:
SuggestedLRs(valley=0.0006918309954926372)
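To see why the choice of learning rate matters, consider a toy problem that is unrelated to our dataset: minimising the loss f(w) = w² with plain gradient descent. The function name and the three learning rates below are hypothetical values chosen purely for illustration.

```python
def gradient_descent(lr, w0=1.0, n_steps=20):
    """Minimise the toy loss f(w) = w**2 with a fixed learning rate."""
    w = w0
    for _ in range(n_steps):
        grad = 2 * w       # derivative of w**2
        w = w - lr * grad  # step against the gradient, scaled by lr
    return w


too_small = gradient_descent(lr=0.01)  # moves towards 0, but slowly
reasonable = gradient_descent(lr=0.1)  # gets close to the minimum at w = 0
too_large = gradient_descent(lr=1.1)   # overshoots on every step and diverges

print(too_small, reasonable, too_large)
```

A learning rate that is too small wastes training time, while one that is too large makes the loss grow instead of shrink. A learning rate finder helps us land between these two extremes.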
Training the model
Here we’re going to run one training cycle of 3 epochs, where each epoch goes through all the training data once. If you’re running this notebook yourself, consider changing the number of epochs or the learning rate.
[20]:
learn.fit_one_cycle(n_epoch=3, lr_max=1e-3)
| epoch | train_loss | valid_loss | accuracy | roc_auc_score | time |
|---|---|---|---|---|---|
| 0 | 0.518409 | 0.515188 | 0.732680 | 0.802404 | 00:16 |
| 1 | 0.432351 | 0.417070 | 0.804059 | 0.884536 | 00:16 |
| 2 | 0.389154 | 0.379476 | 0.827023 | 0.904105 | 00:15 |
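The "one cycle" in fit_one_cycle refers to how the learning rate changes during training: it warms up to lr_max and then decays again. The sketch below is our own simplified reconstruction of such a schedule using cosine interpolation. The values div=25, div_final=1e5, and pct_start=0.25 are assumptions about fastai's defaults, so check the documentation of your installed version before relying on them.

```python
import math


def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2


def one_cycle_lr(pos, lr_max=1e-3, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos in [0, 1] (assumed defaults)."""
    if pos < pct_start:
        # Warm-up phase: rise from lr_max / div to lr_max
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Annealing phase: fall from lr_max to lr_max / div_final
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))


for pos in (0.0, 0.125, 0.25, 0.5, 0.75, 1.0):
    print(f"progress {pos:.3f}: lr = {one_cycle_lr(pos):.2e}")
```

Under these assumptions, the learning rate peaks at lr_max a quarter of the way through training and is close to zero by the end of the last epoch.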
[17]:
# You can inspect the results by uncommenting the following line
# learn.show_results()
Model Evaluation
Now that we’ve trained our model for one cycle, we can use it to make predictions on data it has not seen before. For this, we’re going to use the test_df object we defined above.
[21]:
test_dl = learn.dls.test_dl(test_items=test_df)
[22]:
# get predictions and targets (the real values) by running the test data through our trained model
preds, targs = learn.get_preds(dl=test_dl)
[23]:
preds[:5], targs[:5]
[23]:
(tensor([[0.9982, 0.0018],
[0.4272, 0.5728],
[0.7789, 0.2211],
[0.3957, 0.6043],
[0.9867, 0.0133]]),
tensor([[0],
[0],
[0],
[0],
[0]], dtype=torch.int8))
As can be seen above, even though we’re looking for binary decisions, our model makes probabilistic estimates of how likely each jet is to be a top-quark signal or a quark-gluon background. To decide whether a given prediction represents a top-quark signal or a quark-gluon background, we can set a cut-off threshold on the individual probabilities.
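As a small illustration of such a cut-off, the sketch below applies two thresholds to the five predictions printed above. It assumes that the second column of preds holds the signal probability; the helper classify and the threshold values are our own choices for this example.

```python
# Signal probabilities (second column) of the five predictions shown above
signal_probs = [0.0018, 0.5728, 0.2211, 0.6043, 0.0133]


def classify(probs, threshold=0.5):
    """Label a jet as signal (1) when its signal probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probs]


print(classify(signal_probs, threshold=0.5))  # [0, 1, 0, 1, 0]
print(classify(signal_probs, threshold=0.7))  # [0, 0, 0, 0, 0]
```

All five of these jets are true background (label 0), so a threshold of 0.5 produces two false positives, while a stricter threshold of 0.7 produces none. A stricter threshold also rejects more genuine signal jets, which is the trade-off we need to balance.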
Let’s plot the predicted signal probabilities, separately for true signal and true background jets, to see how our predictions compare with the ground truth classifications.
[24]:
signal_test = preds[:, 1][targs.flatten() == 1].numpy()
background_test = preds[:, 1][targs.flatten() == 0].numpy()
plt.hist(signal_test, histtype="step", bins=20, range=(0, 1), label="Signal")
plt.hist(background_test, histtype="step", bins=20, range=(0, 1), label="Background")
plt.xlabel("Probability")
plt.ylabel("Events/bin")
plt.yscale("log")
plt.xlim(0, 1)
plt.legend(loc="lower right", frameon=False)
plt.show()
Accuracy depends on the threshold we choose. A fuller picture of a binary classifier comes from the relation between the false positive rate and the true positive rate as the threshold varies. This relationship is described by the Receiver Operating Characteristic (ROC) curve. We can summarise the ROC curve by the area under the curve (AUC): higher values mean better discrimination, with 1 being the best possible value and 0.5 corresponding to random guessing.
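To make the ROC curve and the AUC concrete, here is a minimal pure-Python sketch on a hypothetical toy sample of four jets. It sweeps the threshold over the scores, records the false and true positive rates, and integrates with the trapezoidal rule. The helper names and toy values are our own; in the notebook itself we rely on roc_curve and auc from scikit-learn.

```python
def roc_points(y_true, y_score):
    """Return (FPR, TPR) lists obtained by sweeping the threshold over the scores."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    # Lower the threshold one distinct score at a time, from highest to lowest
    for threshold in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= threshold and y == 0)
        tpr.append(tp / n_pos)
        fpr.append(fp / n_neg)
    return fpr, tpr


def area_under_curve(x, y):
    """Integrate y over x with the trapezoidal rule."""
    return sum((x[i + 1] - x[i]) * (y[i + 1] + y[i]) / 2 for i in range(len(x) - 1))


toy_true = [0, 0, 1, 1]            # 1 = signal, 0 = background
toy_score = [0.1, 0.4, 0.35, 0.8]  # predicted signal probabilities
toy_fpr, toy_tpr = roc_points(toy_true, toy_score)
toy_auc = area_under_curve(toy_fpr, toy_tpr)
print(toy_fpr, toy_tpr, toy_auc)  # AUC is 0.75 for this toy sample
```

The toy AUC is below 1 because one background jet (score 0.4) is ranked above one signal jet (score 0.35).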
[25]:
fpr, tpr, thresholds = roc_curve(y_true=targs, y_score=preds[:, 1])
plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], ls="--", color="k")
plt.xlabel(r"False positive rate")
plt.ylabel(r"True positive rate")
plt.tight_layout()
[26]:
acc_test = accuracy_score(targs, preds.argmax(dim=-1))
auc_test = auc(fpr, tpr)
print(f"Accuracy: {acc_test:.4f}")
print(f"AUC: {auc_test:.4f}")
Accuracy: 0.8278
AUC: 0.9048
We can see that after one training cycle with 3 epochs, we’ve reached a test accuracy of ~83%. Think about how you can improve on this result in your own experimentation. Some things to experiment with:
* Model architecture: try changing the size of each layer and the number of layers in the tabular_learner. What do you observe? Does a bigger model always mean better results?
* Fit more cycles: does training for a longer period of time produce better results?
This concludes our short exploration of the use of Deep Neural Networks in particle physics.
This notebook was adapted from material in the course Deep Learning for Particle Physicists, available at https://lewtun.github.io/dl4phys/intro.html.
