Storing Histograms in Hepfiles#

Hepfiles are well suited to storing information about many histograms whose numbers of bins differ from one histogram to the next. This tutorial walks through that process.

[1]:
import numpy as np
import matplotlib.pyplot as plt
import hepfile as hf

Generating a Dataset#

We first generate five random datasets of varying length (10, 100, 1,000, 10,000, and 100,000 values). If you plan to do this with real data, you probably already have your dataset; the random data here only serves as an example.

[2]:
# generate some random normally distributed datasets
datasets = []
for i in range(1,6):
    rand = np.random.normal(size=10**i)
    datasets.append(rand)
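
Because the datasets above are drawn without a seed, the histograms change on every run. If reproducible numbers are useful, a minimal sketch using NumPy's default_rng could look like the following. The seed value 42 and the name rng are arbitrary choices for illustration, not part of the original tutorial.

```python
import numpy as np

# Seeded generator so repeated runs produce identical datasets.
# The seed (42) is an arbitrary, illustrative choice.
rng = np.random.default_rng(42)

datasets = []
for i in range(1, 6):
    # 10, 100, ..., 100000 normally distributed values
    datasets.append(rng.normal(size=10**i))

print([len(d) for d in datasets])  # [10, 100, 1000, 10000, 100000]
```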

Preparing the Dataset#

Then, we can calculate/create the following information about each dataset:

- bins and edge information from np.histogram
- x-label
- y-label
- title

You can think of each histogram as an “event”. This lets us pack the bins and edge information into a group called histogram and store the rest of the information (the labels and the title) as singletons. To pack the data into a hepfile easily, we store all of this information in a list of dictionaries, one dictionary per histogram, which we can then pass to hepfile.dict_tools.dictlike_to_hepfile.
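
As a small illustration of the layout described above, the sketch below builds the dictionary for a single histogram. The helper name make_histogram_dict and its arguments are hypothetical and not part of hepfile; the only library call used is np.histogram.

```python
import numpy as np

def make_histogram_dict(values, n_bins, tag):
    """Build one 'event': singleton labels plus a 'histogram' group.

    Hypothetical helper for illustration; not part of hepfile.
    """
    counts, edges = np.histogram(values, bins=n_bins)
    return {
        # singletons: one value per histogram
        'xlabel': f'x-label {tag}',
        'ylabel': f'y-label {tag}',
        'title': f'Histogram {tag}',
        # group: arrays whose length differs between histograms
        'histogram': {'bins': counts, 'edges': edges},
    }

record = make_histogram_dict(np.random.normal(size=100), n_bins=16, tag=2)

# np.histogram returns one more edge than it has bins
assert len(record['histogram']['edges']) == len(record['histogram']['bins']) + 1
print(sorted(record))  # ['histogram', 'title', 'xlabel', 'ylabel']
```

A list of such dictionaries, one per histogram, has the same shape as the data_dicts list used in this tutorial.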

We also plot the histogram data we calculate so that you can see what the datasets look like and how they differ!

[3]:
data_dicts = []

for rand in datasets:

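    # exponent of the dataset size (the dataset holds 10**i values);
    # round() guards against floating-point error in the logarithm
    i = int(round(np.log10(len(rand))))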

    # append an empty dictionary to store the data about this histogram in
    data_dicts.append({})

    # compute the histogram bin counts and edges using numpy
    # vary the number of bins because that's what hepfile is good at!
    bins, edges = np.histogram(rand, bins=int(4**i))

    # plot this data
    plt.figure()
    plt.bar(edges[:-1], bins, align='edge', width=edges[1]-edges[0])

    ylabel = f'y-label {i}'
    plt.ylabel(ylabel)
    data_dicts[-1]['ylabel'] = ylabel # store the ylabel in the recently appended dict

    xlabel = f'x-label {i}'
    plt.xlabel(xlabel)
    data_dicts[-1]['xlabel'] = xlabel # store the xlabel in the recently appended dict

    title = f'Histogram with 10^{i} values'
    plt.title(title)
    data_dicts[-1]['title'] = title # store the title in the recently appended dict

    # store the bin and edge information in a sub dictionary
    data_dicts[-1]['histogram'] = {}
    data_dicts[-1]['histogram']['bins'] = bins
    data_dicts[-1]['histogram']['edges'] = edges
[Output: five bar-chart histograms, one per dataset, titled "Histogram with 10^1 values" through "Histogram with 10^5 values", with axes labelled "x-label i" and "y-label i".]

Below is the list of data dictionaries that we plan to store in the hepfile. Notice that it is heterogeneous (strings alongside numeric arrays) and that the lengths of the bins and edges arrays vary between histograms!

[4]:
print(data_dicts)
[{'ylabel': 'y-label 1', 'xlabel': 'x-label 1', 'title': 'Histogram with 10^1 values', 'histogram': {'bins': array([1, 1, 4, 4]), 'edges': array([-2.55001779, -1.82556961, -1.10112143, -0.37667325,  0.34777493])}}, {'ylabel': 'y-label 2', 'xlabel': 'x-label 2', 'title': 'Histogram with 10^2 values', 'histogram': {'bins': array([ 1,  0,  0,  5,  1,  4,  7,  9, 22, 14, 14,  6,  7,  5,  3,  2]), 'edges': array([-3.22556605, -2.8798167 , -2.53406736, -2.18831801, -1.84256866,
       -1.49681931, -1.15106997, -0.80532062, -0.45957127, -0.11382193,
        0.23192742,  0.57767677,  0.92342611,  1.26917546,  1.61492481,
        1.96067416,  2.3064235 ])}}, {'ylabel': 'y-label 3', 'xlabel': 'x-label 3', 'title': 'Histogram with 10^3 values', 'histogram': {'bins': array([ 1,  0,  1,  0,  1,  3,  1,  1,  5,  5,  6,  3,  8,  7, 15,  5, 13,
       20, 12, 17, 22, 23, 22, 29, 33, 32, 37, 36, 38, 28, 39, 39, 44, 52,
       46, 37, 17, 38, 28, 34, 23, 23, 23, 29, 19, 17, 14,  9,  7,  7,  9,
        3,  2,  2,  5,  2,  1,  1,  1,  1,  1,  1,  1,  1]), 'edges': array([-3.07930803, -2.98334764, -2.88738724, -2.79142685, -2.69546645,
       -2.59950606, -2.50354567, -2.40758527, -2.31162488, -2.21566449,
       -2.11970409, -2.0237437 , -1.9277833 , -1.83182291, -1.73586252,
       -1.63990212, -1.54394173, -1.44798134, -1.35202094, -1.25606055,
       -1.16010015, -1.06413976, -0.96817937, -0.87221897, -0.77625858,
       -0.68029819, -0.58433779, -0.4883774 , -0.392417  , -0.29645661,
       -0.20049622, -0.10453582, -0.00857543,  0.08738496,  0.18334536,
        0.27930575,  0.37526614,  0.47122654,  0.56718693,  0.66314733,
        0.75910772,  0.85506811,  0.95102851,  1.0469889 ,  1.14294929,
        1.23890969,  1.33487008,  1.43083048,  1.52679087,  1.62275126,
        1.71871166,  1.81467205,  1.91063244,  2.00659284,  2.10255323,
        2.19851363,  2.29447402,  2.39043441,  2.48639481,  2.5823552 ,
        2.67831559,  2.77427599,  2.87023638,  2.96619678,  3.06215717])}}, {'ylabel': 'y-label 4', 'xlabel': 'x-label 4', 'title': 'Histogram with 10^4 values', 'histogram': {'bins': array([  1,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   3,
         1,   3,   0,   1,   0,   0,   2,   1,   0,   2,   3,   2,   1,
         5,   4,   2,   3,   2,   3,   1,   2,   2,   5,   5,   1,   8,
         5,   5,   6,  10,   5,   7,   8,  10,   8,   7,  12,   7,   6,
        18,  11,  18,  14,  18,  19,  20,  24,  16,  18,  24,  29,  24,
        22,  20,  27,  25,  35,  31,  39,  36,  34,  32,  40,  41,  36,
        36,  42,  41,  42,  43,  43,  64,  60,  48,  55,  62,  65,  55,
        70,  61,  66,  68,  66,  59,  77,  88,  93,  75,  79,  70,  77,
        79,  94,  84,  84,  83, 112, 106, 106, 107, 101,  83, 100, 120,
       103,  99, 104, 120, 126, 114, 125,  99, 115, 109, 112,  95, 118,
       122, 122, 126, 118, 109, 116, 107, 108, 119, 102, 105,  94,  98,
       102, 117, 102,  94,  88, 110,  99, 103, 102,  79,  86,  82,  98,
        97,  78,  66,  74,  72,  82,  62,  72,  66,  58,  69,  58,  62,
        66,  53,  55,  53,  45,  35,  50,  44,  41,  36,  40,  40,  53,
        34,  31,  36,  17,  20,  21,  23,  27,  22,  26,  21,  17,  28,
        23,  13,  17,  16,  16,  13,  19,  18,  14,  16,  11,   8,   5,
        14,  10,   6,   5,   7,   9,   5,   5,   6,   3,   3,   4,   5,
         4,   3,   3,   1,   2,   2,   0,   3,   2,   0,   1,   2,   1,
         1,   0,   1,   0,   1,   0,   0,   3,   1,   0,   3,   0,   0,
         0,   0,   0,   2,   1,   0,   0,   0,   1]), 'edges': array([-3.57380154, -3.54581674, -3.51783193, -3.48984713, -3.46186233,
       -3.43387752, -3.40589272, -3.37790792, -3.34992311, -3.32193831,
       -3.29395351, -3.2659687 , -3.2379839 , -3.2099991 , -3.18201429,
       -3.15402949, -3.12604468, -3.09805988, -3.07007508, -3.04209027,
       -3.01410547, -2.98612067, -2.95813586, -2.93015106, -2.90216626,
       -2.87418145, -2.84619665, -2.81821185, -2.79022704, -2.76224224,
       -2.73425744, -2.70627263, -2.67828783, -2.65030303, -2.62231822,
       -2.59433342, -2.56634862, -2.53836381, -2.51037901, -2.4823942 ,
       -2.4544094 , -2.4264246 , -2.39843979, -2.37045499, -2.34247019,
       -2.31448538, -2.28650058, -2.25851578, -2.23053097, -2.20254617,
       -2.17456137, -2.14657656, -2.11859176, -2.09060696, -2.06262215,
       -2.03463735, -2.00665255, -1.97866774, -1.95068294, -1.92269813,
       -1.89471333, -1.86672853, -1.83874372, -1.81075892, -1.78277412,
       -1.75478931, -1.72680451, -1.69881971, -1.6708349 , -1.6428501 ,
       -1.6148653 , -1.58688049, -1.55889569, -1.53091089, -1.50292608,
       -1.47494128, -1.44695648, -1.41897167, -1.39098687, -1.36300207,
       -1.33501726, -1.30703246, -1.27904765, -1.25106285, -1.22307805,
       -1.19509324, -1.16710844, -1.13912364, -1.11113883, -1.08315403,
       -1.05516923, -1.02718442, -0.99919962, -0.97121482, -0.94323001,
       -0.91524521, -0.88726041, -0.8592756 , -0.8312908 , -0.803306  ,
       -0.77532119, -0.74733639, -0.71935159, -0.69136678, -0.66338198,
       -0.63539717, -0.60741237, -0.57942757, -0.55144276, -0.52345796,
       -0.49547316, -0.46748835, -0.43950355, -0.41151875, -0.38353394,
       -0.35554914, -0.32756434, -0.29957953, -0.27159473, -0.24360993,
       -0.21562512, -0.18764032, -0.15965552, -0.13167071, -0.10368591,
       -0.0757011 , -0.0477163 , -0.0197315 ,  0.00825331,  0.03623811,
        0.06422291,  0.09220772,  0.12019252,  0.14817732,  0.17616213,
        0.20414693,  0.23213173,  0.26011654,  0.28810134,  0.31608614,
        0.34407095,  0.37205575,  0.40004055,  0.42802536,  0.45601016,
        0.48399496,  0.51197977,  0.53996457,  0.56794938,  0.59593418,
        0.62391898,  0.65190379,  0.67988859,  0.70787339,  0.7358582 ,
        0.763843  ,  0.7918278 ,  0.81981261,  0.84779741,  0.87578221,
        0.90376702,  0.93175182,  0.95973662,  0.98772143,  1.01570623,
        1.04369103,  1.07167584,  1.09966064,  1.12764544,  1.15563025,
        1.18361505,  1.21159986,  1.23958466,  1.26756946,  1.29555427,
        1.32353907,  1.35152387,  1.37950868,  1.40749348,  1.43547828,
        1.46346309,  1.49144789,  1.51943269,  1.5474175 ,  1.5754023 ,
        1.6033871 ,  1.63137191,  1.65935671,  1.68734151,  1.71532632,
        1.74331112,  1.77129593,  1.79928073,  1.82726553,  1.85525034,
        1.88323514,  1.91121994,  1.93920475,  1.96718955,  1.99517435,
        2.02315916,  2.05114396,  2.07912876,  2.10711357,  2.13509837,
        2.16308317,  2.19106798,  2.21905278,  2.24703758,  2.27502239,
        2.30300719,  2.33099199,  2.3589768 ,  2.3869616 ,  2.41494641,
        2.44293121,  2.47091601,  2.49890082,  2.52688562,  2.55487042,
        2.58285523,  2.61084003,  2.63882483,  2.66680964,  2.69479444,
        2.72277924,  2.75076405,  2.77874885,  2.80673365,  2.83471846,
        2.86270326,  2.89068806,  2.91867287,  2.94665767,  2.97464247,
        3.00262728,  3.03061208,  3.05859689,  3.08658169,  3.11456649,
        3.1425513 ,  3.1705361 ,  3.1985209 ,  3.22650571,  3.25449051,
        3.28247531,  3.31046012,  3.33844492,  3.36642972,  3.39441453,
        3.42239933,  3.45038413,  3.47836894,  3.50635374,  3.53433854,
        3.56232335,  3.59030815])}}, {'ylabel': 'y-label 5', 'xlabel': 'x-label 5', 'title': 'Histogram with 10^5 values', 'histogram': {'bins': array([1, 0, 0, ..., 0, 0, 1]), 'edges': array([-4.71583223, -4.70706297, -4.69829372, ...,  4.24634811,
        4.25511737,  4.26388663])}}]
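
The full printout is long, so the varying lengths can be easier to see in a compact summary. The sketch below is one way to produce it. The function name summarize is hypothetical, and the list records is a small stand-in for data_dicts built the same way, so that the block runs on its own.

```python
import numpy as np

def summarize(records):
    """Return (title, number of bins, number of edges) per histogram dict."""
    return [
        (r['title'], len(r['histogram']['bins']), len(r['histogram']['edges']))
        for r in records
    ]

# small stand-in for data_dicts (three histograms instead of five)
rng = np.random.default_rng(0)
records = []
for i in range(1, 4):
    counts, edges = np.histogram(rng.normal(size=10**i), bins=4**i)
    records.append({
        'title': f'Histogram with 10^{i} values',
        'histogram': {'bins': counts, 'edges': edges},
    })

for title, n_bins, n_edges in summarize(records):
    print(f'{title}: {n_bins} bins, {n_edges} edges')
# Histogram with 10^1 values: 4 bins, 5 edges
# Histogram with 10^2 values: 16 bins, 17 edges
# Histogram with 10^3 values: 64 bins, 65 edges
```

Calling summarize(data_dicts) on the tutorial's own list should report 4, 16, 64, 256, and 1024 bins, since the number of bins is 4**i.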

Writing the hepfile#

Now that we have a list of dictionaries, we can easily write this to a hepfile using hepfile.dict_tools.dictlike_to_hepfile:

[5]:
# data_dicts is in a format we can easily save to a hepfile!
filename = 'histogram-hepfile.h5'
data = hf.dict_tools.dictlike_to_hepfile(data_dicts, filename)

We can then view the data through the awkward array that is returned! Note also that the data has been saved as a hepfile called histogram-hepfile.h5, which can be read back in with hepfile.load if you wish.
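
As a rough sketch of the round trip mentioned above, the snippet below writes a small stand-in list of dictionaries and reads the file back. It assumes that hepfile.load returns a pair (data, event), with data a dictionary by default; please check this against the version of hepfile you have installed. The stand-in data, the temporary directory, and the variable names are illustrative only, and the guarded import simply lets the block be skipped where hepfile is not available.

```python
import os
import tempfile

import numpy as np

try:
    import hepfile as hf
except ImportError:  # lets the sketch be skipped where hepfile is missing
    hf = None

# small stand-in for data_dicts: two histograms with different bin counts
records = []
for i in (1, 2):
    counts, edges = np.histogram(np.random.normal(size=10**i), bins=4**i)
    records.append({
        'title': f'Histogram with 10^{i} values',
        'histogram': {'bins': counts, 'edges': edges},
    })

keys = None
if hf is not None:
    with tempfile.TemporaryDirectory() as tmpdir:
        filename = os.path.join(tmpdir, 'histogram-hepfile.h5')
        # write the file; the tutorial describes the return value
        # as an awkward array
        awk = hf.dict_tools.dictlike_to_hepfile(records, filename)
        # ASSUMPTION: load returns (data, event), data being a dictionary
        data, event = hf.load(filename)
        keys = sorted(data.keys())
        print(keys)  # names of the datasets stored in the file
```

Listing the keys first, rather than indexing a specific name, avoids guessing how hepfile labels the group and singleton datasets internally.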