Working with Awkward Arrays

Awkward Array is an existing Python package that allows arrays of different lengths to be stored in a single array. This can be very useful for those working with “heterogeneous” data.

[1]:
# imports
import awkward as ak
import hepfile as hf

Introduction to Awkward Arrays

This is a general overview; see https://awkward-array.org/doc/main/index.html for more details.

Say we have a list, list1, that is made up of lists of different lengths.

[2]:
list1 = [[1,2,3],
        [4,5],
        [6]]

Sadly, NumPy expects rectangular arrays, so it doesn’t allow for easy manipulations or calculations with such “ragged” arrays. That is where the awkward package becomes very useful. We can create an awkward array from list1 with the following code:

[3]:
awk = ak.Array(list1)
print(awk)
print(type(awk))
[[1, 2, 3], [4, 5], [6]]
<class 'awkward.highlevel.Array'>

We can then do many of the same calculations that we would normally do with NumPy:

[4]:
# sum along different axes
print(f'Total Sum = {ak.sum(awk)}')
print(f'Sum of columns = {ak.sum(awk, axis=0)}')
print(f'Sum of rows = {ak.sum(awk, axis=1)}')
Total Sum = 21
Sum of columns = [11, 7, 3]
Sum of rows = [6, 9, 6]
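
To make the axis convention concrete, here is a minimal pure-Python sketch of the same three sums on list1. It only illustrates what the results mean and is not how awkward computes them; the helper names sum_rows and sum_columns are hypothetical.

```python
from itertools import zip_longest

list1 = [[1, 2, 3],
         [4, 5],
         [6]]


def sum_rows(ragged):
    """Sum within each inner list (the analogue of axis=1)."""
    return [sum(row) for row in ragged]


def sum_columns(ragged):
    """Sum across inner lists, position by position (the analogue of axis=0).

    Positions missing from the shorter lists are treated as 0.
    """
    return [sum(col) for col in zip_longest(*ragged, fillvalue=0)]


total = sum(sum(row) for row in list1)
row_sums = sum_rows(list1)
column_sums = sum_columns(list1)

print(total)        # 21
print(column_sums)  # [11, 7, 3]
print(row_sums)     # [6, 9, 6]
```

The "columns" result has as many entries as the longest inner list, while the "rows" result has one entry per inner list.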

Converting hepfiles to awkward arrays

All of the awkward tools for hepfile are in hepfile.awkward_tools.

The hepfile_to_awkward function provides an easy way to convert the output of the hepfile.read.load method into an awkward array.

Note: This section of this tutorial assumes you have completed the writing_hepfiles_from_dicts tutorial!

[5]:
infile = 'output_from_dict.hdf5'

# read in the hepfile data
data, _ = hf.load(infile)
print(data)
# convert it to an awkward array
dataAwk = hf.awkward_tools.hepfile_to_awkward(data)
print()
print('Awkward Array:\n')
dataAwk.show()
{'_MAP_DATASETS_TO_COUNTERS_': {'_SINGLETONS_GROUP_': '_SINGLETONS_GROUP_/COUNTER', 'jet': 'jet/njet', 'jet/px': 'jet/njet', 'jet/py': 'jet/njet', 'muons': 'muons/nmuons', 'muons/px': 'muons/nmuons', 'muons/py': 'muons/nmuons', 'nParticles': '_SINGLETONS_GROUP_/COUNTER'}, '_MAP_DATASETS_TO_INDEX_': {'_SINGLETONS_GROUP_': '_SINGLETONS_GROUP_/COUNTER_INDEX', 'jet': 'jet/njet_INDEX', 'jet/px': 'jet/njet_INDEX', 'jet/py': 'jet/njet_INDEX', 'muons': 'muons/nmuons_INDEX', 'muons/px': 'muons/nmuons_INDEX', 'muons/py': 'muons/nmuons_INDEX', 'nParticles': '_SINGLETONS_GROUP_/COUNTER_INDEX'}, '_LIST_OF_COUNTERS_': ['_SINGLETONS_GROUP_/COUNTER', 'jet/njet', 'muons/nmuons'], '_LIST_OF_DATASETS_': ['_SINGLETONS_GROUP_', '_SINGLETONS_GROUP_/COUNTER', 'jet', 'jet/njet', 'jet/px', 'jet/py', 'muons', 'muons/nmuons', 'muons/px', 'muons/py', 'nParticles'], '_META_': {}, '_NUMBER_OF_BUCKETS_': 2, '_SINGLETONS_GROUP_': array(['nParticles'], dtype='<U10'), '_SINGLETONS_GROUP_/COUNTER': array([1, 1]), '_SINGLETONS_GROUP_/COUNTER_INDEX': array([0, 1]), 'jet/njet': array([3, 4]), 'jet/njet_INDEX': array([0, 3]), 'muons/nmuons': array([3, 4]), 'muons/nmuons_INDEX': array([0, 3]), 'jet/px': array([1, 2, 3, 3, 4, 6, 7]), 'jet/py': array([1, 2, 3, 3, 4, 6, 7]), 'muons/px': array([1, 2, 3, 3, 4, 6, 7]), 'muons/py': array([1, 2, 3, 3, 4, 6, 7]), 'nParticles': array([3, 4]), '_GROUPS_': {'_SINGLETONS_GROUP_': ['nParticles'], 'jet': ['njet', 'px', 'py'], 'muons': ['nmuons', 'px', 'py']}, '_MAP_DATASETS_TO_DATA_TYPES_': {'_SINGLETONS_GROUP_': dtype('<U10'), '_SINGLETONS_GROUP_/COUNTER': dtype('int64'), 'jet/njet': dtype('int64'), 'jet/px': dtype('int64'), 'jet/py': dtype('int64'), 'muons/nmuons': dtype('int64'), 'muons/px': dtype('int64'), 'muons/py': dtype('int64'), 'nParticles': dtype('int64')}, '_PROTECTED_NAMES_': {'_GROUPS_', '_META_', '_PROTECTED_NAMES_', '_LIST_OF_COUNTERS_', '_MAP_DATASETS_TO_COUNTERS_', '_SINGLETONS_GROUP_/COUNTER', '_SINGLETONSGROUPFORSTORAGE_', 
'_MAP_DATASETS_TO_DATA_TYPES_', '_SINGLETONS_GROUP_', '_HEADER_'}}

Awkward Array:

[{nParticles: 3, jet: {px: [1, ...], ...}, muons: {...}},
 {nParticles: 4, jet: {px: [3, ...], ...}, muons: {...}}]

Such a structure may be more intuitive to some and may make some analysis easier.
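
The dictionary printed by hf.load stores each dataset as one flat array, with a counter giving the number of entries per event (for example, 'jet/njet' is [3, 4] for 'jet/px'). The following pure-Python sketch shows how such a counter splits the flat values into per-event lists. The helper split_by_counts is hypothetical and is not hepfile's implementation; awkward itself offers ak.unflatten for this kind of regrouping.

```python
def split_by_counts(flat, counts):
    """Split a flat sequence into consecutive chunks of the given sizes."""
    if sum(counts) != len(flat):
        raise ValueError("counts must add up to the length of the flat data")
    events = []
    start = 0
    for n in counts:
        events.append(list(flat[start:start + n]))
        start += n
    return events


# Values copied from the printed hepfile dictionary
jet_px = [1, 2, 3, 3, 4, 6, 7]
njet = [3, 4]

jet_px_per_event = split_by_counts(jet_px, njet)
print(jet_px_per_event)  # [[1, 2, 3], [3, 4, 6, 7]]
```

The start offsets visited by the loop (0 and 3) are the same values stored in 'jet/njet_INDEX'.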

Appending to this awkward array

Now say that we want to add a new event to this awkward array, either to continue working with the awkward array or to add data to the hepfile.

Let’s say we want to add the following dictionary:

[6]:
new_dict = {'jet': {'px': [10, 100], 'py': [0, 0]},
            'muons': {'px': [5, 1000], 'py': [0, -1]},
            'nParticles': 2
            }

We can add this to the existing awkward array using the hepfile.dict_tools.append function. To call this function, we pass the existing awkward array as the first argument and the new dictionary as the second.

[7]:
newAwkData = hf.dict_tools.append(dataAwk, new_dict)

newAwkData.show()
[{nParticles: 3, jet: {px: [1, ...], ...}, muons: {...}},
 {nParticles: 4, jet: {px: [3, ...], ...}, muons: {...}},
 {nParticles: 2, jet: {px: [...], py: ..., ...}, muons: {...}}]
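
Conceptually, appending an event adds one more record with the same fields as the existing ones. The sketch below mimics this with plain Python lists and dictionaries, using the event values shown earlier. The helper append_event is hypothetical, and its field check is an assumption added for illustration, not a description of what hepfile.dict_tools.append does internally.

```python
def append_event(events, new_event):
    """Return a new list of events with new_event added at the end.

    Raises ValueError if new_event does not have the same top-level
    fields as the existing events.
    """
    if events and set(events[0]) != set(new_event):
        raise ValueError("new event has different fields than existing events")
    return events + [new_event]


# The two events already stored in the file, written out as records
events = [
    {'nParticles': 3,
     'jet': {'px': [1, 2, 3], 'py': [1, 2, 3]},
     'muons': {'px': [1, 2, 3], 'py': [1, 2, 3]}},
    {'nParticles': 4,
     'jet': {'px': [3, 4, 6, 7], 'py': [3, 4, 6, 7]},
     'muons': {'px': [3, 4, 6, 7], 'py': [3, 4, 6, 7]}},
]

new_dict = {'jet': {'px': [10, 100], 'py': [0, 0]},
            'muons': {'px': [5, 1000], 'py': [0, -1]},
            'nParticles': 2}

updated = append_event(events, new_dict)
print(len(updated))               # 3
print(updated[-1]['nParticles'])  # 2
```

Returning a new list rather than modifying events in place mirrors the way the tutorial assigns the result of append to a new variable.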

Rewriting the awkward data as a hepfile

Now that we have modified our awkward array, we can write it back out as a hepfile.

To do this, all we need to do is define an output file name and call hepfile.awkward_tools.awkward_to_hepfile.

[8]:
outfile = 'updated-awkward-array.h5'
hf.awkward_tools.awkward_to_hepfile(newAwkData, outfile)
[8]:
{'_GROUPS_': {'_SINGLETONS_GROUP_': ['COUNTER', 'nParticles'],
  'jet': ['njet', 'px', 'py'],
  'muons': ['nmuons', 'px', 'py']},
 '_MAP_DATASETS_TO_COUNTERS_': {'_SINGLETONS_GROUP_': '_SINGLETONS_GROUP_/COUNTER',
  'nParticles': '_SINGLETONS_GROUP_/COUNTER',
  'jet': 'jet/njet',
  'jet/px': 'jet/njet',
  'jet/py': 'jet/njet',
  'muons': 'muons/nmuons',
  'muons/px': 'muons/nmuons',
  'muons/py': 'muons/nmuons'},
 '_LIST_OF_COUNTERS_': ['_SINGLETONS_GROUP_/COUNTER',
  'jet/njet',
  'muons/nmuons'],
 '_SINGLETONS_GROUP_/COUNTER': array([1, 1, 1]),
 '_MAP_DATASETS_TO_DATA_TYPES_': {'_SINGLETONS_GROUP_/COUNTER': int,
  'nParticles': dtype('int64'),
  'jet/njet': int,
  'jet/px': dtype('int64'),
  'jet/py': dtype('int64'),
  'muons/nmuons': int,
  'muons/px': dtype('int64'),
  'muons/py': dtype('int64')},
 '_META_': {},
 'nParticles': array([3, 4, 2]),
 'jet/px': array([  1,   2,   3,   3,   4,   6,   7,  10, 100]),
 'jet/njet': array([3, 4, 2], dtype=int32),
 'jet/py': array([1, 2, 3, 3, 4, 6, 7, 0, 0]),
 'muons/px': array([   1,    2,    3,    3,    4,    6,    7,    5, 1000]),
 'muons/nmuons': array([3, 4, 2], dtype=int32),
 'muons/py': array([ 1,  2,  3,  3,  4,  6,  7,  0, -1])}
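
The returned dictionary again holds flat arrays plus counters: 'jet/px' now has nine values and 'jet/njet' is [3, 4, 2]. As a minimal sketch of that layout, the hypothetical helper below flattens per-event lists into one flat list and a list of counts. It illustrates the storage idea only and is not the code used by awkward_to_hepfile.

```python
def flatten_with_counts(per_event):
    """Flatten per-event lists into one flat list plus per-event counts."""
    flat = [value for event in per_event for value in event]
    counts = [len(event) for event in per_event]
    return flat, counts


# jet px values for the three events, including the appended one
jet_px_per_event = [[1, 2, 3], [3, 4, 6, 7], [10, 100]]

flat, counts = flatten_with_counts(jet_px_per_event)
print(flat)    # [1, 2, 3, 3, 4, 6, 7, 10, 100]
print(counts)  # [3, 4, 2]
```

The counts always add up to the length of the flat list, which is a quick consistency check on any dataset and its counter.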