Source Code Documentation#

hepfile.read#

Functions to assist in reading and accessing information in hepfiles.

hepfile.read.calculate_index_from_counters(counters: int) int#

Calculates an index array from the counters
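The relationship between counters and an index array can be sketched in plain Python. This is an illustration only, not hepfile's source: it assumes the index array holds the starting offset of each bucket in a flat dataset (the running total of the counters minus each count), and the helper name `index_from_counters` is hypothetical.

```python
from itertools import accumulate


def index_from_counters(counters):
    """Return the assumed starting offset of each bucket in a flat array."""
    # Running total of entries up to and including each bucket.
    running = list(accumulate(counters))
    # Subtracting each count gives the offset where that bucket starts.
    return [total - n for total, n in zip(running, counters)]


counters = [2, 0, 3]                      # e.g. 2 jets, 0 jets, 3 jets
flat = [10.0, 11.0, 20.0, 21.0, 22.0]     # all jets stored back to back
index = index_from_counters(counters)

# Entries for bucket n live in flat[index[n] : index[n] + counters[n]].
third = flat[index[2]: index[2] + counters[2]]
```

Storing offsets next to the counts lets a reader slice out any single bucket without scanning the buckets before it.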

hepfile.read.get_file_header(filename: str, return_type: str = 'dict') dict#

Get the file header and return it as a dictionary or dataframe

Parameters:
  • filename (string) – HDF5 file to open and read the header information from

  • return_type (string) – If ‘dict’, return the header information as a dictionary. If ‘df’ or ‘dataframe’, return the information as a pandas dataframe.

hepfile.read.get_file_metadata(filename: str) dict#

Get the file metadata and return it as a dictionary

hepfile.read.get_nbuckets_in_data(data: dict) int#

Get the number of buckets in the data dictionary.

This is useful in case you’ve only pulled out subsets of the data

hepfile.read.get_nbuckets_in_file(filename: str) int#

Get the number of buckets in the file.

hepfile.read.load(filename: str, verbose: bool = False, desired_groups: list[str] = None, subset: int = None, return_type: str = 'dictionary') tuple[dict, dict]#

Reads all, or a subset, of the data from the HDF5 file to fill a data dictionary. Also returns an empty bucket dictionary to be filled later with data from individual buckets.

Parameters:
  • filename (string) – Name of the input file

  • verbose (boolean) – True if debug output is required

  • desired_groups (list) – Groups to be read from input file

  • subset (int) – Number of buckets to be read from input file

  • return_type (str) – Type to return. Options are ‘dictionary’, ‘awkward’, and ‘pandas’. Default is ‘dictionary’. Note: the ‘awkward’ option requires hepfile to be installed with the awkward or all option, and the ‘pandas’ option requires hepfile to be installed with the pandas or all option!

Returns:

data (dict): Selected data from HDF5 file

bucket (dict): An empty bucket dictionary to be filled by data from select buckets

Return type:

tuple[dict, dict]

hepfile.read.print_file_header(filename: str) str#

Pretty print the file header

Parameters:

filename (str) – filename to retrieve the header from.

Returns:

String representation of the header information, if it exists.

hepfile.read.print_file_metadata(filename: str)#

Pretty print the file metadata

hepfile.read.unpack(bucket: dict, data: dict, n: int = 0)#

Fills the bucket dictionary with selected rows from the data dictionary.

Parameters:
  • bucket (dict) – bucket dictionary to be filled

  • data (dict) – Data dictionary used to fill the bucket dictionary

  • n (integer) – 0 by default. Which entry should be pulled out of the data dictionary and inserted into the bucket dictionary.
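A minimal sketch of how `load` and `unpack` might be combined to walk through a file one bucket at a time. It uses only the signatures documented above; the generator name `iter_buckets` is hypothetical, and hepfile is imported lazily so the function can be defined even where the package is not installed.

```python
def iter_buckets(filename, groups=None):
    """Yield the bucket dictionary once per entry stored in a hepfile."""
    # Lazy import: only needed when the generator is actually consumed.
    from hepfile.read import get_nbuckets_in_data, load, unpack

    # load returns the data dictionary and an empty bucket dictionary.
    data, bucket = load(filename, desired_groups=groups)

    for n in range(get_nbuckets_in_data(data)):
        # Fill the bucket with the rows that belong to entry n.
        unpack(bucket, data, n=n)
        # The same bucket object is reused on every iteration, so callers
        # should copy anything they want to keep.
        yield bucket
```

A caller would loop with `for bucket in iter_buckets("events.h5"):` where the file name is a placeholder.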

hepfile.write#

Functions to assist in writing a hepfile “from scratch”

hepfile.write.add_meta(data: dict, name: str, meta_data: list)#

Create metadata for a group (or singleton) and add it to data

Parameters:
  • data (dict) – a data object returned by hf.initialize()

  • name (str) – name of either a group, singleton, or dataset the metadata corresponds to. if passing a dataset name, make sure it is the full path (group/dataset)!

  • meta_data (list) – list of metadata to write to that group/dataset/singleton

hepfile.write.clear_bucket(bucket: dict) None#

Clears the data from the bucket dictionary - should the name of the function change?

Parameters:

bucket (dict) – The dictionary to be cleared. This is designed to clear the data from the lists in the bucket dictionary, but theoretically, it would clear out the lists from any dictionary.

hepfile.write.create_dataset(data: dict, dset_name: list, group: str = None, dtype: type = <class 'float'>, verbose=False, ignore_protected=False)#

Adds a dataset to a group in a dictionary. If the group does not exist, it will be created.

Parameters:
  • data (dict) – Dictionary that contains the group

  • dset_name (list/str) – Dataset to be added to the group (This doesn’t have to be a list)

  • group (string) – Name of group the dataset will be added to. None by default

  • dtype (type) – The data type. float by default, per the signature. This argument may not actually be used.

Returns:

-1 if the group is None

hepfile.write.create_group(data: dict, group_name: str, counter: str = None, verbose=False, ignore_protected=False)#

Adds a group in the dictionary

Parameters:
  • data (dict) – Dictionary to which the group will be added

  • group_name (string) – Name of the group to be added

  • counter (string) – Name of the counter key. None by default

hepfile.write.create_single_bucket(data: dict) dict#

Creates a bucket dictionary that will be used to collect data and then be packed into the master data dictionary.

Parameters:

data (dict) – Data dictionary that will hold all the data from the bucket.

Returns:

The new bucket dictionary with keys and no bucket information

Return type:

bucket (dict)

hepfile.write.initialize() dict#

Creates an empty data dictionary

Returns:

An empty data dictionary

Return type:

data (dict)

hepfile.write.pack(data: dict, bucket: dict, AUTO_SET_COUNTER: bool = True, EMPTY_OUT_BUCKET: bool = True, STRICT_CHECKING: bool = False, verbose: bool = False)#

Takes the data from a bucket and packs it into the data dictionary, intelligently, so that it can be stored and extracted efficiently. (This is analogous to the ROOT TTree::Fill() member function.)

Parameters:
  • data (dict) – Data dictionary to hold the entire dataset.

  • bucket (dict) – bucket to be packed into data.

  • EMPTY_OUT_BUCKET (bool) – If this is True then empty out the bucket container in preparation for the next iteration. We used to ask the users to do this “by hand” but now do it automatically by default. We allow the user to not do this, if they are running some sort of debugging.
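The functions above are typically used together: initialize a data dictionary, declare groups and datasets, then fill and pack one bucket at a time. The following is a sketch under stated assumptions, not a definitive recipe: the names `jet`, `njet`, `e` and `px` are illustrative, and addressing bucket entries by `group/dataset` paths (including the counter at `jet/njet`) is assumed from the full-path convention mentioned in this documentation.

```python
def build_demo_data(n_buckets=3):
    """Build a small hepfile data dictionary in memory (nothing is written)."""
    # Lazy import so the sketch can be defined without hepfile installed.
    from hepfile.write import (
        create_dataset,
        create_group,
        create_single_bucket,
        initialize,
        pack,
    )

    data = initialize()
    create_group(data, "jet", counter="njet")
    create_dataset(data, ["e", "px"], group="jet", dtype=float)

    bucket = create_single_bucket(data)
    for i in range(n_buckets):
        n_jets = i + 1
        for j in range(n_jets):
            # Assumed key layout: "<group>/<dataset>".
            bucket["jet/e"].append(100.0 + j)
            bucket["jet/px"].append(1.0 * j)
        # Set the counter explicitly (assumed key "jet/njet").
        bucket["jet/njet"] = n_jets
        # With EMPTY_OUT_BUCKET=True (the default) the bucket is cleared
        # after packing, ready for the next iteration.
        pack(data, bucket)
    return data
```

Reusing a single bucket and letting `pack` empty it keeps the loop body free of manual cleanup, which is the behaviour the EMPTY_OUT_BUCKET description above refers to.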

hepfile.write.write_file_header(filename: str, mydict: dict, verbose: bool = False) File#

Writes header data to a protected group in an HDF5 file.

If there is already header information, it is overwritten by this function.

Parameters:
  • filename (string) – Name of file to write to (file should already exist and the group will be appended to it.)

  • mydict (dictionary) – Header data passed in by user

  • verbose (bool) – True to print out info as it runs

Returns:

Returns the file with new metadata

Return type:

hdoutfile (HDF5)

hepfile.write.write_file_metadata(filename: str, mydict: dict = None, write_default_values: bool = True, append: bool = True, verbose: bool = False) File#

Writes file metadata in the attributes of an HDF5 file

Parameters:
  • filename (string) – Name of output file

  • mydict (dictionary) – Metadata desired by user

  • write_default_values (boolean) – True if user wants to write/update the default metadata (date, hepfile version, h5py version, numpy version, and Python version), False otherwise.

  • append (boolean) – True if user wants to keep older metadata, False otherwise.

  • verbose (boolean) – True to print out statements as it goes

Returns:

File with new metadata

Return type:

hdoutfile (HDF5)

hepfile.write.write_to_file(filename: str, data: dict, comp_type: str = None, comp_opts: list = None, force_single_precision: bool = True, verbose: bool = False) File#

Writes the selected data to an HDF5 file

Parameters:
  • filename (string) – Name of output file

  • data (dictionary) – Data to be written into output file

  • comp_type (string) – Type of compression

  • force_single_precision (boolean) – True if data should be written in single precision

Returns:

File to which the data has been written

Return type:

hdoutfile (HDF5)
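A short sketch of writing a data dictionary to disk and then attaching user metadata, using only the signatures documented above. The helper name `write_with_metadata` and the `author` metadata key are hypothetical; compression options are left at their defaults because their accepted values are not listed here.

```python
def write_with_metadata(filename, data, author):
    """Write data to filename, then add one user-defined metadata entry."""
    # Lazy import so the sketch can be defined without hepfile installed.
    from hepfile.write import write_file_metadata, write_to_file

    # Write the packed data dictionary to an HDF5 file.
    write_to_file(filename, data)

    # append=True keeps any metadata already stored in the file.
    write_file_metadata(filename, mydict={"author": author}, append=True)
    return filename
```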

hepfile.dict_tools#

Functions to help convert dictionaries into hepfiles

hepfile.dict_tools.append(ak_dict: ak.Record, new_dict: dict) ak.Record#

Append a new event to an existing awkward dictionary with events

Note: This tool requires awkward to be installed. Make sure you installed with either 1) ‘python -m pip install hepfile[awkward]’ or, 2) ‘python -m pip install hepfile[all]’

Parameters:
  • ak_dict (ak.Record) – awkward Record of data

  • new_dict (dict) – Dictionary of value to append to ak_dict. All keys must match ak_dict!

Returns:

Awkward Record of awkward arrays with the new_dict appended

hepfile.dict_tools.dictlike_to_hepfile(dict_list: list[dict], outfile: str = None, how_to_pack='classic', **kwargs) dict#

This wraps hepfile.awkward_tools.awkward_to_hepfile and writes a list of dictionaries to a hepfile.

Writes a list of dictlike objects to a hepfile. The input must have a specific format:
  • each dictlike object is an “event”

  • the first level of dict keys are the groups

  • the second level of dict keys are the datasets

  • entries in the second level of the dict are the data (awkward array or list)

  • data entries in the first level of the dict are singleton objects

Parameters:
  • dict_list (list) – list of dictionaries or dataframes where each dictionary/df holds information on an event

  • outfile (str) – path to write output hepfile to

  • how_to_pack (str) – how to pack the input dataset. Options are ‘awkward’ or ‘classic’. ‘awkward’ calls awkward_to_hepfile; ‘classic’ packs the data the traditional way. Default is ‘classic’. To use how_to_pack=’awkward’, make sure you installed hepfile with the ‘awkward’ or ‘all’ optional dependency!

  • **kwargs – passed to hepfile.write.write_to_file if ‘awkward’. Can only be ‘write_to_hepfile’ and ‘ignore_protected’ if ‘classic’.

Returns:

Dictionary of Awkward Arrays with the data stored in outfile
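The event layout that dictlike_to_hepfile expects can be sketched with plain dictionaries. The group, dataset and singleton names below are illustrative, and `check_event` is a hypothetical helper (not part of hepfile) that only verifies that datasets within a group have matching lengths before the list is handed over.

```python
def check_event(event):
    """Raise ValueError if datasets inside any group differ in length."""
    for name, value in event.items():
        if isinstance(value, dict):
            # A first-level dict is a group; its keys are datasets.
            lengths = {len(values) for values in value.values()}
            if len(lengths) > 1:
                raise ValueError(f"group {name!r} has unequal dataset lengths")
        # Any other first-level entry is treated as a singleton.


events = [
    {"jet": {"px": [1.0, 2.0], "py": [0.5, 0.1]}, "n_vertices": 3},
    {"jet": {"px": [4.2], "py": [1.3]}, "n_vertices": 1},
]

for event in events:
    check_event(event)
```

With hepfile installed, a list like `events` would then be passed as `dict_list`, for example `hepfile.dict_tools.dictlike_to_hepfile(events, "events.h5")`, where the output path is a placeholder.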

hepfile.awkward_tools#

These are tools to make working with and translating between awkward arrays and hepfile data objects easier.

Note: The base installation package does not contain these tools! You must have installed hepfile with either 1) ‘python -m pip install hepfile[awkward]’, or 2) ‘python -m pip install hepfile[all]’

hepfile.awkward_tools.awkward_to_hepfile(ak_array: Array, outfile: str = None, write_hepfile: bool = True, **kwargs) dict#

Write an awkward array with depth <= 2 to a hepfile

Parameters:
  • ak_array (ak.Array) – awkward array with fields of groups/singletons. Under the group fields are the dataset fields.

  • outfile (str) – path to where the hepfile should be written. Default is None and can only be None if write_hepfile=False.

  • write_hepfile (bool) – if True, write the hepfile and return the data dictionary. If False, just return the data dictionary without writing. Default is True.

  • **kwargs – passed to hepfile.write_to_file

Returns:

Data dictionary in the hepfile

hepfile.awkward_tools.hepfile_to_awkward(data: dict, groups: list = None, datasets: list = None) Record#

Converts all (or a subset of) the output data from hepfile.read.load to a dictionary of awkward arrays.

Parameters:
  • data (dict) – Output data dictionary from the hepfile.read.load function.

  • groups (list) – list of groups to pull from data and convert to awkward arrays.

  • datasets (list) – list of full dataset paths (ex. ‘jet/px’ not ‘px’) to pull from data and include in the awkward arrays.

Returns:

dictionary of awkward arrays with the data.

Return type:

ak_arrays (dict)

hepfile.awkward_tools.pack_multiple_awkward_arrays(d: dict, arr: Array, group_name: str = None, group_counter_name: str = None) None#

Pack an awkward array of arrays into group_name or the singletons group

Parameters:
  • d (dict) – hepfile data dictionary that is returned from hepfile.initialize()

  • arr (ak.Array) – Awkward array of the group in a set of data

  • group_name (str) – Name of the group to pack arr into. If None (default), it is packed into the singletons group

hepfile.awkward_tools.pack_single_awkward_array(d: dict, arr: Array, dset_name: str, group_name: str = None, counter: str = None) None#

Packs a 1D awkward array as a dataset/singleton depending on if group_name is given

Parameters:
  • d (dict) – data dictionary created by hepfile.initialize()

  • arr (ak.Array) – 1D awkward array to pack as either a dataset or a singleton. If group_name is None, arr is packed as a singleton

  • dset_name (str) – Full path to the dataset.

  • group_name (str) – name of the group to pack arr under, default is None

  • counter (str) – name of the counter in the hepfile for this dataset

hepfile.df_tools#

Tools to work with Pandas DataFrames and Hepfile data

Note: The base installation package does not contain these tools! You must have installed hepfile with either 1) ‘python -m pip install hepfile[pandas]’, or 2) ‘python -m pip install hepfile[all]’

hepfile.df_tools.awkward_to_df(ak_array: ak.Array, groups: list[str] = None, events: list[int] = None) dict[pd.DataFrame]#

Converts an awkward array of hepfile data to a dataframe. Does the same thing as hepfile_to_df but given an awkward array.

Note: You must have installed with ‘python -m pip install hepfile[all]’ to use this tool!

Parameters:
  • ak_array (ak.Array) – awkward array in the format of a hepfile

  • groups (list) – groups to include, None (default) means include all groups

  • events (list) – list of event indexes to include

Returns:

Dictionary of requested groups as dataframes where the keys are the group names. If only one group is requested then it just returns a dataframe of that group.

hepfile.df_tools.df_to_hepfile(df_dict: dict[pandas.core.frame.DataFrame], outfile: str = None, event_num_col='event_num', write_hepfile: bool = True) dict#

Converts a dictionary of dataframes of group data to a hepfile. The opposite of hepfile_to_df. Each dataframe must have an event_num column!

Parameters:
  • df_dict (dict) – dictionary of pandas DataFrame groups to write to a hepfile

  • outfile (str) – output file name, required if write_hepfile is True

  • event_num_col (str) – name of a column in the pd.DataFrame to group by

  • write_hepfile (bool) – if True, write the hepfile data to a hepfile

Returns:

hepfile data dictionary

hepfile.df_tools.groupDF_to_eventDF(df_dict: dict[pandas.core.frame.DataFrame], event_num_col: str = 'event_num') dict#

Converts a dictionary of group dataframes to a dictionary of event dataframes

Parameters:
  • df_dict (dict) – dictionary of groups to convert to a dictionary of events

  • event_num_col (str) – column to group each group by

hepfile.df_tools.hepfile_to_df(data: dict, groups: list[str] = None, events: list[int] = None) dict[pandas.core.frame.DataFrame]#

Converts hepfile data to dataframes where each group is in its own dataframe, with an extra column called ‘event_num’. Singletons have their own dataframe.

Parameters:
  • data (dict) – data object either loaded from a hepfile or about to be written to a hepfile.

  • groups (list) – groups to include, None (default) means include all groups

  • events (list) – list of event indexes to include

Returns:

Dictionary of requested groups as dataframes where the keys are the group names. If only one group is requested then it just returns a dataframe of that group.
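The per-group table layout described above, with one row per object and an ‘event_num’ column tying rows back to their event, can be illustrated without pandas. This is a conceptual sketch only: `events_to_group_rows` is a hypothetical helper, rows are plain dictionaries rather than DataFrames, and it does not reproduce hepfile's implementation.

```python
def events_to_group_rows(events):
    """Flatten events into {group: [row, ...]} with an 'event_num' column."""
    tables = {}
    for event_num, event in enumerate(events):
        for group, datasets in event.items():
            names = list(datasets)
            # One row per object in the group (for example, per jet).
            for values in zip(*(datasets[name] for name in names)):
                row = {"event_num": event_num}
                row.update(zip(names, values))
                tables.setdefault(group, []).append(row)
    return tables


events = [
    {"jet": {"px": [1.0, 2.0], "py": [0.5, 0.1]}},
    {"jet": {"px": [4.2], "py": [1.3]}},
]
tables = events_to_group_rows(events)
```

Here `tables["jet"]` holds three rows, two tagged with event 0 and one with event 1, which is the shape a group dataframe with an ‘event_num’ column would have.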

hepfile.csv_tools#

Tools to help with managing csvs with hepfile

Note: The base installation package does not contain these tools! You must have installed hepfile with either 1) ‘python -m pip install hepfile[pandas]’, or 2) ‘python -m pip install hepfile[all]’

hepfile.csv_tools.csv_to_hepfile(csvpaths: list[str], common_key: str, outfile: str | None = None, group_names: list | None = None, write_hepfile: bool = True) tuple[str, dict]#

Convert a list of csvs to a hepfile

This is helpful for converting database-like csvs to a hepfile where each input csv has a common key and can be combined into a large table.

Parameters:
  • csvpaths (list[str]) – list of absolute paths to the csvs to convert to a hepfile

  • common_key (str) – The above list of csvs should have a common column name, give the name of this column

  • outfile (str) – The output file name, if None data is written to the first filepath in csvpaths with ‘csv’ replaced with ‘h5’

  • group_names (list) – the names for the groups in the hepfile. Default is None and the groups are based on the filenames

  • write_hepfile (bool) – if True, write the hepfile. Default is True.

Returns:

Dictionary of hepfile data
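To make "database-like CSVs with a common key" concrete, the sketch below builds two small in-memory CSV tables that share an `event_id` column and collects the rows of each table by that key. The column names and values are made up for illustration, `group_by_key` is a hypothetical helper, and how csv_to_hepfile itself combines the tables is not shown here.

```python
import csv
import io
from collections import defaultdict

# Two tables that share the column "event_id" (the common key).
JETS_CSV = "event_id,px,py\n0,1.0,0.5\n0,2.0,0.1\n1,4.2,1.3\n"
MUONS_CSV = "event_id,pt\n0,25.0\n1,31.5\n1,12.0\n"


def group_by_key(csv_text, key):
    """Collect the rows of one CSV table under each value of the key column."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Remove the key from the row and use it as the grouping label.
        grouped[row.pop(key)].append(row)
    return dict(grouped)


jets = group_by_key(JETS_CSV, "event_id")
muons = group_by_key(MUONS_CSV, "event_id")
```

With files on disk, the corresponding call would pass their paths and the shared column name, for example `csv_to_hepfile(["jets.csv", "muons.csv"], common_key="event_id")`, where both file names are placeholders.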

hepfile.errors#

Custom exception messages

exception hepfile.errors.AwkwardStructureError#

Thrown when the structure of an Awkward Array is not appropriate for the future processing.

exception hepfile.errors.DatasetSizeDiscrepancy#

Thrown when two datasets under one group do not have the same length. This is usually not appropriate for hepfiles.

exception hepfile.errors.DictStructureError#

Thrown when the structure of a dictionary is not appropriate for the future processing.

exception hepfile.errors.HeaderNotFound#

Thrown when there is no header found for a hepfile even though the user has requested it.

exception hepfile.errors.InputError#

General error to describe when the input value of a function in the module is either the wrong type or incorrectly formatted.

exception hepfile.errors.MetadataNotFound#

Thrown when there is no metadata found for a hepfile even though the user has requested it.

exception hepfile.errors.MissingOptionalDependency(module)#

Thrown when the user tries to use an optional part of the package whose dependency was not installed alongside hepfile.

exception hepfile.errors.MissingSingletonValue#

Thrown when we try to pack a bucket into a hepfile data dictionary and no singleton value is found in the new bucket.

exception hepfile.errors.RangeSubsetError#

Thrown when the input range is incorrectly formatted. See the error for more details about what exactly is incorrect.
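These exceptions can be caught like any other Python exception. The sketch below assumes, based only on the description of HeaderNotFound above, that hepfile.read.get_file_header raises it when a file has no header; that pairing is not stated explicitly in this documentation, and the helper name `header_or_none` is hypothetical.

```python
def header_or_none(filename):
    """Return the file header as a dict, or None if the file has no header."""
    # Lazy imports so the sketch can be defined without hepfile installed.
    from hepfile.errors import HeaderNotFound
    from hepfile.read import get_file_header

    try:
        return get_file_header(filename, return_type="dict")
    except HeaderNotFound:
        # Assumed behaviour: raised when no header is stored in the file.
        return None
```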