Source Code Documentation
hepfile.read
Functions to assist in reading and accessing information in hepfiles.
- hepfile.read.calculate_index_from_counters(counters: int) -> int
Calculates an index array from the counters.
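Assuming counters holds the number of entries each bucket contributes to a group's flattened datasets, the index is presumably the exclusive cumulative sum of the counters, i.e. the offset at which each bucket's entries begin. A minimal pure-Python sketch of that idea (index_from_counters is a hypothetical name, not the library's implementation, which may use NumPy):

```python
from itertools import accumulate


def index_from_counters(counters):
    """Return the starting offset of each bucket in a flattened dataset.

    Hypothetical stand-in for hepfile.read.calculate_index_from_counters:
    the start of bucket i is the sum of counters[0..i-1].
    """
    # accumulate(..., initial=0) yields 0, c0, c0+c1, ...; drop the final total
    return list(accumulate(counters, initial=0))[:-1]


# Three buckets holding 2, 0 and 3 entries, respectively
counters = [2, 0, 3]
index = index_from_counters(counters)
print(index)  # [0, 2, 2]

# Recover bucket i from a flattened dataset
flat_px = [1.0, 2.0, 3.0, 4.0, 5.0]
i = 2
print(flat_px[index[i]:index[i] + counters[i]])  # [3.0, 4.0, 5.0]
```

The slice flat[index[i]:index[i] + counters[i]] is what makes the counters sufficient to store variable-length buckets in one flat array.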
- hepfile.read.get_file_header(filename: str, return_type: str = 'dict') -> dict
Get the file header and return it as a dictionary or a pandas DataFrame.
- Parameters:
filename (str) – HDF5 file from which to read the header information
return_type (str) – If ‘dict’, return the header information as a dictionary. If ‘df’ or ‘dataframe’, return it as a pandas DataFrame.
- hepfile.read.get_file_metadata(filename: str) -> dict
Get the file metadata and return it as a dictionary.
- hepfile.read.get_nbuckets_in_data(data: dict) -> int
Get the number of buckets in the data dictionary.
This is useful if you have only pulled out a subset of the data.
- hepfile.read.get_nbuckets_in_file(filename: str) -> int
Get the number of buckets in the file.
- hepfile.read.load(filename: str, verbose: bool = False, desired_groups: list[str] = None, subset: int = None, return_type: str = 'dictionary') -> tuple[dict, dict]
Reads all, or a subset, of the data from the HDF5 file to fill a data dictionary. Also returns an empty bucket dictionary to be filled later with data from individual buckets.
- Parameters:
filename (string) – Name of the input file
verbose (boolean) – True if debug output is required
desired_groups (list) – Groups to be read from the input file
subset (int) – Number of buckets to be read from the input file
return_type (str) – Type to return. Options are ‘dictionary’, ‘awkward’, and ‘pandas’. Default is ‘dictionary’. Note: the ‘awkward’ option requires hepfile to be installed with the awkward or all option, and the ‘pandas’ option requires hepfile to be installed with the pandas or all option!
- Returns:
data (dict): Selected data from the HDF5 file
bucket (dict): An empty bucket dictionary to be filled by data from selected buckets
- Return type:
tuple[dict, dict]
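Because each group's datasets are stored flattened, reading only the first subset buckets presumably amounts to reading, for each group, the first sum(counters[:subset]) entries. A sketch of that bookkeeping, assuming counters is the per-bucket counter list of one group (entries_for_subset is hypothetical and not part of hepfile; it rejects out-of-range values, and the library's own handling may differ):

```python
def entries_for_subset(counters, subset=None):
    """Number of flattened entries covering the first `subset` buckets.

    Hypothetical helper illustrating the `subset` argument of
    hepfile.read.load; None means "read every bucket".
    """
    if subset is None:
        subset = len(counters)
    if subset < 0 or subset > len(counters):
        raise ValueError(f"subset must be between 0 and {len(counters)}")
    return sum(counters[:subset])


counters = [2, 0, 3, 1]
print(entries_for_subset(counters, 2))  # 2
print(entries_for_subset(counters))     # 6
```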
- hepfile.read.print_file_header(filename: str) -> str
Pretty print the file header.
- Parameters:
filename (str) – filename to retrieve the header from.
- Returns:
String representation of the header information, if it exists.
- hepfile.read.print_file_metadata(filename: str)
Pretty print the file metadata.
- hepfile.read.unpack(bucket: dict, data: dict, n: int = 0)
Fills the bucket dictionary with selected rows from the data dictionary.
- Parameters:
bucket (dict) – Bucket dictionary to be filled
data (dict) – Data dictionary used to fill the bucket dictionary
n (int) – Index of the entry (bucket) to pull out of the data dictionary and insert into the bucket dictionary. 0 by default.
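To make the bucket/data relationship concrete, here is a toy model of what unpack does under a deliberately simplified layout: each dataset is a flat list whose per-bucket lengths are stored in a counter, and singletons store one value per bucket. The names unpack_model and dataset_counter and the key layout are hypothetical; hepfile's real data dictionary carries additional bookkeeping keys.

```python
# Hypothetical, simplified layout; not hepfile's real data dictionary
data = {
    "jet/njet": [2, 0, 1],        # counter: number of jets in each bucket
    "jet/px": [1.0, 2.0, 3.0],    # flattened dataset governed by jet/njet
    "run": [7, 7, 8],             # singleton: exactly one value per bucket
}
dataset_counter = {"jet/px": "jet/njet"}  # maps each dataset to its counter


def unpack_model(bucket, data, n=0):
    """Fill `bucket` in place with the contents of bucket `n` of `data`."""
    for key, values in data.items():
        counter_name = dataset_counter.get(key)
        if counter_name is None:
            # Counters and singletons hold one value per bucket
            bucket[key] = values[n]
        else:
            counts = data[counter_name]
            start = sum(counts[:n])  # offset of bucket n in the flat list
            bucket[key] = values[start:start + counts[n]]


bucket = {}
unpack_model(bucket, data, n=2)
print(bucket)  # {'jet/njet': 1, 'jet/px': [3.0], 'run': 8}
```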
hepfile.write
Functions to assist in writing a hepfile “from scratch”.
- hepfile.write.add_meta(data: dict, name: str, meta_data: list)
Create metadata for a group (or singleton) and add it to data.
- Parameters:
data (dict) – a data object returned by hf.initialize()
name (str) – name of the group, singleton, or dataset the metadata corresponds to. If passing a dataset name, make sure it is the full path (group/dataset)!
meta_data (list) – list of metadata to write to that group/dataset/singleton
- hepfile.write.clear_bucket(bucket: dict) -> None
Clears the data from the bucket dictionary.
- Parameters:
bucket (dict) – The dictionary to be cleared. It is designed to clear the data from the lists in the bucket dictionary, but in principle it will clear the lists in any dictionary.
- hepfile.write.create_dataset(data: dict, dset_name: list, group: str = None, dtype: type = float, verbose=False, ignore_protected=False)
Adds a dataset to a group in a dictionary. If the group does not exist, it will be created.
- Parameters:
data (dict) – Dictionary that contains the group
dset_name (list/str) – Name of the dataset, or list of dataset names, to be added to the group
group (string) – Name of the group the dataset will be added to. None by default
dtype (type) – The data type. float by default
- Returns:
-1 if the group is None
- Return type:
int
- hepfile.write.create_group(data: dict, group_name: str, counter: str = None, verbose=False, ignore_protected=False)
Adds a group to the dictionary.
- Parameters:
data (dict) – Dictionary to which the group will be added
group_name (string) – Name of the group to be added
counter (string) – Name of the counter key. None by default
- hepfile.write.create_single_bucket(data: dict) -> dict
Creates a bucket dictionary that is used to collect data before it is packed into the master data dictionary.
- Parameters:
data (dict) – Data dictionary that will hold all the data from the bucket.
- Returns:
The new bucket dictionary, with keys but no bucket data
- Return type:
bucket (dict)
- hepfile.write.initialize() -> dict
Creates an empty data dictionary.
- Returns:
An empty data dictionary
- Return type:
data (dict)
- hepfile.write.pack(data: dict, bucket: dict, AUTO_SET_COUNTER: bool = True, EMPTY_OUT_BUCKET: bool = True, STRICT_CHECKING: bool = False, verbose: bool = False)
Takes the data from a bucket and packs it into the data dictionary, intelligently, so that it can be stored and extracted efficiently. (This is analogous to the ROOT TTree::Fill() member function.)
- Parameters:
data (dict) – Data dictionary to hold the entire dataset.
bucket (dict) – Bucket to be packed into data.
EMPTY_OUT_BUCKET (bool) – If True, empty out the bucket container in preparation for the next iteration. Users used to have to do this “by hand”; it is now done automatically by default. Set it to False if you need to keep the bucket contents, e.g. for debugging.
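Conceptually, each call appends one bucket's lists onto the flattened data and records how many entries that bucket contributed. The toy model below sketches that fill loop under a simplified, hypothetical layout (pack_model and the groups mapping are illustrative names, not hepfile's implementation). The counter is always derived from list lengths, mirroring AUTO_SET_COUNTER=True, and a plain ValueError is raised where hepfile's own error list has DatasetSizeDiscrepancy.

```python
def pack_model(data, bucket, groups, empty_out_bucket=True):
    """Append one bucket to `data` (a toy model of hepfile.write.pack).

    `groups` maps each counter name to the datasets it governs; all names
    and the flat-list layout here are hypothetical.
    """
    for counter, dsets in groups.items():
        lengths = {len(bucket[d]) for d in dsets}
        if len(lengths) != 1:
            # All datasets in one group must have the same length per bucket
            raise ValueError(f"datasets under {counter!r} differ in length")
        data.setdefault(counter, []).append(lengths.pop())
        for d in dsets:
            data.setdefault(d, []).extend(bucket[d])
    if empty_out_bucket:
        # Mirror EMPTY_OUT_BUCKET=True: clear the lists for the next iteration
        for d in bucket:
            bucket[d].clear()


groups = {"jet/njet": ["jet/px", "jet/py"]}
data = {}
bucket = {"jet/px": [], "jet/py": []}

# Three events with 2, 0 and 1 jets
for event in ([(1.0, 0.5), (2.0, 0.1)], [], [(3.0, 0.2)]):
    for px, py in event:
        bucket["jet/px"].append(px)
        bucket["jet/py"].append(py)
    pack_model(data, bucket, groups)

print(data["jet/njet"])  # [2, 0, 1]
print(data["jet/px"])    # [1.0, 2.0, 3.0]
```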
- hepfile.write.write_file_header(filename: str, mydict: dict, verbose: bool = False) -> File
Writes header data to a protected group in an HDF5 file.
If there is already header information, it is overwritten by this function.
- Parameters:
filename (string) – Name of the file to write to (the file should already exist; the group will be appended to it.)
mydict (dictionary) – Header data passed in by the user
verbose (bool) – True to print out info as it runs
- Returns:
The file with the new header information
- Return type:
hdoutfile (HDF5)
- hepfile.write.write_file_metadata(filename: str, mydict: dict = None, write_default_values: bool = True, append: bool = True, verbose: bool = False) -> File
Writes file metadata in the attributes of an HDF5 file.
- Parameters:
filename (string) – Name of output file
mydict (dictionary) – Metadata desired by the user
write_default_values (boolean) – True if the user wants to write/update the default metadata (date, hepfile version, h5py version, numpy version, and Python version); False otherwise.
append (boolean) – True if the user wants to keep older metadata; False otherwise.
verbose (boolean) – True to print out statements as it runs
- Returns:
File with new metadata
- Return type:
hdoutfile (HDF5)
- hepfile.write.write_to_file(filename: str, data: dict, comp_type: str = None, comp_opts: list = None, force_single_precision: bool = True, verbose: bool = False) -> File
Writes the selected data to an HDF5 file.
- Parameters:
filename (string) – Name of output file
data (dictionary) – Data to be written into output file
comp_type (string) – Type of compression
force_single_precision (boolean) – True if data should be written in single precision
- Returns:
File to which the data has been written
- Return type:
hdoutfile (HDF5)
hepfile.dict_tools
Functions to help convert dictionaries into hepfiles.
- hepfile.dict_tools.append(ak_dict: ak.Record, new_dict: dict) -> ak.Record
Append a new event to an existing awkward dictionary of events.
Note: This tool requires awkward to be installed. Make sure you installed with either 1) ‘python -m pip install hepfile[awkward]’ or 2) ‘python -m pip install hepfile[all]’.
- Parameters:
ak_dict (ak.Record) – awkward Record of data
new_dict (dict) – Dictionary of values to append to ak_dict. All keys must match ak_dict!
- Returns:
Awkward Record of awkward arrays with the new_dict appended
- hepfile.dict_tools.dictlike_to_hepfile(dict_list: list[dict], outfile: str = None, how_to_pack='classic', **kwargs) -> dict
This wraps hepfile.awkward_tools.awkward_to_hepfile and writes a list of dict-like objects to a hepfile. The input must have a specific format:
  - each dict-like object is an “event”
  - the first level of dict keys are the groups
  - the second level of dict keys are the datasets
  - entries in the second level of the dict are the data (awkward array or list)
  - data entries in the first level of the dict are singleton objects
- Parameters:
dict_list (list) – list of dictionaries or DataFrames where each dictionary/DataFrame holds information on an event
outfile (str) – path to write the output hepfile to
how_to_pack (str) – how to pack the input dataset. Options are ‘awkward’ or ‘classic’; default is ‘classic’. ‘awkward’ calls awkward_to_hepfile, while ‘classic’ packs the data in the more traditional way. To use how_to_pack=‘awkward’, make sure you installed hepfile with the ‘awkward’ or ‘all’ optional dependency!
**kwargs – passed to hepfile.write.write_to_file if ‘awkward’. Can only be ‘write_to_hepfile’ and ‘ignore_protected’ if ‘classic’.
- Returns:
Dictionary of Awkward Arrays with the data stored in outfile
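The input format described above can be illustrated with a small, hypothetical list of events. The sketch builds such a list and splits each event into its groups/datasets and its singletons, following the stated rules (dict values at the first level are groups, anything else is a singleton). The helper describe_event and the field names are illustrative only.

```python
# Hypothetical input following the dictlike_to_hepfile format: one dict per
# event, groups at the first level, datasets at the second, and plain
# (non-dict) first-level values as singletons.
events = [
    {"jet": {"px": [1.0, 2.0], "py": [0.5, 0.1]}, "nvertex": 3},
    {"jet": {"px": [], "py": []}, "nvertex": 1},
]


def describe_event(event):
    """Split one event dict into its groups/datasets and its singletons."""
    groups = {k: sorted(v) for k, v in event.items() if isinstance(v, dict)}
    singletons = sorted(k for k, v in event.items() if not isinstance(v, dict))
    return groups, singletons


for event in events:
    print(describe_event(event))
# ({'jet': ['px', 'py']}, ['nvertex'])  printed once per event
```

Such a list would then be passed as the dict_list argument, e.g. dictlike_to_hepfile(events, outfile='events.h5'), where the filename is illustrative.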
hepfile.awkward_tools
These are tools to make working with and translating between awkward arrays and hepfile data objects easier.
Note: The base installation package does not contain these tools! You must have installed hepfile with either 1) ‘python -m pip install hepfile[awkward]’, or 2) ‘python -m pip install hepfile[all]’
- hepfile.awkward_tools.awkward_to_hepfile(ak_array: Array, outfile: str = None, write_hepfile: bool = True, **kwargs) -> dict
Write an awkward array with depth <= 2 to a hepfile.
- Parameters:
ak_array (ak.Array) – awkward array with fields of groups/singletons. Under the group fields are the dataset fields.
outfile (str) – path to where the hepfile should be written. Default is None, and it can only be None if write_hepfile=False.
write_hepfile (bool) – if True, write the hepfile and return the data dictionary. If False, just return the data dictionary without writing. Default is True.
**kwargs – passed to hepfile.write_to_file
- Returns:
Data dictionary in the hepfile
- hepfile.awkward_tools.hepfile_to_awkward(data: dict, groups: list = None, datasets: list = None) -> Record
Converts all (or a subset) of the output data from hepfile.read.load to a dictionary of awkward arrays.
- Parameters:
data (dict) – Output data dictionary from the hepfile.read.load function.
groups (list) – list of groups to pull from data and convert to awkward arrays.
datasets (list) – list of full dataset paths (e.g. ‘jet/px’, not ‘px’) to pull from data and include in the awkward arrays.
- Returns:
dictionary of awkward arrays with the data.
- Return type:
ak_arrays (dict)
- hepfile.awkward_tools.pack_multiple_awkward_arrays(d: dict, arr: Array, group_name: str = None, group_counter_name: str = None) -> None
Pack an awkward array of arrays into group_name or the singletons group.
- Parameters:
d (dict) – hepfile data dictionary that is returned from hepfile.initialize()
arr (ak.Array) – Awkward array of the group in a set of data
group_name (str) – Name of the group to pack arr into; if None (default), it is packed into the singletons group
- hepfile.awkward_tools.pack_single_awkward_array(d: dict, arr: Array, dset_name: str, group_name: str = None, counter: str = None) -> None
Packs a 1D awkward array as a dataset or a singleton, depending on whether group_name is given.
- Parameters:
d (dict) – data dictionary created by hepfile.initialize()
arr (ak.Array) – 1D awkward array to pack as either a dataset or a singleton. If group_name is None, arr is packed as a singleton
dset_name (str) – Full path to the dataset.
group_name (str) – name of the group to pack arr under, default is None
counter (str) – name of the counter in the hepfile for this dataset
hepfile.df_tools
Tools to work with Pandas DataFrames and Hepfile data
Note: The base installation package does not contain these tools! You must have installed hepfile with either 1) ‘python -m pip install hepfile[pandas]’, or 2) ‘python -m pip install hepfile[all]’
- hepfile.df_tools.awkward_to_df(ak_array: ak.Array, groups: list[str] = None, events: list[int] = None) -> dict[pd.DataFrame]
Converts an awkward array of hepfile data to DataFrames. Does the same thing as hepfile_to_df, but takes an awkward array as input.
- Note: You must have installed with ‘python -m pip install hepfile[all]’ to use this tool!
- Parameters:
ak_array (ak.Array) – awkward array in the format of a hepfile
groups (list) – groups to include; None (default) means include all groups
events (list) – list of event indexes to include
- Returns:
Dictionary of requested groups as dataframes where the keys are the group names. If only one group is requested then it just returns a dataframe of that group.
- hepfile.df_tools.df_to_hepfile(df_dict: dict[pandas.core.frame.DataFrame], outfile: str = None, event_num_col='event_num', write_hepfile: bool = True) -> dict
Converts a dictionary of group DataFrames to a hepfile; the opposite of hepfile_to_df. Each DataFrame must have an event-number column (named by event_num_col, ‘event_num’ by default)!
- Parameters:
df_dict (dict) – dictionary of pandas DataFrame groups to write to a hepfile
outfile (str) – output file name, required if write_hepfile is True
event_num_col (str) – name of a column in the pd.DataFrame to group by
write_hepfile (bool) – should we write the hepfile data to a hepfile?
- Returns:
hepfile data dictionary
- hepfile.df_tools.groupDF_to_eventDF(df_dict: dict[pandas.core.frame.DataFrame], event_num_col: str = 'event_num') -> dict
Converts a dictionary of group DataFrames to a dictionary of event DataFrames.
- Parameters:
df_dict (dict) – dictionary of groups to convert to a dictionary of events
event_num_col (str) – column to group each group by
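Conceptually, regrouping a group table into per-event tables is a group-by on the event-number column. The sketch below shows the idea with plain lists of row dictionaries standing in for DataFrames, so it runs without pandas; with pandas, the analogous operation is DataFrame.groupby(event_num_col). The helper name and data are hypothetical, and this is not the library's implementation.

```python
from collections import defaultdict

# One group's table as a list of row dicts (a stand-in for a DataFrame);
# 'event_num' tags which event each row belongs to.
jet_rows = [
    {"event_num": 0, "px": 1.0},
    {"event_num": 0, "px": 2.0},
    {"event_num": 2, "px": 3.0},
]


def group_rows_by_event(rows, event_num_col="event_num"):
    """Regroup a group-level table into per-event tables (hypothetical helper)."""
    events = defaultdict(list)
    for row in rows:
        events[row[event_num_col]].append(row)
    return dict(events)


by_event = group_rows_by_event(jet_rows)
print(sorted(by_event))  # [0, 2]
print(len(by_event[0]))  # 2
```

Note that an event with no rows in this group (event 1 here) simply does not appear in the result.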
- hepfile.df_tools.hepfile_to_df(data: dict, groups: list[str] = None, events: list[int] = None) -> dict[pandas.core.frame.DataFrame]
Converts hepfile data to DataFrames, where each group is in its own DataFrame with an extra column called ‘event_num’. Singletons get their own DataFrame.
- Parameters:
data (dict) – data object either loaded from a hepfile or about to be written to a hepfile.
groups (list) – groups to include; None (default) means include all groups
events (list) – list of event indexes to include
- Returns:
Dictionary of requested groups as dataframes where the keys are the group names. If only one group is requested then it just returns a dataframe of that group.
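The extra ‘event_num’ column is what turns a group's flattened, variable-length datasets into an ordinary table. A minimal sketch of that flattening, assuming counts[i] is the number of entries event i has in the group and using plain dict rows in place of DataFrame rows (flatten_group is a hypothetical helper, not part of hepfile):

```python
def flatten_group(counts, datasets, event_num_col="event_num"):
    """Turn one group's flattened datasets into table rows tagged by event.

    Hypothetical helper: counts[i] is the number of entries event i has in
    this group, and each list in `datasets` is a flattened dataset.
    """
    rows = []
    start = 0
    for event_num, n in enumerate(counts):
        for j in range(start, start + n):
            row = {event_num_col: event_num}
            row.update({name: values[j] for name, values in datasets.items()})
            rows.append(row)
        start += n
    return rows


rows = flatten_group([2, 0, 1], {"px": [1.0, 2.0, 3.0]})
print(rows)
# [{'event_num': 0, 'px': 1.0}, {'event_num': 0, 'px': 2.0}, {'event_num': 2, 'px': 3.0}]
```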
hepfile.csv_tools
Tools to help with managing CSV files with hepfile.
Note: The base installation package does not contain these tools! You must have installed hepfile with either 1) ‘python -m pip install hepfile[pandas]’, or 2) ‘python -m pip install hepfile[all]’
- hepfile.csv_tools.csv_to_hepfile(csvpaths: list[str], common_key: str, outfile: str | None = None, group_names: list | None = None, write_hepfile: bool = True) -> tuple[str, dict]
Convert a list of CSV files to a hepfile.
This is helpful for converting database-like CSVs, where each input CSV shares a common key and can be combined into one large table, into a hepfile.
- Parameters:
csvpaths (list[str]) – list of absolute paths to the CSVs to convert to a hepfile
common_key (str) – name of the column that all of the above CSVs have in common
outfile (str) – The output file name. If None, data is written to the first filepath in csvpaths with ‘csv’ replaced by ‘h5’
group_names (list) – the names for the groups in the hepfile. Default is None, in which case the groups are named after the filenames
write_hepfile (bool) – if True, write the hepfile. Default is True.
- Returns:
Dictionary of hepfile data
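Our reading of the description above is that each CSV becomes one group and the rows sharing a value of common_key are combined. The sketch below shows that joining step with the standard csv module on two hypothetical in-memory CSVs; rows_by_key and the sample data are illustrative, and how csv_to_hepfile maps key values to buckets internally is an assumption here.

```python
import csv
import io

# Two hypothetical "database-like" CSVs that share the column 'id'
people_csv = "id,name\n1,Ada\n2,Grace\n"
scores_csv = "id,score\n1,90\n1,85\n2,70\n"


def rows_by_key(text, common_key):
    """Read CSV text and collect its rows by the value of `common_key`."""
    grouped = {}
    for row in csv.DictReader(io.StringIO(text)):
        key = row.pop(common_key)
        grouped.setdefault(key, []).append(row)
    return grouped


# One table per CSV (i.e. per group), each keyed by the common column
tables = {
    "people": rows_by_key(people_csv, "id"),
    "scores": rows_by_key(scores_csv, "id"),
}
keys = sorted(set().union(*tables.values()))
for key in keys:
    # In this sketch, each common-key value is treated as one bucket
    print(key, {group: table.get(key, []) for group, table in tables.items()})
```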
hepfile.errors
Custom exception messages
- exception hepfile.errors.AwkwardStructureError
Thrown when the structure of an Awkward Array is not appropriate for further processing.
- exception hepfile.errors.DatasetSizeDiscrepancy
Thrown when two datasets under one group do not have the same length. This is usually not appropriate for hepfiles.
- exception hepfile.errors.DictStructureError
Thrown when the structure of a dictionary is not appropriate for further processing.
- exception hepfile.errors.HeaderNotFound
Thrown when no header is found for a hepfile even though the user has requested it.
- exception hepfile.errors.InputError
General error raised when the input value of a function in the module is either the wrong type or incorrectly formatted.
- exception hepfile.errors.MetadataNotFound
Thrown when no metadata is found for a hepfile even though the user has requested it.
- exception hepfile.errors.MissingOptionalDependency(module)
Thrown when the user tries to use an optional part of the package whose dependencies were not installed.
- exception hepfile.errors.MissingSingletonValue
Thrown when we try to pack a bucket into a hepfile data dictionary and no singleton value is found in the new bucket.
- exception hepfile.errors.RangeSubsetError
Thrown when the input range is incorrectly formatted. See the error message for details about what exactly is incorrect.
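The optional awkward and pandas extras suggest a common guard pattern behind MissingOptionalDependency: try the import and, on failure, raise an error that tells the user which extra to install. A minimal sketch of that pattern; the class below is a hypothetical stand-in that only mirrors the documented MissingOptionalDependency(module) signature, and require is an illustrative helper, not hepfile's code.

```python
import importlib


class MissingOptionalDependency(Exception):
    """Hypothetical stand-in for hepfile.errors.MissingOptionalDependency."""

    def __init__(self, module):
        super().__init__(
            f"{module} is not installed; reinstall with "
            f"'python -m pip install hepfile[{module}]' or 'hepfile[all]'"
        )


def require(module):
    """Import an optional dependency or raise a helpful error."""
    try:
        return importlib.import_module(module)
    except ImportError as exc:
        raise MissingOptionalDependency(module) from exc


json_mod = require("json")  # stdlib module, always importable
try:
    require("surely_not_a_real_module_xyz")
except MissingOptionalDependency as err:
    print(err)
```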