`hipscat.catalog.partition_info`#

Container class to hold per-partition metadata.

Module Contents#

Classes#

PartitionInfo

Container class for per-partition info.

class PartitionInfo(pixel_list: List[hipscat.pixel_math.HealpixPixel], catalog_base_dir: str = None)[source]#

Container class for per-partition info.

METADATA_ORDER_COLUMN_NAME = 'Norder'[source]#

METADATA_DIR_COLUMN_NAME = 'Dir'[source]#

METADATA_PIXEL_COLUMN_NAME = 'Npix'[source]#

get_healpix_pixels() → List[hipscat.pixel_math.HealpixPixel][source]#

Get healpix pixel objects for all pixels represented as partitions.

Returns: List of HealpixPixel objects.

get_highest_order() → int[source]#

Get the highest healpix order for the dataset.

Returns: int representing the highest order.
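As a rough illustration of what this method computes, the sketch below takes the maximum order over a list of partitions. `Pixel` is a hypothetical stand-in for `HealpixPixel` (an order/pixel pair), not the hipscat class, and raising on an empty list is an assumption of this sketch rather than documented behavior.

```python
from typing import List, NamedTuple


class Pixel(NamedTuple):
    """Hypothetical stand-in for HealpixPixel: an (order, pixel) pair."""

    order: int
    pixel: int


def highest_order(pixels: List[Pixel]) -> int:
    """Return the largest HEALPix order among the given partitions."""
    if not pixels:
        # Assumption: an empty partition list is treated as an error here.
        raise ValueError("no partitions to inspect")
    return max(p.order for p in pixels)


partitions = [Pixel(0, 11), Pixel(1, 44), Pixel(2, 180)]
print(highest_order(partitions))
```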

write_to_file(partition_info_file: hipscat.io.FilePointer = None, catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)[source]#

Write all partition data to CSV file.

If no paths are provided, the catalog base directory from the read_from_dir call is used.

Parameters:

partition_info_file – FilePointer to where the partition_info.csv file will be written.
catalog_path – base directory for a catalog where the partition_info.csv file will be written.
storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided and none could be inferred.
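The path resolution and CSV layout described above can be sketched with the standard library alone. This is not the hipscat implementation: `resolve_csv_path` and `write_partition_csv` are hypothetical helpers, the precedence of an explicit file over a catalog directory is an assumption, and so is the rule used here to derive the `Dir` value (pixel number rounded down to a multiple of 10,000). Only the three column names come from the class constants.

```python
import csv
import os
import tempfile
from typing import List, Optional, Tuple

# Column names taken from the class constants documented above.
COLUMNS = ["Norder", "Dir", "Npix"]


def resolve_csv_path(
    partition_info_file: Optional[str],
    catalog_path: Optional[str],
    remembered_base_dir: Optional[str],
) -> str:
    """Pick an output path: explicit file, then catalog dir, then remembered dir."""
    if partition_info_file is not None:
        return partition_info_file
    base = catalog_path if catalog_path is not None else remembered_base_dir
    if base is None:
        raise ValueError("no path provided and none could be inferred")
    return os.path.join(base, "partition_info.csv")


def write_partition_csv(path: str, pixels: List[Tuple[int, int]]) -> None:
    """Write one row per (order, pixel) partition."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        for order, pixel in pixels:
            # Assumption: Dir groups pixels into buckets of 10,000.
            writer.writerow([order, (pixel // 10_000) * 10_000, pixel])


out_dir = tempfile.mkdtemp()
target = resolve_csv_path(None, out_dir, None)
write_partition_csv(target, [(0, 11), (5, 12345)])
```

Keeping path resolution separate from writing makes the `ValueError` case easy to check without touching the filesystem.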

write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)[source]#

Generate parquet metadata, using the known partitions.

If no catalog_path is provided, the catalog base directory from the read_from_dir call is used.

Parameters:

catalog_path (FilePointer) – base path for the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided and none could be inferred.

classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a file within a hipscat directory.

This looks for a partition_info.csv file first and, if none is found, falls back to a _metadata file. The fallback is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, reading the CSV file took less than a second, while reading the _metadata file could take 10-20 seconds.

Parameters:

catalog_base_dir – path to the root directory of the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

Raises:

FileNotFoundError – if neither desired file is found in the catalog_base_dir
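The lookup order described above (CSV first, then `_metadata` with a warning, otherwise an error) can be sketched as follows. `pick_partition_source` is a hypothetical helper that only chooses which file to read; it works on local paths and does not parse anything or use `storage_options`, and the warning text is invented for this sketch.

```python
import tempfile
import warnings
from pathlib import Path


def pick_partition_source(catalog_base_dir: str) -> Path:
    """Return the file to read partition info from, preferring the CSV."""
    base = Path(catalog_base_dir)

    csv_file = base / "partition_info.csv"
    if csv_file.exists():
        return csv_file

    metadata_file = base / "_metadata"
    if metadata_file.exists():
        # The fallback works but is the slower path for large catalogs.
        warnings.warn(
            "Reading partitions from _metadata; this can be slow for large catalogs."
        )
        return metadata_file

    raise FileNotFoundError(f"No partition info found in {base}")


catalog_dir = tempfile.mkdtemp()
(Path(catalog_dir) / "partition_info.csv").write_text("Norder,Dir,Npix\n")
print(pick_partition_source(catalog_dir).name)
```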

classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a _metadata file to create an object.

Parameters:

metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → List[hipscat.pixel_math.HealpixPixel][source]#

Read partition info list from a _metadata file.

Parameters:

metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A list of HealpixPixel objects with the data from the file

classmethod read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a partition_info.csv file to create an object.

Parameters:

partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a partition_info.csv file to create an object.

Parameters:

partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

as_dataframe()[source]#

Construct a pandas dataframe for the partition info pixels.

Returns: Dataframe with order, directory, and pixel info.

classmethod from_healpix(healpix_pixels: List[hipscat.pixel_math.HealpixPixel]) → PartitionInfo[source]#

Create a partition info object from a list of constituent healpix pixels.

Parameters: healpix_pixels – list of healpix pixels
Returns: A PartitionInfo object with the same healpix pixels
