hipscat.catalog.partition_info

Container class to hold per-partition metadata

Module Contents

Classes

PartitionInfo

Container class for per-partition info.

class PartitionInfo(pixel_list: List[hipscat.pixel_math.HealpixPixel], catalog_base_dir: str = None)

Container class for per-partition info.

METADATA_ORDER_COLUMN_NAME = 'Norder'
METADATA_DIR_COLUMN_NAME = 'Dir'
METADATA_PIXEL_COLUMN_NAME = 'Npix'
get_healpix_pixels() -> List[hipscat.pixel_math.HealpixPixel]

Get healpix pixel objects for all pixels represented as partitions.

Returns:

List of HealpixPixel

get_highest_order() -> int

Get the highest healpix order for the dataset.

Returns:

int representing highest order.
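As a rough illustration of what "highest order" means here, the sketch below takes the maximum order over a list of pixels. `PixelStandIn` and `highest_order` are hypothetical names used only for this example; they stand in for `HealpixPixel` and are not the hipscat implementation.

```python
from typing import List, NamedTuple


class PixelStandIn(NamedTuple):
    """Hypothetical stand-in for hipscat.pixel_math.HealpixPixel."""

    order: int  # HEALPix order of the partition
    pixel: int  # pixel number within that order


def highest_order(pixels: List[PixelStandIn]) -> int:
    """Return the largest HEALPix order among the given partitions."""
    return max(p.order for p in pixels)


pixels = [PixelStandIn(0, 11), PixelStandIn(1, 44), PixelStandIn(2, 180)]
result = highest_order(pixels)
print(result)
```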

write_to_file(partition_info_file: hipscat.io.FilePointer = None, catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)

Write all partition data to CSV file.

If no paths are provided, the catalog base directory from the read_from_dir call is used.

Parameters:
  • partition_info_file – FilePointer to where the partition_info.csv file will be written.

  • catalog_path – base directory for a catalog where the partition_info.csv file will be written.

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided and none could be inferred.
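The following sketch shows what a CSV with the three metadata columns named above ('Norder', 'Dir', 'Npix') could look like. It uses only the standard library rather than hipscat itself. The column order and the rule that `Dir` groups pixel numbers into blocks of 10,000 are assumptions made for illustration, and `partition_rows` and `to_csv_text` are hypothetical helpers.

```python
import csv
import io
from typing import Dict, List, Tuple

# Column names taken from the METADATA_*_COLUMN_NAME constants above.
COLUMNS = ["Norder", "Dir", "Npix"]


def partition_rows(pixels: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """Build one row per (order, pixel) pair."""
    rows = []
    for order, pixel in pixels:
        # Assumption: Dir is the pixel number rounded down to a multiple of 10,000.
        directory = (pixel // 10_000) * 10_000
        rows.append({"Norder": order, "Dir": directory, "Npix": pixel})
    return rows


def to_csv_text(pixels: List[Tuple[int, int]]) -> str:
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(partition_rows(pixels))
    return buffer.getvalue()


csv_text = to_csv_text([(0, 11), (1, 44), (7, 12_345)])
print(csv_text)
```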

write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)

Generate parquet metadata, using the known partitions.

If no catalog_path is provided, the catalog base directory from the read_from_dir call is used.

Parameters:
  • catalog_path (FilePointer) – base path for the catalog

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided and none could be inferred.
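The path-inference rule described for the write methods can be sketched as a small helper: an explicit path wins, a remembered base directory is the fallback, and a `ValueError` is raised when neither is available. `resolve_catalog_path` is a hypothetical function written for this example and is not part of hipscat.

```python
from typing import Optional


def resolve_catalog_path(
    catalog_path: Optional[str], catalog_base_dir: Optional[str]
) -> str:
    """Pick the directory to write to, or fail if none is known."""
    if catalog_path is not None:
        # An explicitly provided path always takes precedence.
        return catalog_path
    if catalog_base_dir is not None:
        # Fall back to the directory remembered from an earlier read.
        return catalog_base_dir
    raise ValueError("no path was provided and none could be inferred")


print(resolve_catalog_path(None, "/data/my_catalog"))
```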

classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionInfo

Read partition info from a file within a hipscat directory.

This looks for a partition_info.csv file first and, if none is found, falls back to a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.

Parameters:
  • catalog_base_dir – path to the root directory of the catalog

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

Raises:

FileNotFoundError – if neither desired file is found in the catalog_base_dir
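The lookup order described above can be sketched with the standard library alone. `choose_partition_file` is a hypothetical helper that only decides which file would be read; it assumes both files sit directly in the catalog root on a local filesystem, and the warning text is invented for the example.

```python
import tempfile
import warnings
from pathlib import Path


def choose_partition_file(catalog_base_dir: str) -> Path:
    """Prefer partition_info.csv, fall back to _metadata, else fail."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_info.csv"
    metadata_file = base / "_metadata"
    if csv_file.exists():
        return csv_file
    if metadata_file.exists():
        # The fallback is typically slower for large catalogs, so warn.
        warnings.warn("Reading partitions from _metadata; this may be slow.")
        return metadata_file
    raise FileNotFoundError(f"No partition info found in {base}")


# Usage: a directory holding only a _metadata file triggers the fallback.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_metadata").touch()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        chosen_name = choose_partition_file(tmp).name
    warning_count = len(caught)

print(chosen_name, warning_count)
```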

classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) -> PartitionInfo

Read partition info from a _metadata file to create an object

Parameters:
  • metadata_file (FilePointer) – FilePointer to the _metadata file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

  • strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) -> List[hipscat.pixel_math.HealpixPixel]

Read partition info list from a _metadata file.

Parameters:
  • metadata_file (FilePointer) – FilePointer to the _metadata file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

  • strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

List of HealpixPixel read from the file

classmethod read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionInfo

Read partition info from a partition_info.csv file to create an object

Parameters:
  • partition_info_file (FilePointer) – FilePointer to the partition_info.csv file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file
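To make the CSV route concrete, the sketch below parses text in the assumed partition_info.csv layout into (order, pixel) pairs using only the standard library. `pixels_from_csv_text` is a hypothetical helper, the sample content is invented, and the real method returns a PartitionInfo rather than plain tuples.

```python
import csv
import io
from typing import List, Tuple


def pixels_from_csv_text(text: str) -> List[Tuple[int, int]]:
    """Extract (order, pixel) pairs from the 'Norder' and 'Npix' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["Norder"]), int(row["Npix"])) for row in reader]


# Invented sample content using the column names documented above.
sample = "Norder,Dir,Npix\n0,0,11\n1,0,44\n"
parsed = pixels_from_csv_text(sample)
print(parsed)
```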

classmethod _read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionInfo

Read partition info from a partition_info.csv file to create an object

Parameters:
  • partition_info_file (FilePointer) – FilePointer to the partition_info.csv file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

as_dataframe()

Construct a pandas dataframe for the partition info pixels.

Returns:

Dataframe with order, directory, and pixel info.

classmethod from_healpix(healpix_pixels: List[hipscat.pixel_math.HealpixPixel]) -> PartitionInfo

Create a partition info object from a list of constituent healpix pixels.

Parameters:

healpix_pixels – list of healpix pixels

Returns:

A PartitionInfo object with the same healpix pixels