hipscat.catalog.association_catalog.partition_join_info

Container class to hold primary-to-join partition metadata

Module Contents

Classes

PartitionJoinInfo

Association catalog metadata recording which partitions contain matches in the join

class PartitionJoinInfo(join_info_df: pandas.DataFrame, catalog_base_dir: str = None)

Association catalog metadata recording which partitions contain matches in the join

PRIMARY_ORDER_COLUMN_NAME = 'Norder'
PRIMARY_PIXEL_COLUMN_NAME = 'Npix'
JOIN_ORDER_COLUMN_NAME = 'join_Norder'
JOIN_PIXEL_COLUMN_NAME = 'join_Npix'
COLUMN_NAMES
_check_column_names()
primary_to_join_map() → Dict[hipscat.pixel_math.healpix_pixel.HealpixPixel, List[hipscat.pixel_math.healpix_pixel.HealpixPixel]]

Generate a map from a single primary pixel to one or more pixels in the join catalog.

The map is built with nested comprehensions: each entry pairs a (primary order/pixel) key with a list of (join order/pixel) values.

Returns:

dictionary mapping (primary order/pixel) to [array of (join order/pixel)]
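The shape of the returned mapping can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: the `Pixel` named tuple is a hypothetical stand-in for `HealpixPixel`, the function name is invented, and the rows are made-up values laid out as Norder, Npix, join_Norder, join_Npix.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for hipscat.pixel_math.healpix_pixel.HealpixPixel.
Pixel = namedtuple("Pixel", ["order", "pixel"])

# Made-up rows: (Norder, Npix, join_Norder, join_Npix).
rows = [
    (0, 11, 1, 44),
    (0, 11, 1, 45),
    (1, 3, 1, 12),
]


def primary_to_join_map_sketch(rows):
    """Group join pixels under the primary pixel they are paired with."""
    mapping = defaultdict(list)
    for order, pixel, join_order, join_pixel in rows:
        mapping[Pixel(order, pixel)].append(Pixel(join_order, join_pixel))
    return dict(mapping)


pixel_map = primary_to_join_map_sketch(rows)
```

Here the primary pixel at order 0, pixel 11 maps to two join pixels, while the primary pixel at order 1, pixel 3 maps to one.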

write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)

Generate parquet metadata, using the known partitions.

Parameters:
  • catalog_path (FilePointer) – base path for the catalog

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided, and could not be inferred.
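One plausible reading of the `ValueError` condition is a fallback from the explicit `catalog_path` argument to the `catalog_base_dir` supplied at construction. The sketch below encodes that rule under this assumption; the helper name `resolve_catalog_path` is hypothetical and not part of hipscat.

```python
def resolve_catalog_path(catalog_path=None, catalog_base_dir=None):
    """Return the explicit path if given, else the stored base directory.

    Assumed fallback rule, for illustration only.
    """
    if catalog_path is not None:
        return catalog_path
    if catalog_base_dir is not None:
        return catalog_base_dir
    raise ValueError("catalog_path is required when it cannot be inferred")
```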

write_to_csv(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)

Write all partition data to CSV files.

Two files will be written:

  • partition_info.csv - covers all primary catalog pixels, and should match the file structure

  • partition_join_info.csv - covers all pairwise relationships between primary and join catalogs.

Parameters:
  • catalog_path (FilePointer) – directory where the partition_join_info.csv file will be written

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided, and could not be inferred.
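For orientation, the pairwise file can be pictured as one row per (primary pixel, join pixel) pair. The sketch below writes such a table with Python's `csv` module into a temporary directory. The column order is an assumption based on the class constants, the pixel values are invented, and the real method may format the file differently.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the PartitionJoinInfo class constants;
# their order in the real file is an assumption.
COLUMNS = ["Norder", "Npix", "join_Norder", "join_Npix"]

# Invented (primary, join) pixel pairs, for illustration only.
pairs = [(0, 11, 1, 44), (0, 11, 1, 45), (1, 3, 1, 12)]

out_dir = Path(tempfile.mkdtemp())
csv_path = out_dir / "partition_join_info.csv"

# Write a header row followed by one row per pairwise relationship.
with csv_path.open("w", newline="") as handle:
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    writer.writerows(pairs)

# Read the file back to confirm the layout round-trips.
with csv_path.open(newline="") as handle:
    written = list(csv.DictReader(handle))
```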

classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) → PartitionJoinInfo

Read partition join info from a file within a hipscat directory.

This will look for a partition_join_info.csv file and, if it is not found, for a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.

Parameters:
  • catalog_base_dir (FilePointer) – path to the root directory of the catalog

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionJoinInfo object with the data from the file

Raises:

FileNotFoundError – if neither desired file is found in the catalog_base_dir
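The lookup order described above can be sketched with the standard library alone. This is an illustration, not hipscat's code: the function name `find_partition_join_file` is hypothetical, the warning text is invented, and both files are assumed to sit directly in the catalog base directory on a local filesystem.

```python
import tempfile
import warnings
from pathlib import Path


def find_partition_join_file(catalog_base_dir):
    """Return (kind, path) for the first partition-join file that exists."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_join_info.csv"
    metadata_file = base / "_metadata"
    if csv_file.exists():
        # Preferred: the CSV file is the fast path.
        return "csv", csv_file
    if metadata_file.exists():
        # Fallback: usable, but typically slower for large catalogs.
        warnings.warn("Reading partitions from _metadata, which is typically slower.")
        return "metadata", metadata_file
    raise FileNotFoundError(f"No partition join info found in {base}")


# Usage: a directory holding only _metadata triggers the warning path.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "_metadata").touch()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    kind, path = find_partition_join_file(demo_dir)
```

Checking for the CSV file first keeps the common case fast and reserves the warning for the fallback.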

classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → PartitionJoinInfo

Read partition join info from a _metadata file to create an object

Parameters:
  • metadata_file (FilePointer) – FilePointer to the _metadata file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

  • strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A PartitionJoinInfo object with the data from the file

classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → pandas.DataFrame

Read partition join info from a _metadata file into a DataFrame

Parameters:
  • metadata_file (FilePointer) – FilePointer to the _metadata file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

  • strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A pandas.DataFrame with the data from the file

classmethod read_from_csv(partition_join_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionJoinInfo

Read partition join info from a partition_join_info.csv file to create an object

Parameters:
  • partition_join_info_file (FilePointer) – FilePointer to the partition_join_info.csv file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionJoinInfo object with the data from the file

classmethod _read_from_csv(partition_join_info_file: hipscat.io.FilePointer, storage_options: dict = None) → pandas.DataFrame

Read partition join info from a partition_join_info.csv file into a DataFrame

Parameters:
  • partition_join_info_file (FilePointer) – FilePointer to the partition_join_info.csv file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A pandas.DataFrame with the data from the file