hipscat.catalog.association_catalog.partition_join_info
Container class to hold primary-to-join partition metadata
Module Contents
Classes
PartitionJoinInfo | Association catalog metadata describing which partitions contain matches in the join.
- class PartitionJoinInfo(join_info_df: pandas.DataFrame, catalog_base_dir: str = None)
Association catalog metadata describing which partitions contain matches in the join.
- primary_to_join_map() -> Dict[hipscat.pixel_math.healpix_pixel.HealpixPixel, List[hipscat.pixel_math.healpix_pixel.HealpixPixel]]
Generate a map from a single primary pixel to one or more pixels in the join catalog.
The implementation relies on nested comprehensions, so read it with care: each entry pairs a (primary order/pixel) tuple with an [array of (join order/pixel) tuples].
- Returns:
dictionary mapping (primary order/pixel) to [array of (join order/pixel)]
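The grouping described above can be sketched in plain Python, without hipscat or pandas. This is an illustrative sketch only, not the library's implementation: the `Pixel` named tuple is a hypothetical stand-in for `HealpixPixel`, the helper name `build_primary_to_join_map` is made up, and the row layout (primary order, primary pixel, join order, join pixel) is an assumption about how the join info is stored.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Pixel(NamedTuple):
    """Hypothetical stand-in for hipscat's HealpixPixel (order, pixel)."""

    order: int
    pixel: int


def build_primary_to_join_map(
    rows: List[Tuple[int, int, int, int]],
) -> Dict[Pixel, List[Pixel]]:
    """Group join pixels under the primary pixel they are paired with.

    Each row is assumed to be
    (primary_order, primary_pixel, join_order, join_pixel).
    """
    mapping: Dict[Pixel, List[Pixel]] = defaultdict(list)
    for primary_order, primary_pixel, join_order, join_pixel in rows:
        mapping[Pixel(primary_order, primary_pixel)].append(
            Pixel(join_order, join_pixel)
        )
    return dict(mapping)


# Made-up example rows: the first primary pixel matches two join pixels.
example_rows = [
    (0, 11, 1, 44),
    (0, 11, 1, 45),
    (1, 46, 1, 46),
]
example_map = build_primary_to_join_map(example_rows)
```

An explicit loop is used here instead of nested comprehensions purely for readability; the resulting dictionary has the same shape as the one described in the Returns entry.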
- write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Generate parquet metadata, using the known partitions.
- Parameters:
catalog_path (FilePointer) – base path for the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided and one could not be inferred.
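The ValueError condition can be illustrated with a small sketch. It assumes that the path is inferred from the `catalog_base_dir` value given to the constructor; the page does not state where the inferred path comes from, so treat that as a guess. The helper name `resolve_catalog_path` is hypothetical and plain strings stand in for FilePointer.

```python
from typing import Optional


def resolve_catalog_path(
    catalog_path: Optional[str], catalog_base_dir: Optional[str]
) -> str:
    """Prefer the explicit path; otherwise fall back to the stored base dir.

    Assumption: the fallback is the catalog_base_dir passed at construction.
    """
    if catalog_path is not None:
        return catalog_path
    if catalog_base_dir is not None:
        return catalog_base_dir
    # Nothing was provided and nothing can be inferred.
    raise ValueError("catalog_path is required and could not be inferred")
```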
- write_to_csv(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)[source]#
Write all partition data to CSV files.
Two files will be written:
partition_info.csv - covers all primary catalog pixels, and should match the file structure
partition_join_info.csv - covers all pairwise relationships between primary and join catalogs.
- Parameters:
catalog_path – FilePointer to the directory where the partition_join_info.csv file will be written
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided and one could not be inferred.
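How the two CSV files relate to each other can be sketched with the standard library alone. This is a simplified illustration, not the library's writer: the column names (`Norder`, `Npix`, `join_Norder`, `join_Npix`) and the helper name `write_partition_csvs` are assumptions, and the real files may carry additional columns.

```python
import csv
import os
import tempfile
from typing import List, Tuple

# Assumed headers; the real files may use different or additional columns.
PRIMARY_COLUMNS = ["Norder", "Npix"]
JOIN_COLUMNS = ["Norder", "Npix", "join_Norder", "join_Npix"]


def write_partition_csvs(
    rows: List[Tuple[int, int, int, int]], directory: str
) -> None:
    """Write one file of distinct primary pixels and one of all pairs."""
    # dict.fromkeys keeps the first occurrence of each primary pixel, in order.
    primary_pixels = list(dict.fromkeys((row[0], row[1]) for row in rows))

    primary_path = os.path.join(directory, "partition_info.csv")
    with open(primary_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(PRIMARY_COLUMNS)
        writer.writerows(primary_pixels)

    join_path = os.path.join(directory, "partition_join_info.csv")
    with open(join_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(JOIN_COLUMNS)
        writer.writerows(rows)


# Made-up pairs: two distinct primary pixels, three pairwise relationships.
example_rows = [(0, 11, 1, 44), (0, 11, 1, 45), (1, 46, 1, 46)]

with tempfile.TemporaryDirectory() as tmp:
    write_partition_csvs(example_rows, tmp)
    with open(os.path.join(tmp, "partition_info.csv"), newline="") as handle:
        primary_lines = list(csv.reader(handle))
    with open(os.path.join(tmp, "partition_join_info.csv"), newline="") as handle:
        join_lines = list(csv.reader(handle))
```

The primary file ends up with one data row per distinct primary pixel, while the join file keeps one data row per primary/join pair.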
- classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionJoinInfo
Read partition join info from a file within a hipscat directory.
This looks for a partition_join_info.csv file first and, if it is not found, falls back to a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.
- Parameters:
catalog_base_dir – path to the root directory of the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionJoinInfo object with the data from the file
- Raises:
FileNotFoundError – if neither file is found in catalog_base_dir
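The lookup order described for read_from_dir can be sketched as follows. This is a minimal illustration of the fallback logic on a local filesystem, not the library's code: the helper name `find_partition_join_file` and the warning text are hypothetical, and `pathlib` stands in for FilePointer and `storage_options`.

```python
import tempfile
import warnings
from pathlib import Path


def find_partition_join_file(catalog_base_dir: str) -> Path:
    """Pick the file to read: the CSV first, then the _metadata fallback."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_join_info.csv"
    metadata_file = base / "_metadata"

    if csv_file.exists():
        return csv_file
    if metadata_file.exists():
        # The fallback is the slow path, so tell the user about it.
        warnings.warn("Reading partition join info from _metadata; this may be slow.")
        return metadata_file
    raise FileNotFoundError(
        f"No partition_join_info.csv or _metadata found in {catalog_base_dir}"
    )


with tempfile.TemporaryDirectory() as tmp:
    # Only _metadata exists: the fallback is chosen and a warning is recorded.
    (Path(tmp) / "_metadata").touch()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        chosen_without_csv = find_partition_join_file(tmp).name

    # Once the CSV exists, it takes priority over _metadata.
    (Path(tmp) / "partition_join_info.csv").touch()
    chosen_with_csv = find_partition_join_file(tmp).name
```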
- classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) -> PartitionJoinInfo
Read partition join info from a _metadata file to create an object
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A PartitionJoinInfo object with the data from the file
- classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) -> pandas.DataFrame
Read partition join info from a _metadata file into a pandas DataFrame
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A pandas DataFrame with the data from the file
- classmethod read_from_csv(partition_join_info_file: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionJoinInfo
Read partition join info from a partition_join_info.csv file to create an object
- Parameters:
partition_join_info_file (FilePointer) – FilePointer to the partition_join_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionJoinInfo object with the data from the file
- classmethod _read_from_csv(partition_join_info_file: hipscat.io.FilePointer, storage_options: dict = None) -> pandas.DataFrame
Read partition join info from a partition_join_info.csv file into a pandas DataFrame
- Parameters:
partition_join_info_file (FilePointer) – FilePointer to the partition_join_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A pandas DataFrame with the data from the file