hipscat.catalog.association_catalog
Submodules
Package Contents
Classes
AssociationCatalog | A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs
AssociationCatalogInfo | Catalog Info for a HiPSCat Association Catalog
PartitionJoinInfo | Association catalog metadata describing which partitions contain matches in the join
- class AssociationCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, join_pixels: JoinPixelInputTypes, catalog_path=None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)
Bases:
hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset
A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs
Catalogs of this type follow the partitioning of the left catalog. The partition_join_info metadata file lists every pair of pixels in the Association Catalog, that is, every pair of partitions (one from each catalog) that contains rows to join.
- CatalogInfoClass: typing_extensions.TypeAlias
- catalog_info: AssociationCatalog.CatalogInfoClass
- JoinPixelInputTypes
- get_join_pixels() → pandas.DataFrame
Get join pixels listing all pairs of pixels from left and right catalogs that contain matching association rows
- Returns:
pd.DataFrame with each row being a pair of pixels from the primary and join catalogs
- static _get_partition_join_info_from_pixels(join_pixels: JoinPixelInputTypes) → hipscat.catalog.association_catalog.partition_join_info.PartitionJoinInfo
- class AssociationCatalogInfo
Bases:
hipscat.catalog.dataset.base_catalog_info.BaseCatalogInfo
Catalog Info for a HiPSCat Association Catalog
- primary_catalog: str | None
Catalog name for the primary (left) side of association
- primary_column: str | None
Column name in the primary (left) side of join
- primary_column_association: str | None
Column name in the association table that matches the primary (left) side of join
- join_catalog: str | None
Catalog name for the joining (right) side of association
- join_column: str | None
Column name in the joining (right) side of join
- join_column_association: str | None
Column name in the association table that matches the joining (right) side of join
- contains_leaf_files: bool = False
Whether or not the association catalog contains leaf parquet files
- required_fields
- DEFAULT_TYPE
- REQUIRED_TYPE
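The column fields of AssociationCatalogInfo describe how the association table links the two sides of a join. The following is a minimal sketch of that lookup in plain Python, not the hipscat implementation. The column names (obj_id, src_id, and so on), the sample rows, and the helper join_via_association are hypothetical and exist only for illustration.

```python
# Hypothetical metadata values; only the keys mirror AssociationCatalogInfo fields.
info = {
    "primary_column": "obj_id",
    "primary_column_association": "obj_id_assoc",
    "join_column": "src_id",
    "join_column_association": "src_id_assoc",
}

# Made-up rows standing in for the primary, join, and association tables.
primary_rows = [{"obj_id": 1, "ra": 10.0}, {"obj_id": 2, "ra": 20.0}]
join_rows = [{"src_id": 7, "mag": 18.5}, {"src_id": 8, "mag": 19.1}]
association_rows = [
    {"obj_id_assoc": 1, "src_id_assoc": 7},
    {"obj_id_assoc": 1, "src_id_assoc": 8},
]


def join_via_association(primary_rows, join_rows, association_rows, info):
    """Pair primary and join rows through the association table.

    Assumes the key columns are unique within the primary and join tables.
    """
    primary_by_key = {row[info["primary_column"]]: row for row in primary_rows}
    join_by_key = {row[info["join_column"]]: row for row in join_rows}
    joined = []
    for assoc in association_rows:
        left = primary_by_key.get(assoc[info["primary_column_association"]])
        right = join_by_key.get(assoc[info["join_column_association"]])
        if left is not None and right is not None:
            joined.append({**left, **right})
    return joined


joined = join_via_association(primary_rows, join_rows, association_rows, info)
```

Here primary_column_association and join_column_association name the association-table columns, while primary_column and join_column name the matching columns in the two catalogs.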
- class PartitionJoinInfo(join_info_df: pandas.DataFrame, catalog_base_dir: str = None)
Association catalog metadata describing which partitions contain matches in the join
- PRIMARY_ORDER_COLUMN_NAME = 'Norder'
- PRIMARY_PIXEL_COLUMN_NAME = 'Npix'
- JOIN_ORDER_COLUMN_NAME = 'join_Norder'
- JOIN_PIXEL_COLUMN_NAME = 'join_Npix'
- COLUMN_NAMES
- primary_to_join_map() → Dict[hipscat.pixel_math.healpix_pixel.HealpixPixel, List[hipscat.pixel_math.healpix_pixel.HealpixPixel]]
Generate a map from a single primary pixel to one or more pixels in the join catalog.
The implementation relies on nested comprehensions: it pairs each tuple of (primary order/pixel) with an [array of tuples of (join order/pixel)].
- Returns:
dictionary mapping (primary order/pixel) to [array of (join order/pixel)]
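The grouping that primary_to_join_map describes can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: Pixel is a hypothetical stand-in for HealpixPixel, the rows are made up, and the helper name primary_to_join_map_sketch is invented here. Only the column names come from the constants above.

```python
from collections import namedtuple

# Hypothetical stand-in for hipscat.pixel_math.healpix_pixel.HealpixPixel.
Pixel = namedtuple("Pixel", ["order", "pixel"])


def primary_to_join_map_sketch(rows):
    """Group join pixels under their primary pixel.

    `rows` is an iterable of dicts keyed by the PartitionJoinInfo column names.
    """
    mapping = {}
    for row in rows:
        primary = Pixel(row["Norder"], row["Npix"])
        join = Pixel(row["join_Norder"], row["join_Npix"])
        # Each primary pixel collects every join pixel it is paired with.
        mapping.setdefault(primary, []).append(join)
    return mapping


# Made-up join info: one primary pixel matches two join pixels, another matches one.
rows = [
    {"Norder": 0, "Npix": 11, "join_Norder": 1, "join_Npix": 44},
    {"Norder": 0, "Npix": 11, "join_Norder": 1, "join_Npix": 45},
    {"Norder": 1, "Npix": 12, "join_Norder": 1, "join_Npix": 12},
]
pixel_map = primary_to_join_map_sketch(rows)
```

An explicit loop is used here instead of comprehensions to keep the grouping step easy to follow.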
- write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Generate parquet metadata, using the known partitions.
- Parameters:
catalog_path (FilePointer) – base path for the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided and none could be inferred.
- write_to_csv(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Write all partition data to CSV files.
Two files will be written:
partition_info.csv - covers all primary catalog pixels, and should match the file structure
partition_join_info.csv - covers all pairwise relationships between primary and join catalogs.
- Parameters:
catalog_path – FilePointer to the directory where the partition_join_info.csv file will be written
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided and none could be inferred.
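To illustrate how the two CSV files relate, the sketch below derives both from the same list of pixel pairs using only the standard library. It is not the hipscat writer: the helper render_partition_csvs and the sample rows are hypothetical, and the real partition_info.csv may carry additional columns beyond the order and pixel shown here.

```python
import csv
import io

# Column names taken from the PartitionJoinInfo constants.
JOIN_COLUMNS = ["Norder", "Npix", "join_Norder", "join_Npix"]


def render_partition_csvs(join_rows):
    """Return (partition_info_csv, partition_join_info_csv) as strings."""
    # partition_join_info.csv: one line per (primary, join) pixel pair.
    join_buf = io.StringIO()
    writer = csv.writer(join_buf, lineterminator="\n")
    writer.writerow(JOIN_COLUMNS)
    writer.writerows(join_rows)

    # partition_info.csv: each primary pixel once, in first-seen order.
    primary = list(dict.fromkeys((row[0], row[1]) for row in join_rows))
    info_buf = io.StringIO()
    writer = csv.writer(info_buf, lineterminator="\n")
    writer.writerow(JOIN_COLUMNS[:2])
    writer.writerows(primary)
    return info_buf.getvalue(), join_buf.getvalue()


# Made-up pixel pairs: (Norder, Npix, join_Norder, join_Npix).
join_rows = [(0, 11, 1, 44), (0, 11, 1, 45), (1, 12, 1, 12)]
partition_info_csv, partition_join_info_csv = render_partition_csvs(join_rows)
```

The primary pixel (0, 11) appears twice in the pairwise file but only once in the primary listing.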
- classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) → PartitionJoinInfo
Read partition join info from a file within a hipscat directory.
This will look for a partition_join_info.csv file and, if it is not found, for a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.
- Parameters:
catalog_base_dir – path to the root directory of the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionJoinInfo object with the data from the file
- Raises:
FileNotFoundError – if neither desired file is found in the catalog_base_dir
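The lookup order described for read_from_dir can be sketched as follows. This is a local-filesystem illustration using pathlib, whereas hipscat works with FilePointer objects and storage_options. The helper name choose_partition_join_file and the warning text are hypothetical.

```python
import tempfile
import warnings
from pathlib import Path


def choose_partition_join_file(catalog_base_dir):
    """Pick the file to read: the CSV first, then _metadata with a warning."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_join_info.csv"
    metadata_file = base / "_metadata"
    if csv_file.exists():
        return csv_file
    if metadata_file.exists():
        # The fallback is expected to be slower for large catalogs.
        warnings.warn(
            "Reading partition join info from _metadata; "
            "this may be slow for large catalogs."
        )
        return metadata_file
    raise FileNotFoundError(f"No partition join info found in {base}")


# Demonstrate the fallback path in a throwaway directory holding only _metadata.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_metadata").touch()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        chosen = choose_partition_join_file(tmp).name
    fallback_warned = len(caught) == 1
```

Checking for the CSV first keeps the fast path free of warnings.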
- classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → PartitionJoinInfo
Read partition join info from a _metadata file to create an object
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A PartitionJoinInfo object with the data from the file
- classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → pandas.DataFrame
Read partition join info from a _metadata file into a DataFrame
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A pandas.DataFrame with the data from the file
- classmethod read_from_csv(partition_join_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionJoinInfo
Read partition join info from a partition_join_info.csv file to create an object
- Parameters:
partition_join_info_file (FilePointer) – FilePointer to the partition_join_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionJoinInfo object with the data from the file
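As a rough illustration of what reading a partition_join_info.csv involves, the sketch below parses CSV text with the standard library and checks for the four expected columns. The helper parse_partition_join_csv and the sample text are hypothetical. Whether the real reader validates columns in this way is an assumption.

```python
import csv
import io

# Column names taken from the PartitionJoinInfo constants.
EXPECTED_COLUMNS = ["Norder", "Npix", "join_Norder", "join_Npix"]


def parse_partition_join_csv(text):
    """Parse CSV text into a list of (Norder, Npix, join_Norder, join_Npix) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    present = reader.fieldnames or []
    missing = [column for column in EXPECTED_COLUMNS if column not in present]
    if missing:
        # Assumed behaviour: reject files lacking any expected column.
        raise ValueError(f"missing columns: {missing}")
    return [tuple(int(row[column]) for column in EXPECTED_COLUMNS) for row in reader]


# Made-up file contents.
sample = "Norder,Npix,join_Norder,join_Npix\n0,11,1,44\n0,11,1,45\n"
parsed = parse_partition_join_csv(sample)
```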
- classmethod _read_from_csv(partition_join_info_file: hipscat.io.FilePointer, storage_options: dict = None) → pandas.DataFrame
Read partition join info from a partition_join_info.csv file into a DataFrame
- Parameters:
partition_join_info_file (FilePointer) – FilePointer to the partition_join_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A pandas.DataFrame with the data from the file