hipscat.catalog

Catalog data wrappers

Package Contents

Classes

  • AssociationCatalog – A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs.

  • Catalog – A HiPSCat Catalog with data stored in a HEALPix Hive partitioned structure.

  • CatalogType – Enum for possible types of catalog.

  • Dataset – A base HiPSCat dataset that contains a catalog_info metadata file and the data contained in parquet files.

  • MarginCatalog – A HiPSCat Catalog used to contain the ‘margin’ of another HiPSCat catalog.

  • PartitionInfo – Container class for per-partition info.

class AssociationCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, join_pixels: JoinPixelInputTypes, catalog_path=None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)

Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs

Catalogs of this type are partitioned based on the partitioning of the left catalog. The partition_join_info metadata file specifies all pairs of pixels in the Association Catalog, corresponding to each pair of partitions in each catalog that contain rows to join.

CatalogInfoClass: typing_extensions.TypeAlias
catalog_info: AssociationCatalog.CatalogInfoClass
JoinPixelInputTypes
get_join_pixels() → pandas.DataFrame

Get join pixels listing all pairs of pixels from left and right catalogs that contain matching association rows

Returns:

pd.DataFrame with each row being a pair of pixels from the primary and join catalogs

static _get_partition_join_info_from_pixels(join_pixels: JoinPixelInputTypes) → hipscat.catalog.association_catalog.partition_join_info.PartitionJoinInfo
classmethod _read_args(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) → Tuple[CatalogInfoClass, hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, JoinPixelInputTypes]
classmethod _check_files_exist(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None)

class Catalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, catalog_path: str = None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)

Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

A HiPSCat Catalog with data stored in a HEALPix Hive partitioned structure

Catalogs of this type are partitioned spatially, contain partition_info metadata specifying the pixels in the catalog, and on disk conform to the parquet partitioning structure Norder=/Dir=/Npix=.parquet.
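To make the Norder=/Dir=/Npix=.parquet layout concrete, here is a minimal sketch that builds one partition's relative path. pixel_parquet_path is a hypothetical helper, not part of hipscat, and it assumes that the Dir level groups pixel numbers in blocks of 10,000 (Dir = Npix // 10000 * 10000). The layout string above only names the three levels, so verify that grouping against the library before relying on it.

```python
from pathlib import PurePosixPath


def pixel_parquet_path(catalog_root: str, order: int, pixel: int) -> str:
    """Build the Norder=/Dir=/Npix=.parquet path for one HEALPix partition.

    Hypothetical helper. Assumption: Dir groups pixel numbers in blocks of
    10,000, so pixel 44 lands in Dir=0 and pixel 12345 lands in Dir=10000.
    """
    directory = pixel // 10_000 * 10_000
    return str(
        PurePosixPath(catalog_root)
        / f"Norder={order}"
        / f"Dir={directory}"
        / f"Npix={pixel}.parquet"
    )


print(pixel_parquet_path("my_catalog", 1, 44))
# my_catalog/Norder=1/Dir=0/Npix=44.parquet
```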

HIPS_CATALOG_TYPES
CatalogInfoClass: typing_extensions.TypeAlias
catalog_info: Catalog.CatalogInfoClass
filter_by_cone(ra: float, dec: float, radius_arcsec: float) → Catalog

Filter the pixels in the catalog to only include the pixels that overlap with a cone

Parameters:
  • ra (float) – Right Ascension of the center of the cone in degrees

  • dec (float) – Declination of the center of the cone in degrees

  • radius_arcsec (float) – Radius of the cone in arcseconds

Returns:

A new catalog with only the pixels that overlap with the specified cone
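filter_by_cone works at the level of HEALPix pixels, and that overlap test depends on pixel geometry not shown here. The sketch below covers only the point-level question a cone answers, together with the unit mix the signature implies: degrees for ra and dec, arcseconds for radius_arcsec. in_cone is a hypothetical helper, and it measures great-circle separation with the haversine formula.

```python
import math


def in_cone(ra: float, dec: float, center_ra: float, center_dec: float,
            radius_arcsec: float) -> bool:
    """Return True if (ra, dec) lies within radius_arcsec of the cone center.

    Hypothetical helper. Angles are in degrees, except the radius, which is
    in arcseconds as in filter_by_cone.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra, dec, center_ra, center_dec))
    # Haversine formula for the great-circle separation on the unit sphere.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    separation_deg = math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))
    # Convert the radius from arcseconds to degrees before comparing.
    return separation_deg <= radius_arcsec / 3600.0


# 0.5 degrees = 1800 arcsec, so a 2000" cone reaches the point and a 1000" cone does not.
print(in_cone(10.0, 0.0, 10.0, 0.5, 2000.0), in_cone(10.0, 0.0, 10.0, 0.5, 1000.0))
```

Because separation is computed on the sphere, the test also handles cones that straddle RA = 0 without special cases.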

filter_by_box(ra: Tuple[float, float] | None = None, dec: Tuple[float, float] | None = None) → Catalog

Filter the pixels in the catalog to only include the pixels that overlap with a right ascension or declination range. If both ranges are provided, filtering is performed using a polygon.

Parameters:
  • ra (Tuple[float, float]) – Right ascension range, in degrees

  • dec (Tuple[float, float]) – Declination range, in degrees

Returns:

A new catalog with only the pixels that overlap with the specified region
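The sketch below is a point-level membership test for the optional ra and dec ranges. in_box is a hypothetical helper, not the library's pixel-level filter. It assumes that an RA range whose first value exceeds its second, such as (350, 10), wraps through RA = 0°, which is an assumption to check against the library's own range validation. When both ranges are given, the method above filters with a polygon instead. Polygon edges are great circles, while lines of constant declination are not, so results near the declination edges may differ from this simple comparison.

```python
from typing import Optional, Tuple


def in_box(ra: float, dec: float,
           ra_range: Optional[Tuple[float, float]] = None,
           dec_range: Optional[Tuple[float, float]] = None) -> bool:
    """Point-level test against optional (start, end) ranges in degrees.

    Hypothetical helper. Assumption: an RA range whose start exceeds its
    end, such as (350, 10), wraps through RA = 0 degrees.
    """
    if ra_range is not None:
        start, end = ra_range
        ra = ra % 360.0
        if start <= end:
            inside_ra = start <= ra <= end
        else:
            inside_ra = ra >= start or ra <= end  # wrapped range
        if not inside_ra:
            return False
    if dec_range is not None:
        low, high = dec_range
        if not low <= dec <= high:
            return False
    return True


print(in_box(5.0, 20.0, ra_range=(350.0, 10.0)))  # inside the wrapped RA range
```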

filter_by_polygon(vertices: List[hipscat.pixel_math.polygon_filter.SphericalCoordinates] | List[hipscat.pixel_math.polygon_filter.CartesianCoordinates]) → Catalog

Filter the pixels in the catalog to only include the pixels that overlap with a polygonal sky region.

Parameters:

vertices (List[SphericalCoordinates] | List[CartesianCoordinates]) – The vertices of the polygon to filter with, given either as a list of (ra, dec) points or as a list of (x, y, z) points on the unit sphere.

Returns:

A new catalog with only the pixels that overlap with the specified polygon.
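The two accepted vertex forms are related by the standard spherical-to-Cartesian conversion. The sketch below shows that conversion under one assumption: the (ra, dec) pairs are in degrees, as elsewhere in this module, since the polygon parameter does not state its units. radec_to_xyz is a hypothetical helper.

```python
import math
from typing import List, Tuple


def radec_to_xyz(vertices: List[Tuple[float, float]]) -> List[Tuple[float, float, float]]:
    """Convert (ra, dec) vertices in degrees to (x, y, z) points on the unit sphere.

    Hypothetical helper; assumes degrees for both angles.
    """
    points = []
    for ra, dec in vertices:
        ra_rad, dec_rad = math.radians(ra), math.radians(dec)
        points.append((
            math.cos(dec_rad) * math.cos(ra_rad),
            math.cos(dec_rad) * math.sin(ra_rad),
            math.sin(dec_rad),
        ))
    return points


# A spherical triangle spanning one octant: the +x axis, the +y axis, and the north pole.
triangle = [(0.0, 0.0), (90.0, 0.0), (0.0, 90.0)]
print(radec_to_xyz(triangle))
```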

generate_negative_tree_pixels() → List[hipscat.pixel_math.HealpixPixel]

Get the leaf nodes at each healpix order that have zero catalog data.

For example, if a catalog only had data points in pixel 0 at order 0, this method would return order 0 pixels 1 through 11. This is used to get full sky coverage for margin caches.

Returns:

List of HealpixPixels representing the ‘negative tree’ for the catalog.
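One way to compute such a negative tree is a top-down walk over the HEALPix NESTED hierarchy, where the children of pixel p at order k are pixels 4p through 4p+3 at order k+1. The sketch below is a hypothetical stand-in, not hipscat's implementation. It works on plain (order, pixel) tuples instead of HealpixPixel objects and assumes the catalog pixels do not overlap. Any pixel that is an ancestor of catalog data is split. Any pixel that holds catalog data is skipped. Every other pixel is empty and is emitted whole.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (order, pixel) in the HEALPix NESTED scheme


def negative_tree(catalog_pixels: List[Pixel]) -> List[Pixel]:
    """Return the largest empty pixels that, with catalog_pixels, tile the sky.

    Hypothetical helper; assumes catalog_pixels do not overlap one another.
    """
    covered: Set[Pixel] = set(catalog_pixels)
    # Every ancestor of a catalog pixel is only partially filled and must be split.
    partial: Set[Pixel] = set()
    for order, pixel in catalog_pixels:
        while order > 0:
            order, pixel = order - 1, pixel // 4
            partial.add((order, pixel))

    result: List[Pixel] = []

    def visit(order: int, pixel: int) -> None:
        if (order, pixel) in covered:
            return  # this pixel holds catalog data
        if (order, pixel) in partial:
            for child in range(4 * pixel, 4 * pixel + 4):
                visit(order + 1, child)
            return
        result.append((order, pixel))  # no catalog data anywhere inside this pixel

    for base_pixel in range(12):  # the 12 order-0 HEALPix pixels
        visit(0, base_pixel)
    return result


print(negative_tree([(0, 0)]))
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (0, 10), (0, 11)]
```

The output reproduces the order 0 example from the description above. A catalog with only pixel (1, 0) would instead yield (1, 1), (1, 2), and (1, 3), followed by order 0 pixels 1 through 11.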

class CatalogType

Bases: str, enum.Enum

Enum for possible types of catalog

OBJECT = 'object'
SOURCE = 'source'
ASSOCIATION = 'association'
INDEX = 'index'
MARGIN = 'margin'
classmethod all_types()

Fetch a list of all catalog types
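Because CatalogType derives from both str and enum.Enum, its members compare equal to their plain string values, and a member can be looked up from the string stored in metadata. To run without hipscat installed, the sketch below uses a local mirror of the documented members. CatalogTypeMirror is a hypothetical name. This page does not say whether all_types() returns the members or their string values, so the sketch lists .value explicitly.

```python
from enum import Enum


class CatalogTypeMirror(str, Enum):
    """Local stand-in with the same members and values as CatalogType."""

    OBJECT = "object"
    SOURCE = "source"
    ASSOCIATION = "association"
    INDEX = "index"
    MARGIN = "margin"


# The str mixin makes members equal to plain strings...
assert CatalogTypeMirror.MARGIN == "margin"
# ...and the constructor looks a member up by its string value.
assert CatalogTypeMirror("object") is CatalogTypeMirror.OBJECT

print([t.value for t in CatalogTypeMirror])
# ['object', 'source', 'association', 'index', 'margin']
```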

class Dataset(catalog_info: CatalogInfoClass, catalog_path=None, storage_options: Dict[Any, Any] | None = None)

A base HiPSCat dataset that contains a catalog_info metadata file and the data contained in parquet files

CatalogInfoClass
classmethod read_from_hipscat(catalog_path: str, storage_options: Dict[Any, Any] | None = None) → typing_extensions.Self

Reads a HiPSCat Catalog from a HiPSCat directory

Parameters:
  • catalog_path – path to the root directory of the catalog

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

The initialized catalog object

classmethod _read_args(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) → Tuple[CatalogInfoClass]
classmethod _read_kwargs(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) → dict
classmethod _check_files_exist(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None)

class MarginCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, catalog_path: str = None, moc: mocpy.MOC | None = None, storage_options: dict | None = None)

Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

A HiPSCat Catalog used to contain the ‘margin’ of another HiPSCat catalog.

Catalogs of this type are used alongside a primary catalog and contain the margin points for each HEALPix pixel: any points that lie within a certain distance of the pixel boundary. This ensures that spatial operations such as crossmatching can be performed efficiently while maintaining accuracy.

CatalogInfoClass: typing_extensions.TypeAlias
catalog_info: MarginCatalog.CatalogInfoClass

class PartitionInfo(pixel_list: List[hipscat.pixel_math.HealpixPixel], catalog_base_dir: str = None)

Container class for per-partition info.

METADATA_ORDER_COLUMN_NAME = 'Norder'
METADATA_DIR_COLUMN_NAME = 'Dir'
METADATA_PIXEL_COLUMN_NAME = 'Npix'
get_healpix_pixels() → List[hipscat.pixel_math.HealpixPixel]

Get healpix pixel objects for all pixels represented as partitions.

Returns:

List of HealpixPixel

get_highest_order() → int

Get the highest healpix order for the dataset.

Returns:

int representing highest order.

write_to_file(partition_info_file: hipscat.io.FilePointer = None, catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)

Write all partition data to a CSV file.

If no paths are provided, the catalog base directory from the read_from_dir call is used.

Parameters:
  • partition_info_file – FilePointer to where the partition_info.csv file will be written.

  • catalog_path – base directory for a catalog where the partition_info.csv file will be written.

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided, and could not be inferred.
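The sketch below renders partition data as CSV text using the column names from the METADATA_*_COLUMN_NAME constants. It is a hypothetical stand-in for write_to_file, not the library's writer, and it makes two assumptions: partition_info.csv holds exactly these three columns, and Dir groups pixel numbers in blocks of 10,000. Writing to an in-memory buffer keeps the example free of filesystem and storage_options concerns.

```python
import csv
import io
from typing import List, Tuple

# Column names taken from PartitionInfo's METADATA_*_COLUMN_NAME constants.
ORDER, DIR, PIXEL = "Norder", "Dir", "Npix"


def partition_info_csv(pixels: List[Tuple[int, int]]) -> str:
    """Render (order, pixel) pairs as partition_info.csv text.

    Hypothetical helper. Assumptions: the file holds exactly these three
    columns, and Dir groups pixel numbers in blocks of 10,000.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([ORDER, DIR, PIXEL])
    for order, pixel in sorted(pixels):
        writer.writerow([order, pixel // 10_000 * 10_000, pixel])
    return buffer.getvalue()


print(partition_info_csv([(1, 44), (0, 3)]))
```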

write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)

Generate parquet metadata, using the known partitions.

If no catalog_path is provided, the catalog base directory from the read_from_dir call is used.

Parameters:
  • catalog_path (FilePointer) – base path for the catalog

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided, and could not be inferred.

classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo

Read partition info from a file within a hipscat directory.

This will look for a partition_info.csv file and, if it is not found, fall back to a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.

Parameters:
  • catalog_base_dir – path to the root directory of the catalog

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

Raises:

FileNotFoundError – if neither desired file is found in the catalog_base_dir
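The lookup order that read_from_dir describes (partition_info.csv first, then _metadata with a warning, else FileNotFoundError) can be sketched with the standard library alone. locate_partition_source is a hypothetical helper. It only decides which file would be read and does not parse anything. It also uses local paths in place of FilePointer and storage_options, and the warning text is illustrative.

```python
import tempfile
import warnings
from pathlib import Path


def locate_partition_source(catalog_base_dir: str) -> Path:
    """Pick the file to read partition info from, in the documented order.

    Hypothetical helper: returns the chosen path only; it does not parse it.
    """
    base = Path(catalog_base_dir)
    csv_file = base / "partition_info.csv"
    if csv_file.exists():
        return csv_file  # fast path
    metadata_file = base / "_metadata"
    if metadata_file.exists():
        warnings.warn("Reading partitions from _metadata; this can be slow for large catalogs")
        return metadata_file
    raise FileNotFoundError(f"No partition_info.csv or _metadata in {base}")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_metadata").touch()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        chosen = locate_partition_source(tmp)
    print(chosen.name, len(caught))  # _metadata 1
```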

classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → PartitionInfo

Read partition info from a _metadata file to create an object

Parameters:
  • metadata_file (FilePointer) – FilePointer to the _metadata file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

  • strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → List[hipscat.pixel_math.HealpixPixel]

Read partition info list from a _metadata file.

Parameters:
  • metadata_file (FilePointer) – FilePointer to the _metadata file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

  • strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

List of HealpixPixel objects read from the _metadata file

classmethod read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo

Read partition info from a partition_info.csv file to create an object

Parameters:
  • partition_info_file (FilePointer) – FilePointer to the partition_info.csv file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo

Read partition info from a partition_info.csv file to create an object

Parameters:
  • partition_info_file (FilePointer) – FilePointer to the partition_info.csv file

  • storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

as_dataframe()

Construct a pandas dataframe for the partition info pixels.

Returns:

Dataframe with order, directory, and pixel info.
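A dataframe of this shape can be sketched with pandas directly. The sketch below builds one from hypothetical (order, pixel) pairs, using the column names given by the METADATA_*_COLUMN_NAME constants. It then takes the maximum order, which is the value get_highest_order reports. As in the layout sketch, the Dir values assume pixel numbers are grouped in blocks of 10,000. This is an illustration of the table's shape, not the library's as_dataframe code.

```python
import pandas as pd

# Hypothetical (order, pixel) pairs standing in for a catalog's partitions.
pixels = [(0, 3), (1, 44), (2, 181)]

# Column names from PartitionInfo's METADATA_*_COLUMN_NAME constants.
frame = pd.DataFrame(
    {
        "Norder": [order for order, _ in pixels],
        # Assumption: Dir groups pixel numbers in blocks of 10,000.
        "Dir": [pixel // 10_000 * 10_000 for _, pixel in pixels],
        "Npix": [pixel for _, pixel in pixels],
    }
)

highest_order = int(frame["Norder"].max())  # what get_highest_order reports
print(frame)
print(highest_order)  # 2
```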

classmethod from_healpix(healpix_pixels: List[hipscat.pixel_math.HealpixPixel]) → PartitionInfo

Create a partition info object from a list of constituent healpix pixels.

Parameters:

healpix_pixels – list of healpix pixels

Returns:

A PartitionInfo object with the same healpix pixels