hipscat.catalog

`hipscat.catalog`#

Catalog data wrappers

Package Contents#

Classes#

`AssociationCatalog`	A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs
`Catalog`	A HiPSCat Catalog with data stored in a HEALPix Hive partitioned structure
`CatalogType`	Enum for possible types of catalog
`Dataset`	A base HiPSCat dataset that contains a catalog_info metadata file
`MarginCatalog`	A HiPSCat Catalog used to contain the 'margin' of another HiPSCat catalog.
`PartitionInfo`	Container class for per-partition info.

class AssociationCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, join_pixels: JoinPixelInputTypes, catalog_path=None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)[source]#

Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs

Catalogs of this type are partitioned based on the partitioning of the left catalog. The partition_join_info metadata file specifies all pairs of pixels in the Association Catalog, corresponding to each pair of partitions in each catalog that contain rows to join.

CatalogInfoClass: typing_extensions.TypeAlias#

catalog_info: AssociationCatalog.CatalogInfoClass#

JoinPixelInputTypes#

get_join_pixels() → pandas.DataFrame[source]#

Get join pixels listing all pairs of pixels from left and right catalogs that contain matching association rows

Returns:

pd.DataFrame with each row being a pair of pixels from the primary and join catalogs
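As an illustration of how such a pair listing can be consumed, the sketch below groups join-catalog pixels by primary-catalog pixel. It works on plain `(order, pixel)` tuples rather than the DataFrame the method returns, and the name `group_join_pixels` is hypothetical, not part of hipscat.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical stand-in for a HEALPix pixel: (order, pixel number).
Pixel = Tuple[int, int]


def group_join_pixels(pairs: Iterable[Tuple[Pixel, Pixel]]) -> Dict[Pixel, List[Pixel]]:
    """Map each primary pixel to the join pixels it shares association rows with."""
    grouped: Dict[Pixel, List[Pixel]] = defaultdict(list)
    for primary, join in pairs:
        grouped[primary].append(join)
    return dict(grouped)


# Made-up example pairs: (primary pixel, join pixel).
pairs = [((0, 11), (1, 44)), ((0, 11), (1, 45)), ((0, 4), (1, 17))]
grouped = group_join_pixels(pairs)
```

Grouping by the primary pixel mirrors the statement above that association catalogs follow the partitioning of the left catalog.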

static _get_partition_join_info_from_pixels(join_pixels: JoinPixelInputTypes) → hipscat.catalog.association_catalog.partition_join_info.PartitionJoinInfo[source]#

classmethod _read_args(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) → Tuple[CatalogInfoClass, hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, JoinPixelInputTypes][source]#

classmethod _check_files_exist(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None)[source]#

class Catalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, catalog_path: str = None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)[source]#

Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

A HiPSCat Catalog with data stored in a HEALPix Hive partitioned structure

Catalogs of this type are partitioned spatially, contain partition_info metadata specifying the pixels in the catalog, and on disk conform to the parquet partitioning structure Norder=/Dir=/Npix=.parquet
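To make the on-disk layout concrete, here is a minimal sketch that builds the relative path of one partition file. The helper `partition_path` is hypothetical, and the rule used for `Dir` (the pixel number rounded down to a multiple of 10,000) is an assumption that this page does not state.

```python
def partition_path(order: int, pixel: int, block: int = 10_000) -> str:
    """Build a relative path of the form Norder=/Dir=/Npix=.parquet."""
    # Assumption: Dir groups pixels into blocks of `block` consecutive pixel numbers.
    directory = (pixel // block) * block
    return f"Norder={order}/Dir={directory}/Npix={pixel}.parquet"


small = partition_path(1, 44)
large = partition_path(5, 12345)
```

Under that assumption, pixel 44 at order 1 lands in `Dir=0` and pixel 12345 at order 5 lands in `Dir=10000`.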

HIPS_CATALOG_TYPES#

CatalogInfoClass: typing_extensions.TypeAlias#

catalog_info: Catalog.CatalogInfoClass#

filter_by_cone(ra: float, dec: float, radius_arcsec: float) → Catalog[source]#

Filter the pixels in the catalog to only include the pixels that overlap with a cone

Parameters:

ra (float) – Right Ascension of the center of the cone in degrees
dec (float) – Declination of the center of the cone in degrees
radius_arcsec (float) – Radius of the cone in arcseconds

Returns:

A new catalog with only the pixels that overlap with the specified cone
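Note the mixed units: the cone center is given in degrees, the radius in arcseconds. The sketch below shows the point-level test that such a cone describes, using the haversine formula. It is an illustration only; `filter_by_cone` itself selects pixels, not individual points, and both function names here are hypothetical.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees between two (ra, dec) points given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def in_cone(point_ra: float, point_dec: float, ra: float, dec: float, radius_arcsec: float) -> bool:
    """True if the point lies within radius_arcsec of the cone center (ra, dec)."""
    radius_deg = radius_arcsec / 3600.0  # 3600 arcseconds per degree
    return angular_separation_deg(point_ra, point_dec, ra, dec) <= radius_deg
```

For example, a radius of one degree must be passed as `radius_arcsec=3600.0`.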

filter_by_box(ra: Tuple[float, float] | None = None, dec: Tuple[float, float] | None = None) → Catalog[source]#

Filter the pixels in the catalog to only include the pixels that overlap with a right ascension or declination range. In case both ranges are provided, filtering is performed using a polygon.

Parameters:

ra (Tuple[float, float]) – Right ascension range, in degrees
dec (Tuple[float, float]) – Declination range, in degrees

Returns:

A new catalog with only the pixels that overlap with the specified region

filter_by_polygon(vertices: List[hipscat.pixel_math.polygon_filter.SphericalCoordinates] | List[hipscat.pixel_math.polygon_filter.CartesianCoordinates]) → Catalog[source]#

Filter the pixels in the catalog to only include the pixels that overlap with a polygonal sky region.

Parameters:

vertices (List[SphericalCoordinates] | List[CartesianCoordinates]) – The vertices of the polygon to filter points with, in lists of (ra,dec) or (x,y,z) points on the unit sphere.

Returns:

A new catalog with only the pixels that overlap with the specified polygon.
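The two accepted vertex forms describe the same points on the sky. A minimal sketch of the usual conversion from (ra, dec) in degrees to (x, y, z) on the unit sphere follows. The helper name `radec_to_xyz` is hypothetical, and the axis convention (x toward ra=0, dec=0 and z toward the north pole) is the common astronomical one, assumed here rather than taken from this page.

```python
import math
from typing import Tuple


def radec_to_xyz(ra_deg: float, dec_deg: float) -> Tuple[float, float, float]:
    """Convert (ra, dec) in degrees to Cartesian coordinates on the unit sphere."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (
        math.cos(dec) * math.cos(ra),
        math.cos(dec) * math.sin(ra),
        math.sin(dec),
    )


# A made-up triangle given as (ra, dec) vertices, converted to (x, y, z).
triangle_radec = [(0.0, 0.0), (90.0, 0.0), (0.0, 90.0)]
triangle_xyz = [radec_to_xyz(ra, dec) for ra, dec in triangle_radec]
```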

generate_negative_tree_pixels() → List[hipscat.pixel_math.HealpixPixel][source]#

Get the leaf nodes at each healpix order that have zero catalog data.

For example, if an example catalog only had data points in pixel 0 at order 0, then this method would return order 0’s pixels 1 through 11. Used for getting full coverage on margin caches.

Returns:

List of HealpixPixels representing the ‘negative tree’ for the catalog.
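The idea can be sketched without the library, assuming the HEALPix nested numbering scheme, in which pixel `p` at order `k` has children `4p` to `4p+3` at order `k+1`. The function `negative_tree` and the `(order, pixel)` tuples are hypothetical stand-ins, not the method's actual implementation.

```python
from typing import List, Set, Tuple

# Hypothetical stand-in for HealpixPixel: (order, pixel number), nested scheme.
Pixel = Tuple[int, int]


def negative_tree(catalog_pixels: Set[Pixel]) -> List[Pixel]:
    """Return the coarsest pixels with no catalog data that complete sky coverage."""

    def covers_data(order: int, pixel: int) -> bool:
        # True if some catalog pixel equals (order, pixel) or lies inside it.
        return any(
            o >= order and (p >> (2 * (o - order))) == pixel
            for o, p in catalog_pixels
        )

    result: List[Pixel] = []
    stack: List[Pixel] = [(0, p) for p in range(12)]  # the 12 order-0 pixels
    while stack:
        order, pixel = stack.pop()
        if (order, pixel) in catalog_pixels:
            continue  # holds data, so it is not part of the negative tree
        if not covers_data(order, pixel):
            result.append((order, pixel))  # empty leaf
        else:
            # Data exists deeper down: split into the four children.
            stack.extend((order + 1, 4 * pixel + k) for k in range(4))
    return sorted(result)


only_pixel_zero = negative_tree({(0, 0)})
one_order_one_pixel = negative_tree({(1, 0)})
```

With data only in pixel 0 at order 0, the sketch returns order-0 pixels 1 through 11, matching the example in the description above.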

class CatalogType[source]#

Bases: str, enum.Enum

Enum for possible types of catalog

OBJECT = 'object'#

SOURCE = 'source'#

ASSOCIATION = 'association'#

INDEX = 'index'#

MARGIN = 'margin'#

classmethod all_types()[source]#

Fetch a list of all catalog types
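Because the enum derives from both `str` and `enum.Enum`, its members compare equal to their string values. The sketch below re-creates the pattern under a different name, `CatalogTypeSketch`, so it runs without hipscat installed. That `all_types` returns the string values (rather than the members) is an assumption.

```python
from enum import Enum
from typing import List


class CatalogTypeSketch(str, Enum):
    """Illustrative copy of the enum values listed above; not the library class."""

    OBJECT = "object"
    SOURCE = "source"
    ASSOCIATION = "association"
    INDEX = "index"
    MARGIN = "margin"

    @classmethod
    def all_types(cls) -> List[str]:
        # Assumption: the list holds the string values, in definition order.
        return [member.value for member in cls]


matches_plain_string = CatalogTypeSketch.OBJECT == "object"
parsed = CatalogTypeSketch("margin")
type_names = CatalogTypeSketch.all_types()
```

The `str` mixin is what lets a value read from a metadata file be compared directly against a member.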

class Dataset(catalog_info: CatalogInfoClass, catalog_path=None, storage_options: Dict[Any, Any] | None = None)[source]#

A base HiPSCat dataset that contains a catalog_info metadata file and the data contained in parquet files

CatalogInfoClass#

classmethod read_from_hipscat(catalog_path: str, storage_options: Dict[Any, Any] | None = None) → typing_extensions.Self[source]#

Reads a HiPSCat Catalog from a HiPSCat directory

Parameters:

catalog_path – path to the root directory of the catalog
storage_options – dictionary that contains abstract filesystem credentials

Returns:

The initialized catalog object

classmethod _read_args(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) → Tuple[CatalogInfoClass][source]#

classmethod _read_kwargs(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) → dict[source]#

classmethod _check_files_exist(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None)[source]#

class MarginCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, catalog_path: str = None, moc: mocpy.MOC | None = None, storage_options: dict | None = None)[source]#

Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset

A HiPSCat Catalog used to contain the ‘margin’ of another HiPSCat catalog.

Catalogs of this type are used alongside a primary catalog and contain the margin points for each HEALPix pixel: any points that are within a certain distance of the HEALPix pixel boundary. This ensures that spatial operations such as crossmatching can be performed efficiently while maintaining accuracy.

CatalogInfoClass: typing_extensions.TypeAlias#

catalog_info: MarginCatalog.CatalogInfoClass#

class PartitionInfo(pixel_list: List[hipscat.pixel_math.HealpixPixel], catalog_base_dir: str = None)[source]#

Container class for per-partition info.

METADATA_ORDER_COLUMN_NAME = 'Norder'#

METADATA_DIR_COLUMN_NAME = 'Dir'#

METADATA_PIXEL_COLUMN_NAME = 'Npix'#

get_healpix_pixels() → List[hipscat.pixel_math.HealpixPixel][source]#

Get healpix pixel objects for all pixels represented as partitions.

Returns:

List of HealpixPixel

get_highest_order() → int[source]#

Get the highest healpix order for the dataset.

Returns:

int representing highest order.

write_to_file(partition_info_file: hipscat.io.FilePointer = None, catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)[source]#

Write all partition data to a CSV file.

If no paths are provided, the catalog base directory from the read_from_dir call is used.

Parameters:

partition_info_file – FilePointer to where the partition_info.csv file will be written.
catalog_path – base directory for a catalog where the partition_info.csv file will be written.
storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided and none could be inferred.
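For orientation, the following sketch produces CSV text using the three column names defined by the `METADATA_*_COLUMN_NAME` constants above. It is not the library's writer: the function `partition_info_csv` is hypothetical, the real file may carry additional columns, and the `Dir` rule (pixel number rounded down to a multiple of 10,000) is an assumption.

```python
import csv
import io
from typing import Iterable, Tuple


def partition_info_csv(pixels: Iterable[Tuple[int, int]], block: int = 10_000) -> str:
    """Render (order, pixel) pairs as CSV text with Norder, Dir, Npix columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Norder", "Dir", "Npix"])
    for order, pixel in pixels:
        # Assumption: Dir is the pixel number rounded down to a multiple of `block`.
        writer.writerow([order, (pixel // block) * block, pixel])
    return buffer.getvalue()


csv_text = partition_info_csv([(0, 11), (1, 44), (5, 12345)])
```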

write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)[source]#

Generate parquet metadata, using the known partitions.

If no catalog_path is provided, the catalog base directory from the read_from_dir call is used.

Parameters:

catalog_path (FilePointer) – base path for the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials

Raises:

ValueError – if no path is provided and none could be inferred.

classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a file within a hipscat directory.

This will look for a partition_info.csv file and, if it is not found, for a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.

Parameters:

catalog_base_dir – path to the root directory of the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

Raises:

FileNotFoundError – if neither desired file is found in the catalog_base_dir
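The lookup order described above can be sketched with the standard library alone. This is a simplified illustration for a local directory: the function `locate_partition_info` is hypothetical, it ignores `storage_options`, and it only picks the file rather than parsing it.

```python
import tempfile
import warnings
from pathlib import Path


def locate_partition_info(catalog_base_dir: str) -> Path:
    """Prefer partition_info.csv, fall back to _metadata with a warning."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_info.csv"
    if csv_file.exists():
        return csv_file
    metadata_file = base / "_metadata"
    if metadata_file.exists():
        warnings.warn("partition_info.csv not found; reading _metadata may be slow")
        return metadata_file
    raise FileNotFoundError(f"No partition_info.csv or _metadata file in {base}")


with tempfile.TemporaryDirectory() as tmp:
    # Only _metadata present: the fallback is used and a warning is recorded.
    (Path(tmp) / "_metadata").write_bytes(b"")
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        fallback_name = locate_partition_info(tmp).name
    # Once partition_info.csv exists, it takes precedence.
    (Path(tmp) / "partition_info.csv").write_text("Norder,Dir,Npix\n")
    preferred_name = locate_partition_info(tmp).name
```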

classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a _metadata file to create an object

Parameters:

metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → List[hipscat.pixel_math.HealpixPixel][source]#

Read partition info list from a _metadata file.

Parameters:

metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.

Returns:

A list of HealpixPixel objects with the data from the file

classmethod read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a partition_info.csv file to create an object

Parameters:

partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

classmethod _read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo[source]#

Read partition info from a partition_info.csv file to create an object

Parameters:

partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials

Returns:

A PartitionInfo object with the data from the file

as_dataframe()[source]#

Construct a pandas dataframe for the partition info pixels.

Returns:

Dataframe with order, directory, and pixel info.

classmethod from_healpix(healpix_pixels: List[hipscat.pixel_math.HealpixPixel]) → PartitionInfo[source]#

Create a partition info object from a list of constituent healpix pixels.

Parameters:

healpix_pixels – list of healpix pixels

Returns:

A PartitionInfo object with the same healpix pixels
