hipscat.catalog
Catalog data wrappers
Subpackages
Submodules
Package Contents
Classes
AssociationCatalog | A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs
CatalogType | Enum for possible types of catalog
Catalog | A HiPSCat Catalog with data stored in a HEALPix Hive partitioned structure
Dataset | A base HiPSCat dataset that contains a catalog_info metadata file and the data contained in parquet files
MarginCatalog | A HiPSCat Catalog used to contain the 'margin' of another HiPSCat catalog
PartitionInfo | Container class for per-partition info
- class AssociationCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, join_pixels: JoinPixelInputTypes, catalog_path=None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)
Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset
A HiPSCat Catalog for enabling fast joins between two HiPSCat catalogs
Catalogs of this type are partitioned based on the partitioning of the left catalog. The partition_join_info metadata file lists every pair of pixels in the Association Catalog: one pair for each combination of partitions, one from each catalog, that contain rows to join.
- CatalogInfoClass: typing_extensions.TypeAlias
- catalog_info: AssociationCatalog.CatalogInfoClass
- JoinPixelInputTypes
- get_join_pixels() -> pandas.DataFrame
Get the join pixels: all pairs of pixels from the left and right catalogs that contain matching association rows.
- Returns:
pd.DataFrame with each row being a pair of pixels from the primary and join catalogs
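As a hedged illustration of how these pixel pairs might be consumed, the sketch below groups them by primary pixel, so that for each left partition only the right partitions it pairs with need to be read. The tuple layout `(primary_order, primary_pixel, join_order, join_pixel)` and the helper name `group_join_pixels` are hypothetical; the actual DataFrame column names are not stated here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row layout: (primary_order, primary_pixel, join_order, join_pixel)
JoinPair = Tuple[int, int, int, int]


def group_join_pixels(pairs: List[JoinPair]) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """Map each primary (order, pixel) to the join-catalog pixels it must be joined with."""
    grouped: Dict[Tuple[int, int], List[Tuple[int, int]]] = defaultdict(list)
    for primary_order, primary_pixel, join_order, join_pixel in pairs:
        grouped[(primary_order, primary_pixel)].append((join_order, join_pixel))
    return dict(grouped)


# Order-1 pixel 44 pairs with itself and with one of its order-2 children (4 * 44 = 176).
pairs = [
    (1, 44, 1, 44),
    (1, 44, 2, 176),
    (1, 45, 1, 45),
]
print(group_join_pixels(pairs))
# {(1, 44): [(1, 44), (2, 176)], (1, 45): [(1, 45)]}
```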
- static _get_partition_join_info_from_pixels(join_pixels: JoinPixelInputTypes) -> hipscat.catalog.association_catalog.partition_join_info.PartitionJoinInfo
- class Catalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, catalog_path: str = None, moc: mocpy.MOC | None = None, storage_options: Dict[Any, Any] | None = None)
Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset
A HiPSCat Catalog with data stored in a HEALPix Hive partitioned structure
Catalogs of this type are partitioned spatially, contain partition_info metadata specifying the pixels in the catalog, and on disk conform to the parquet partitioning structure Norder=/Dir=/Npix=.parquet
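The on-disk layout can be sketched as a path-building helper. This is a minimal sketch that assumes each field is filled with the partition's HEALPix order, a directory number, and the pixel index, and that the directory number groups pixels in blocks of 10,000 (`Dir = (Npix // 10000) * 10000`); the helper name `partition_path` is hypothetical and not part of hipscat's API.

```python
from pathlib import PurePosixPath


def partition_path(base_dir: str, order: int, pixel: int) -> PurePosixPath:
    """Build a hypothetical Norder=/Dir=/Npix=.parquet path for one partition."""
    # Assumption: directories bucket pixel indices in groups of 10,000.
    directory = (pixel // 10_000) * 10_000
    return PurePosixPath(base_dir) / f"Norder={order}" / f"Dir={directory}" / f"Npix={pixel}.parquet"


print(partition_path("my_catalog", 6, 12345))
# my_catalog/Norder=6/Dir=10000/Npix=12345.parquet
```

Bucketing pixels into directories keeps the number of files per directory bounded at high orders, where a single order can contain many thousands of pixels.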
- HIPS_CATALOG_TYPES
- CatalogInfoClass: typing_extensions.TypeAlias
- catalog_info: Catalog.CatalogInfoClass
- filter_by_cone(ra: float, dec: float, radius_arcsec: float) -> Catalog
Filter the pixels in the catalog to only include the pixels that overlap with a cone.
- Parameters:
ra (float) – Right Ascension of the center of the cone in degrees
dec (float) – Declination of the center of the cone in degrees
radius_arcsec (float) – Radius of the cone in arcseconds
- Returns:
A new catalog with only the pixels that overlap with the specified cone
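Note the mixed units: `ra` and `dec` are in degrees while `radius_arcsec` is in arcseconds, so the radius is divided by 3600 before it is compared with an angular separation in degrees. The method itself keeps whole pixels that overlap the cone; the sketch below shows only the point-level test that such a cone describes, using the haversine great-circle formula. The function names are hypothetical.

```python
import math


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle distance between two sky positions, all in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    # Clamp to guard against tiny floating-point overshoot above 1.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(hav))))


def in_cone(ra: float, dec: float, center_ra: float, center_dec: float, radius_arcsec: float) -> bool:
    """True if (ra, dec) lies within radius_arcsec of the cone center."""
    return angular_separation_deg(ra, dec, center_ra, center_dec) <= radius_arcsec / 3600.0


# 0.0005 deg = 1.8 arcsec, which is inside a 2 arcsec cone.
print(in_cone(10.0, 0.0, 10.0, 0.0005, radius_arcsec=2.0))  # True
```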
- filter_by_box(ra: Tuple[float, float] | None = None, dec: Tuple[float, float] | None = None) -> Catalog
Filter the pixels in the catalog to only include the pixels that overlap with a right ascension range, a declination range, or both. If both ranges are provided, filtering is performed using a polygon.
- Parameters:
ra (Tuple[float, float]) – Right ascension range, in degrees
dec (Tuple[float, float]) – Declination range, in degrees
- Returns:
A new catalog with only the pixels that overlap with the specified region
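At the level of a single point, the region described by the two optional ranges can be sketched as below, where a range of `None` leaves that axis unconstrained. The helper name `in_box` is hypothetical, and the handling of an RA range whose first value exceeds its second (e.g. `(350, 10)`) as wrapping through 0 degrees is an assumption; the docstring above does not say how such a range is interpreted.

```python
from typing import Optional, Tuple


def in_box(ra: float, dec: float,
           ra_range: Optional[Tuple[float, float]] = None,
           dec_range: Optional[Tuple[float, float]] = None) -> bool:
    """Point-level box test in degrees; a None range leaves that axis unconstrained."""
    if dec_range is not None and not (dec_range[0] <= dec <= dec_range[1]):
        return False
    if ra_range is not None:
        low, high = ra_range
        ra = ra % 360.0  # normalize the point's RA into [0, 360)
        if low <= high:
            return low <= ra <= high
        # Assumption: a range such as (350, 10) wraps through RA = 0.
        return ra >= low or ra <= high
    return True


print(in_box(5.0, 0.0, ra_range=(350.0, 10.0)))  # True under the wrap-around assumption
```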
- filter_by_polygon(vertices: List[hipscat.pixel_math.polygon_filter.SphericalCoordinates] | List[hipscat.pixel_math.polygon_filter.CartesianCoordinates]) -> Catalog
Filter the pixels in the catalog to only include the pixels that overlap with a polygonal sky region.
- Parameters:
vertices (List[SphericalCoordinates] | List[CartesianCoordinates]) – The vertices of the polygon to filter points with, given either as a list of (ra, dec) points or as a list of (x, y, z) points on the unit sphere.
- Returns:
A new catalog with only the pixels that overlap with the specified polygon.
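Because `vertices` may be given either as (ra, dec) pairs or as (x, y, z) points on the unit sphere, the standard conversion between the two forms is sketched below. The function names are hypothetical and not part of hipscat; they only show how the two coordinate representations correspond.

```python
import math
from typing import Tuple


def radec_to_unit_xyz(ra_deg: float, dec_deg: float) -> Tuple[float, float, float]:
    """Convert (ra, dec) in degrees to a Cartesian point on the unit sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def unit_xyz_to_radec(x: float, y: float, z: float) -> Tuple[float, float]:
    """Inverse conversion; ra is returned in [0, 360)."""
    ra = math.degrees(math.atan2(y, x)) % 360.0
    dec = math.degrees(math.asin(max(-1.0, min(1.0, z))))  # clamp for float safety
    return ra, dec


# The same triangle expressed in both vertex forms.
triangle_radec = [(0.0, 0.0), (10.0, 0.0), (5.0, 10.0)]
triangle_xyz = [radec_to_unit_xyz(ra, dec) for ra, dec in triangle_radec]
```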
- generate_negative_tree_pixels() -> List[hipscat.pixel_math.HealpixPixel]
Get the leaf nodes at each HEALPix order that have zero catalog data.
For example, if a catalog only had data points in pixel 0 at order 0, this method would return order 0’s pixels 1 through 11. Used for getting full coverage on margin caches.
- Returns:
List of HealpixPixels representing the ‘negative tree’ for the catalog.
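The negative tree can be sketched with the HEALPix nested-scheme rule that pixel `p` at order `k` has children `4p` through `4p + 3` at order `k + 1`, starting from the 12 order-0 pixels. This is a minimal sketch over plain `(order, pixel)` tuples rather than hipscat's `HealpixPixel` objects, not the library's implementation; it reproduces the order-0 example above.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (order, pixel index) in the nested scheme


def negative_tree(catalog_pixels: List[Pixel]) -> List[Pixel]:
    """Return the largest pixels, at any order, that contain no catalog pixel."""
    covered: Set[Pixel] = set(catalog_pixels)
    # Every ancestor of a catalog pixel is only partially covered and must be split further.
    ancestors: Set[Pixel] = set()
    for order, pixel in catalog_pixels:
        while order > 0:
            order, pixel = order - 1, pixel // 4
            ancestors.add((order, pixel))

    result: List[Pixel] = []
    # Depth-first walk starting from the 12 base pixels (pushed in reverse so 0 pops first).
    stack: List[Pixel] = [(0, p) for p in reversed(range(12))]
    while stack:
        node = stack.pop()
        if node in covered:
            continue  # catalog data lives here
        if node in ancestors:
            order, pixel = node
            stack.extend((order + 1, 4 * pixel + i) for i in reversed(range(4)))
        else:
            result.append(node)  # no catalog data anywhere below this node
    return result


print(negative_tree([(0, 0)]))  # order-0 pixels 1 through 11
```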
- class CatalogType
Bases: str, enum.Enum
Enum for possible types of catalog.
- OBJECT = 'object'
- SOURCE = 'source'
- ASSOCIATION = 'association'
- INDEX = 'index'
- MARGIN = 'margin'
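Because `CatalogType` mixes in `str`, its members compare equal to plain strings and can be looked up from a raw string value. A standalone stand-in with the same members (defined locally, not imported from hipscat) shows the behavior:

```python
from enum import Enum


class CatalogType(str, Enum):
    """Local stand-in mirroring the members listed above."""
    OBJECT = "object"
    SOURCE = "source"
    ASSOCIATION = "association"
    INDEX = "index"
    MARGIN = "margin"


# Look up a member from a raw string value.
parsed = CatalogType("margin")
print(parsed is CatalogType.MARGIN)    # True
print(CatalogType.OBJECT == "object")  # True: the str mixin allows direct comparison
```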
- class Dataset(catalog_info: CatalogInfoClass, catalog_path=None, storage_options: Dict[Any, Any] | None = None)
A base HiPSCat dataset that contains a catalog_info metadata file and the data contained in parquet files.
- CatalogInfoClass
- classmethod read_from_hipscat(catalog_path: str, storage_options: Dict[Any, Any] | None = None) -> typing_extensions.Self
Reads a HiPSCat catalog from a HiPSCat directory.
- Parameters:
catalog_path – path to the root directory of the catalog
storage_options – dictionary that contains abstract filesystem credentials
- Returns:
The initialized catalog object
- classmethod _read_args(catalog_base_dir: hipscat.io.FilePointer, storage_options: Dict[Any, Any] | None = None) -> Tuple[CatalogInfoClass]
- class MarginCatalog(catalog_info: CatalogInfoClass, pixels: hipscat.catalog.healpix_dataset.healpix_dataset.PixelInputTypes, catalog_path: str = None, moc: mocpy.MOC | None = None, storage_options: dict | None = None)
Bases: hipscat.catalog.healpix_dataset.healpix_dataset.HealpixDataset
A HiPSCat Catalog used to contain the ‘margin’ of another HiPSCat catalog.
Catalogs of this type are used alongside a primary catalog and contain the margin points for each HEALPix pixel: any points that are within a certain distance of the HEALPix pixel boundary. This ensures that spatial operations such as crossmatching can be performed efficiently while maintaining accuracy.
- CatalogInfoClass: typing_extensions.TypeAlias
- catalog_info: MarginCatalog.CatalogInfoClass
- class PartitionInfo(pixel_list: List[hipscat.pixel_math.HealpixPixel], catalog_base_dir: str = None)
Container class for per-partition info.
- METADATA_ORDER_COLUMN_NAME = 'Norder'
- METADATA_DIR_COLUMN_NAME = 'Dir'
- METADATA_PIXEL_COLUMN_NAME = 'Npix'
- get_healpix_pixels() -> List[hipscat.pixel_math.HealpixPixel]
Get healpix pixel objects for all pixels represented as partitions.
- Returns:
List of HealpixPixel
- get_highest_order() -> int
Get the highest healpix order for the dataset.
- Returns:
int representing highest order.
- write_to_file(partition_info_file: hipscat.io.FilePointer = None, catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Write all partition data to a CSV file.
If no paths are provided, the catalog base directory from the read_from_dir call is used.
- Parameters:
partition_info_file – FilePointer to where the partition_info.csv file will be written.
catalog_path – base directory for a catalog where the partition_info.csv file will be written.
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided, and could not be inferred.
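A minimal sketch of a CSV in this spirit, using the column names from the `METADATA_*_COLUMN_NAME` constants above. The column order, the 10,000-pixel directory grouping, and the helper name `partition_rows_to_csv` are assumptions for illustration; the file actually written by `write_to_file` may differ.

```python
import csv
import io
from typing import List, Tuple

# Column names taken from PartitionInfo's METADATA_*_COLUMN_NAME constants.
ORDER_COL, DIR_COL, PIXEL_COL = "Norder", "Dir", "Npix"


def partition_rows_to_csv(pixels: List[Tuple[int, int]]) -> str:
    """Serialize (order, pixel) pairs to CSV text with Norder, Dir, Npix columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([ORDER_COL, DIR_COL, PIXEL_COL])
    for order, pixel in sorted(pixels):
        # Assumption: Dir groups pixel indices in blocks of 10,000.
        writer.writerow([order, (pixel // 10_000) * 10_000, pixel])
    return buffer.getvalue()


print(partition_rows_to_csv([(1, 44), (0, 11)]), end="")
# Norder,Dir,Npix
# 0,0,11
# 1,0,44
```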
- write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Generate parquet metadata, using the known partitions.
If no catalog_path is provided, the catalog base directory from the read_from_dir call is used.
- Parameters:
catalog_path (FilePointer) – base path for the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided, and could not be inferred.
- classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionInfo
Read partition info from a file within a hipscat directory.
This will look for a partition_info.csv file and, if it is not found, will look for a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach took less than a second, while the second took 10-20 seconds.
- Parameters:
catalog_base_dir – path to the root directory of the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionInfo object with the data from the file
- Raises:
FileNotFoundError – if neither desired file is found in the catalog_base_dir
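The lookup order described above (prefer partition_info.csv, fall back to _metadata with a warning, otherwise raise FileNotFoundError) can be sketched with `pathlib` and `warnings`. The function `choose_partition_source` is hypothetical: it only decides which file would be read and does not parse either one, and it works on local paths rather than the abstract filesystems that `storage_options` supports.

```python
import warnings
from pathlib import Path


def choose_partition_source(catalog_base_dir: str) -> Path:
    """Pick the file that partition info would be read from."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_info.csv"
    metadata_file = base / "_metadata"
    if csv_file.exists():
        return csv_file  # fast path
    if metadata_file.exists():
        warnings.warn("partition_info.csv not found; falling back to the slower _metadata file")
        return metadata_file
    raise FileNotFoundError(f"Neither partition_info.csv nor _metadata found in {base}")
```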
- classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) -> PartitionInfo
Read partition info from a _metadata file to create an object.
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A PartitionInfo object with the data from the file
- classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) -> List[hipscat.pixel_math.HealpixPixel]
Read partition info list from a _metadata file.
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A list of HealpixPixel objects read from the file
- classmethod read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionInfo
Read partition info from a partition_info.csv file to create an object.
- Parameters:
partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionInfo object with the data from the file
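Reading such a file back into `(order, pixel)` pairs is the inverse step. This stdlib sketch assumes a header containing `Norder` and `Npix` columns (matching the constants above) and ignores any other column, such as `Dir`; `csv_to_pixels` is a hypothetical name, and the real method returns a PartitionInfo object rather than tuples.

```python
import csv
import io
from typing import List, Tuple


def csv_to_pixels(text: str) -> List[Tuple[int, int]]:
    """Parse partition_info-style CSV text into (order, pixel) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["Norder"]), int(row["Npix"])) for row in reader]


sample = "Norder,Dir,Npix\n0,0,11\n1,0,44\n"
print(csv_to_pixels(sample))  # [(0, 11), (1, 44)]
```

Reading by column name rather than position keeps the parser independent of the column order in the file.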
- classmethod _read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) -> PartitionInfo
Read partition info from a partition_info.csv file to create an object.
- Parameters:
partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionInfo object with the data from the file
- as_dataframe()
Construct a pandas dataframe for the partition info pixels.
- Returns:
Dataframe with order, directory, and pixel info.
- classmethod from_healpix(healpix_pixels: List[hipscat.pixel_math.HealpixPixel]) -> PartitionInfo
Create a partition info object from a list of constituent healpix pixels.
- Parameters:
healpix_pixels – list of healpix pixels
- Returns:
A PartitionInfo object with the same healpix pixels