hipscat.io#

Utilities for reading and writing catalog files

Package Contents#

Functions#

get_file_pointer_from_path(→ FilePointer)

Returns a file pointer from a path string

read_row_group_fragments(metadata_file[, storage_options])

Generator for metadata fragment row groups in a parquet metadata file.

row_group_stat_single_value(row_group, stat_key)

Convenience method to find the min and max inside a statistics dictionary, and raise an error if they’re unequal.

write_parquet_metadata_for_batches(batches[, ...])

Write parquet metadata files for some pyarrow table batches.

create_hive_directory_name(base_dir, ...)

Create path pointer for a directory with hive partitioning naming.

create_hive_parquet_file_name(base_dir, ...)

Create path pointer for a single parquet file with hive partitioning naming.

get_catalog_info_pointer(...)

Get file pointer to catalog_info.json metadata file

get_common_metadata_pointer(...)

Get file pointer to _common_metadata parquet metadata file

get_parquet_metadata_pointer(...)

Get file pointer to _metadata parquet metadata file

get_partition_info_pointer(...)

Get file pointer to partition_info.csv metadata file

get_point_map_file_pointer(...)

Get file pointer to point_map.fits FITS image file.

get_provenance_pointer(...)

Get file pointer to provenance_info.json metadata file

pixel_catalog_file(...)

Create path pointer for a pixel catalog file. This will not create the directory or file.

pixel_directory(...)

Create path pointer for a pixel directory. This will not create the directory.

write_catalog_info(catalog_base_dir, dataset_info[, ...])

Write a catalog_info.json file with catalog metadata

write_parquet_metadata(catalog_path[, storage_options])

Generate parquet metadata, using the already-partitioned parquet files for this catalog

write_partition_info(catalog_base_dir, ...[, ...])

Write all partition data to CSV file.

write_provenance_info(catalog_base_dir, dataset_info, ...)

Write a provenance_info.json file with all assorted catalog creation metadata

Attributes#

FilePointer

Unified type for references to files.

FilePointer[source]#

Unified type for references to files.

get_file_pointer_from_path(path: str, include_protocol: str = None) → FilePointer[source]#

Returns a file pointer from a path string

read_row_group_fragments(metadata_file: str, storage_options: dict = None)[source]#

Generator for metadata fragment row groups in a parquet metadata file.

Parameters:
  • metadata_file (str) – path to _metadata file.

  • storage_options – dictionary that contains abstract filesystem credentials

row_group_stat_single_value(row_group, stat_key: str)[source]#

Convenience method to find the min and max inside a statistics dictionary, and raise an error if they’re unequal.

Parameters:
  • row_group – dataset fragment row group

  • stat_key (str) – column name of interest.

Returns:

The value of the specified row group statistic
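
The check described above can be sketched in plain Python. This is an illustration, not the library's implementation: `single_value_sketch` is a hypothetical name, and it assumes the row group statistics are available as a dictionary mapping each column name to a dictionary with `min` and `max` keys. Raising `ValueError` for a missing key is also an assumption of the sketch.

```python
def single_value_sketch(statistics, stat_key):
    """Return the single value of a column whose min and max statistics agree.

    `statistics` is assumed to look like {"column": {"min": ..., "max": ...}}.
    """
    if stat_key not in statistics:
        # Assumption: a missing column is treated as an error.
        raise ValueError(f"statistics have no entry for {stat_key!r}")
    min_val = statistics[stat_key]["min"]
    max_val = statistics[stat_key]["max"]
    if min_val != max_val:
        # The column holds more than one value in this row group.
        raise ValueError(f"stat min != max ({min_val} != {max_val})")
    return min_val


# Hypothetical statistics for one row group of a partitioned catalog.
example_stats = {
    "Norder": {"min": 1, "max": 1},
    "ra": {"min": 10.5, "max": 45.2},
}
print(single_value_sketch(example_stats, "Norder"))
```

Asking for `"ra"` in this example would raise, because that column spans a range of values within the row group.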

write_parquet_metadata_for_batches(batches: List[List[pyarrow.RecordBatch]], output_path: str = None, storage_options: dict = None)[source]#

Write parquet metadata files for some pyarrow table batches. This writes the batches to a temporary parquet dataset using local storage, and generates the metadata for the partitioned catalog parquet files.

Parameters:
  • batches (List[List[pa.RecordBatch]]) – create one row group per RecordBatch, grouped into tables by the inner list.

  • output_path (str) – base path for writing out metadata files; defaults to catalog_path if unspecified

  • storage_options – dictionary that contains abstract filesystem credentials

create_hive_directory_name(base_dir, partition_token_names, partition_token_values)[source]#

Create path pointer for a directory with hive partitioning naming. This will not create the directory.

The directory name will have the form of:

<base_dir>/<name_1>=<value_1>/.../<name_n>=<value_n>

Parameters:
  • base_dir (FilePointer) – base directory of the catalog (includes catalog name)

  • partition_token_names (list[string]) – list of partition name parts.

  • partition_token_values (list[string]) – list of partition values that correspond to the token name parts.

create_hive_parquet_file_name(base_dir, partition_token_names, partition_token_values)[source]#

Create path pointer for a single parquet file with hive partitioning naming.

The file name will have the form of:

<base_dir>/<name_1>=<value_1>/.../<name_n>=<value_n>.parquet

Parameters:
  • base_dir (FilePointer) – base directory of the catalog (includes catalog name)

  • partition_token_names (list[string]) – list of partition name parts.

  • partition_token_values (list[string]) – list of partition values that correspond to the token name parts.
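
The two hive naming forms above differ only in the `.parquet` suffix on the last `name=value` part. A minimal sketch of both, assuming plain string paths instead of `FilePointer` objects; the function names and the example path are hypothetical and this is not the library's code:

```python
import posixpath


def hive_directory_sketch(base_dir, token_names, token_values):
    """Join base_dir with one `name=value` path part per partition token."""
    parts = [f"{name}={value}" for name, value in zip(token_names, token_values)]
    return posixpath.join(base_dir, *parts)


def hive_parquet_file_sketch(base_dir, token_names, token_values):
    """Same layout, with `.parquet` appended to the final `name=value` part."""
    return hive_directory_sketch(base_dir, token_names, token_values) + ".parquet"


# Hypothetical catalog location and partition tokens.
print(hive_directory_sketch("/data/my_catalog", ["Norder", "Dir"], [1, 0]))
print(hive_parquet_file_sketch("/data/my_catalog", ["Norder", "Dir", "Npix"], [1, 0, 44]))
```

Neither function touches the filesystem, which matches the note above that the directory is not created.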

get_catalog_info_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Get file pointer to catalog_info.json metadata file

Parameters:

catalog_base_dir – pointer to base catalog directory

Returns:

File Pointer to the catalog’s catalog_info.json file

get_common_metadata_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Get file pointer to _common_metadata parquet metadata file

Parameters:

catalog_base_dir – pointer to base catalog directory

Returns:

File Pointer to the catalog’s _common_metadata file

get_parquet_metadata_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Get file pointer to _metadata parquet metadata file

Parameters:

catalog_base_dir – pointer to base catalog directory

Returns:

File Pointer to the catalog’s _metadata file

get_partition_info_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Get file pointer to partition_info.csv metadata file

Parameters:

catalog_base_dir – pointer to base catalog directory

Returns:

File Pointer to the catalog’s partition_info.csv file

get_point_map_file_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Get file pointer to point_map.fits FITS image file.

Parameters:

catalog_base_dir – pointer to base catalog directory

Returns:

File Pointer to the catalog’s point_map.fits FITS image file.

get_provenance_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Get file pointer to provenance_info.json metadata file

Parameters:

catalog_base_dir – pointer to base catalog directory

Returns:

File Pointer to the catalog’s provenance_info.json file
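
The six pointer helpers above each resolve one well-known file name. The sketch below collects those names in one place. It assumes each file sits directly under the catalog base directory and uses plain string paths; the dictionary keys, the function name, and the example path are hypothetical.

```python
import posixpath

# File names as given in the descriptions above; the keys are illustrative only.
CATALOG_FILES = {
    "catalog_info": "catalog_info.json",
    "common_metadata": "_common_metadata",
    "parquet_metadata": "_metadata",
    "partition_info": "partition_info.csv",
    "point_map": "point_map.fits",
    "provenance": "provenance_info.json",
}


def catalog_file_sketch(catalog_base_dir, kind):
    """Return the path of one well-known catalog file.

    Assumption: every file lives directly under catalog_base_dir.
    """
    return posixpath.join(catalog_base_dir, CATALOG_FILES[kind])


for kind in CATALOG_FILES:
    print(catalog_file_sketch("/data/my_catalog", kind))
```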

pixel_catalog_file(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer, pixel_order: int, pixel_number: int) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Create path pointer for a pixel catalog file. This will not create the directory or file.

The catalog file name will take the HiPS standard form of:

<catalog_base_dir>/Norder=<pixel_order>/Dir=<directory number>/Npix=<pixel_number>.parquet

Where the directory number is calculated using integer division as:

(pixel_number // 10000) * 10000

Parameters:
  • catalog_base_dir (FilePointer) – base directory of the catalog (includes catalog name)

  • pixel_order (int) – the healpix order of the pixel

  • pixel_number (int) – the healpix pixel

Returns:

FilePointer catalog file name
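
The directory number arithmetic is easiest to see with a few values. The sketch below uses plain strings rather than `FilePointer` objects; both function names and the example path are hypothetical, and it only illustrates the naming form described above.

```python
def directory_number_sketch(pixel_number):
    """Round the pixel number down to a multiple of 10000 (integer division)."""
    return (pixel_number // 10000) * 10000


def pixel_catalog_file_sketch(catalog_base_dir, pixel_order, pixel_number):
    """Build <base>/Norder=<order>/Dir=<directory number>/Npix=<pixel>.parquet."""
    directory_number = directory_number_sketch(pixel_number)
    return (
        f"{catalog_base_dir}/Norder={pixel_order}"
        f"/Dir={directory_number}/Npix={pixel_number}.parquet"
    )


# Pixels 0..9999 share Dir=0, pixels 10000..19999 share Dir=10000, and so on.
for pixel in (44, 9999, 10000, 12345):
    print(pixel, directory_number_sketch(pixel))

print(pixel_catalog_file_sketch("/data/my_catalog", 5, 12345))
```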

pixel_directory(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer, pixel_order: int, pixel_number: int | None = None, directory_number: int | None = None) → hipscat.io.file_io.file_pointer.FilePointer[source]#

Create path pointer for a pixel directory. This will not create the directory.

One of pixel_number or directory_number is required. The directory name will take the HiPS standard form of:

<catalog_base_dir>/Norder=<pixel_order>/Dir=<directory number>

Where the directory number is calculated using integer division as:

(pixel_number // 10000) * 10000

Parameters:
  • catalog_base_dir (FilePointer) – base directory of the catalog (includes catalog name)

  • pixel_order (int) – the healpix order of the pixel

  • directory_number (int) – directory number

  • pixel_number (int) – the healpix pixel

Returns:

FilePointer directory name
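
The "one of pixel_number or directory_number is required" rule can be sketched as follows. This is a hedged illustration with plain strings, not the library's code: the function name is hypothetical, and letting an explicit `directory_number` take precedence when both arguments are given is an assumption of the sketch.

```python
def pixel_directory_sketch(catalog_base_dir, pixel_order, pixel_number=None, directory_number=None):
    """Return <catalog_base_dir>/Norder=<pixel_order>/Dir=<directory number>."""
    if directory_number is None:
        if pixel_number is None:
            raise ValueError("one of pixel_number or directory_number is required")
        # Derive the directory number from the pixel using integer division.
        directory_number = (pixel_number // 10000) * 10000
    # Assumption: an explicitly supplied directory_number is used as-is.
    return f"{catalog_base_dir}/Norder={pixel_order}/Dir={directory_number}"


print(pixel_directory_sketch("/data/my_catalog", 5, pixel_number=12345))
print(pixel_directory_sketch("/data/my_catalog", 5, directory_number=20000))
```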

write_catalog_info(catalog_base_dir, dataset_info, storage_options: Dict[Any, Any] | None = None)[source]#

Write a catalog_info.json file with catalog metadata

Parameters:
  • catalog_base_dir (str) – base directory for catalog, where file will be written

  • dataset_info (BaseCatalogInfo) – catalog metadata to write to the file

  • storage_options – dictionary that contains abstract filesystem credentials

write_parquet_metadata(catalog_path, storage_options: Dict[Any, Any] | None = None)[source]#

Generate parquet metadata, using the already-partitioned parquet files for this catalog

Parameters:
  • catalog_path (str) – base path for the catalog

  • storage_options – dictionary that contains abstract filesystem credentials

write_partition_info(catalog_base_dir: hipscat.io.file_io.FilePointer, destination_healpix_pixel_map: dict, storage_options: Dict[Any, Any] | None = None)[source]#

Write all partition data to CSV file.

Parameters:
  • catalog_base_dir (FilePointer) – base directory for catalog, where file will be written

  • destination_healpix_pixel_map (dict) –

    dictionary that maps the HealpixPixel to a tuple of origin pixel information:

    • 0 - the total number of rows found in this destination pixel

    • 1 - the set of indexes in histogram for the pixels at the original healpix order

  • storage_options – dictionary that contains abstract filesystem credentials
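
To make the shape of `destination_healpix_pixel_map` concrete, here is a small sketch that turns such a mapping into CSV rows. Everything named here is an assumption for illustration: `HealpixPixel` is a stand-in named tuple rather than the library's class, the column names are hypothetical, and the real `partition_info.csv` layout may differ.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the library's HealpixPixel type.
HealpixPixel = namedtuple("HealpixPixel", ["order", "pixel"])


def write_partition_csv_sketch(pixel_map, handle):
    """Write one CSV row per destination pixel, sorted by (order, pixel)."""
    writer = csv.writer(handle, lineterminator="\n")
    # Column names are an assumption of this sketch.
    writer.writerow(["Norder", "Npix", "num_rows"])
    rows = [
        # Tuple element 0 is the row count; element 1 (origin indexes) is not written here.
        (pixel.order, pixel.pixel, info[0])
        for pixel, info in pixel_map.items()
    ]
    writer.writerows(sorted(rows))


# Each value is (total rows in the destination pixel, set of origin histogram indexes).
example_map = {
    HealpixPixel(1, 44): (51, {44}),
    HealpixPixel(0, 11): (131, {176, 177, 178, 179}),
}
buffer = io.StringIO()
write_partition_csv_sketch(example_map, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example free of filesystem side effects; a real writer would target the catalog base directory.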

write_provenance_info(catalog_base_dir: hipscat.io.file_io.FilePointer, dataset_info, tool_args: dict, storage_options: Dict[Any, Any] | None = None)[source]#

Write a provenance_info.json file with all assorted catalog creation metadata

Parameters:
  • catalog_base_dir (FilePointer) – base directory for catalog, where file will be written

  • dataset_info (BaseCatalogInfo) – catalog metadata to record in the file

  • tool_args (dict) – dictionary of additional arguments provided by the tool creating this catalog.

  • storage_options – dictionary that contains abstract filesystem credentials