hipscat.io#
Utilities for reading and writing catalog files
Subpackages#
Submodules#
Package Contents#
Functions#
| get_file_pointer_from_path | Returns a file pointer from a path string |
| read_row_group_fragments | Generator for metadata fragment row groups in a parquet metadata file. |
| row_group_stat_single_value | Convenience method to find the min and max inside a statistics dictionary, and raise an error if they’re unequal. |
| write_parquet_metadata_for_batches | Write parquet metadata files for some pyarrow table batches. |
| create_hive_directory_name | Create path pointer for a directory with hive partitioning naming. |
| create_hive_parquet_file_name | Create path pointer for a single parquet with hive partitioning naming. |
| get_catalog_info_pointer | Get file pointer to catalog_info.json metadata file |
| get_common_metadata_pointer | Get file pointer to _common_metadata parquet metadata file |
| get_parquet_metadata_pointer | Get file pointer to _metadata parquet metadata file |
| get_partition_info_pointer | Get file pointer to partition_info.csv metadata file |
| get_point_map_file_pointer | Get file pointer to point_map.fits FITS image file. |
| get_provenance_pointer | Get file pointer to provenance_info.json metadata file |
| pixel_catalog_file | Create path pointer for a pixel catalog file. This will not create the directory or file. |
| pixel_directory | Create path pointer for a pixel directory. This will not create the directory. |
| write_catalog_info | Write a catalog_info.json file with catalog metadata |
| write_parquet_metadata | Generate parquet metadata, using the already-partitioned parquet files for this catalog |
| write_partition_info | Write all partition data to CSV file. |
| write_provenance_info | Write a provenance_info.json file with all assorted catalog creation metadata |
Attributes#
| FilePointer | Unified type for references to files. |
- get_file_pointer_from_path(path: str, include_protocol: str = None) -> FilePointer
Returns a file pointer from a path string
- read_row_group_fragments(metadata_file: str, storage_options: dict = None)
Generator for metadata fragment row groups in a parquet metadata file.
- Parameters:
metadata_file (str) – path to _metadata file.
storage_options – dictionary that contains abstract filesystem credentials
- row_group_stat_single_value(row_group, stat_key: str)
Convenience method to find the min and max inside a statistics dictionary, and raise an error if they’re unequal.
- Parameters:
row_group – dataset fragment row group
stat_key (str) – column name of interest.
- Returns:
The value of the specified row group statistic
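The min/max check described above can be sketched without pyarrow. This is a minimal sketch, not the library's implementation: `single_value_from_stats` is a hypothetical stand-in, and the statistics are assumed to be a plain dictionary mapping each column name to a dict with `min` and `max` keys.

```python
def single_value_from_stats(statistics: dict, stat_key: str):
    """Return the single value a column takes in a row group, per its min/max statistics."""
    if stat_key not in statistics:
        raise ValueError(f"row group has no statistics for column {stat_key!r}")
    column_stats = statistics[stat_key]
    min_val, max_val = column_stats["min"], column_stats["max"]
    if min_val != max_val:
        # The column spans more than one value, so there is no single answer.
        raise ValueError(
            f"column {stat_key!r} is not single-valued: min={min_val}, max={max_val}"
        )
    return min_val


# Assumed shape of the statistics: {column name: {"min": ..., "max": ...}}
stats = {"Norder": {"min": 3, "max": 3}, "Npix": {"min": 40, "max": 45}}
norder = single_value_from_stats(stats, "Norder")
```

Here `Norder` resolves to one value because its min and max agree, while asking for `Npix` would raise a `ValueError`.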
- write_parquet_metadata_for_batches(batches: List[List[pyarrow.RecordBatch]], output_path: str = None, storage_options: dict = None)
Write parquet metadata files for some pyarrow table batches. This writes the batches to a temporary parquet dataset using local storage, and generates the metadata for the partitioned catalog parquet files.
- Parameters:
batches (List[List[pa.RecordBatch]]) – create one row group per RecordBatch, grouped into tables by the inner list.
output_path (str) – base path for writing out metadata files; defaults to catalog_path if unspecified
storage_options – dictionary that contains abstract filesystem credentials
- create_hive_directory_name(base_dir, partition_token_names, partition_token_values)
Create path pointer for a directory with hive partitioning naming. This will not create the directory.
The directory name will have the form of:
<base_dir>/<name_1>=<value_1>/.../<name_n>=<value_n>
- Parameters:
base_dir (FilePointer) – base directory of the catalog (includes catalog name)
partition_token_names (list[string]) – list of partition name parts.
partition_token_values (list[string]) – list of partition values that correspond to the token name parts.
- create_hive_parquet_file_name(base_dir, partition_token_names, partition_token_values)
Create path pointer for a single parquet with hive partitioning naming.
The file name will have the form of:
<base_dir>/<name_1>=<value_1>/.../<name_n>=<value_n>.parquet
- Parameters:
base_dir (FilePointer) – base directory of the catalog (includes catalog name)
partition_token_names (list[string]) – list of partition name parts.
partition_token_values (list[string]) – list of partition values that correspond to the token name parts.
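The two hive naming forms above can be illustrated with plain string handling. This is a minimal sketch under stated assumptions: `hive_directory_name` and `hive_parquet_file_name` are hypothetical helpers, paths are treated as `/`-separated strings, and the example base directory is made up.

```python
from typing import Sequence


def hive_directory_name(base_dir: str, names: Sequence[str], values: Sequence[str]) -> str:
    """Append one <name>=<value> path segment per partition token to base_dir."""
    if len(names) != len(values):
        raise ValueError("partition names and values must have the same length")
    tokens = [f"{name}={value}" for name, value in zip(names, values)]
    return "/".join([base_dir.rstrip("/"), *tokens])


def hive_parquet_file_name(base_dir: str, names: Sequence[str], values: Sequence[str]) -> str:
    """Same layout as the directory form, with .parquet appended to the last token."""
    return hive_directory_name(base_dir, names, values) + ".parquet"


# Nothing is created on disk; both calls only build path strings.
directory = hive_directory_name("/data/my_catalog", ["Norder", "Dir"], ["3", "0"])
parquet_file = hive_parquet_file_name(
    "/data/my_catalog", ["Norder", "Dir", "Npix"], ["3", "0", "45"]
)
```

The file form differs from the directory form only in the `.parquet` suffix on the final `<name_n>=<value_n>` token.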
- get_catalog_info_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) -> hipscat.io.file_io.file_pointer.FilePointer
Get file pointer to catalog_info.json metadata file
- Parameters:
catalog_base_dir – pointer to base catalog directory
- Returns:
File Pointer to the catalog’s catalog_info.json file
- get_common_metadata_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) -> hipscat.io.file_io.file_pointer.FilePointer
Get file pointer to _common_metadata parquet metadata file
- Parameters:
catalog_base_dir – pointer to base catalog directory
- Returns:
File Pointer to the catalog’s _common_metadata file
- get_parquet_metadata_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) -> hipscat.io.file_io.file_pointer.FilePointer
Get file pointer to _metadata parquet metadata file
- Parameters:
catalog_base_dir – pointer to base catalog directory
- Returns:
File Pointer to the catalog’s _metadata file
- get_partition_info_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) -> hipscat.io.file_io.file_pointer.FilePointer
Get file pointer to partition_info.csv metadata file
- Parameters:
catalog_base_dir – pointer to base catalog directory
- Returns:
File Pointer to the catalog’s partition_info.csv file
- get_point_map_file_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) -> hipscat.io.file_io.file_pointer.FilePointer
Get file pointer to point_map.fits FITS image file.
- Parameters:
catalog_base_dir – pointer to base catalog directory
- Returns:
File Pointer to the catalog’s point_map.fits FITS image file.
- get_provenance_pointer(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer) -> hipscat.io.file_io.file_pointer.FilePointer
Get file pointer to provenance_info.json metadata file
- Parameters:
catalog_base_dir – pointer to base catalog directory
- Returns:
File Pointer to the catalog’s provenance_info.json file
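The pointer getters above each resolve one well-known file name for a catalog. The sketch below collects those file names in one place. It assumes each file sits directly inside the base catalog directory and that paths are `/`-separated strings; `metadata_file_path` and the dictionary keys are hypothetical names used only for illustration.

```python
import posixpath

# File names taken from the function descriptions above.
METADATA_FILE_NAMES = {
    "catalog_info": "catalog_info.json",
    "common_metadata": "_common_metadata",
    "parquet_metadata": "_metadata",
    "partition_info": "partition_info.csv",
    "point_map": "point_map.fits",
    "provenance": "provenance_info.json",
}


def metadata_file_path(catalog_base_dir: str, kind: str) -> str:
    """Hypothetical helper: path of one metadata file inside a catalog directory."""
    if kind not in METADATA_FILE_NAMES:
        raise KeyError(f"unknown metadata file kind: {kind!r}")
    return posixpath.join(catalog_base_dir, METADATA_FILE_NAMES[kind])


catalog_info_path = metadata_file_path("/data/my_catalog", "catalog_info")
parquet_metadata_path = metadata_file_path("/data/my_catalog", "parquet_metadata")
```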
- pixel_catalog_file(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer, pixel_order: int, pixel_number: int) -> hipscat.io.file_io.file_pointer.FilePointer
Create path pointer for a pixel catalog file. This will not create the directory or file.
The catalog file name will take the HiPS standard form of:
<catalog_base_dir>/Norder=<pixel_order>/Dir=<directory number>/Npix=<pixel_number>.parquet
Where the directory number is calculated using integer division as:
(pixel_number/10000)*10000
- Parameters:
catalog_base_dir (FilePointer) – base directory of the catalog (includes catalog name)
pixel_order (int) – the healpix order of the pixel
pixel_number (int) – the healpix pixel
- Returns:
FilePointer catalog file name
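The directory-number arithmetic is easiest to see with concrete pixels. The following is a minimal sketch of the naming rule described above, using plain strings rather than FilePointer objects; `pixel_catalog_path` is a hypothetical name and the base directory is made up.

```python
def pixel_catalog_path(catalog_base_dir: str, pixel_order: int, pixel_number: int) -> str:
    """Build <base>/Norder=<order>/Dir=<directory number>/Npix=<pixel>.parquet."""
    # Integer division groups pixels into directories of 10,000 pixels each.
    directory_number = (pixel_number // 10000) * 10000
    return (
        f"{catalog_base_dir}/Norder={pixel_order}"
        f"/Dir={directory_number}/Npix={pixel_number}.parquet"
    )


# Pixel 45 falls in directory 0; pixel 123456 falls in directory 120000.
small_pixel = pixel_catalog_path("/data/my_catalog", 3, 45)
large_pixel = pixel_catalog_path("/data/my_catalog", 7, 123456)
```

In Python the integer division must be written `//`; a plain `/` would produce a float and leave the pixel number unchanged after multiplying back.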
- pixel_directory(catalog_base_dir: hipscat.io.file_io.file_pointer.FilePointer, pixel_order: int, pixel_number: int | None = None, directory_number: int | None = None) -> hipscat.io.file_io.file_pointer.FilePointer
Create path pointer for a pixel directory. This will not create the directory.
One of pixel_number or directory_number is required. The directory name will take the HiPS standard form of:
<catalog_base_dir>/Norder=<pixel_order>/Dir=<directory number>
Where the directory number is calculated using integer division as:
(pixel_number/10000)*10000
- Parameters:
catalog_base_dir (FilePointer) – base directory of the catalog (includes catalog name)
pixel_order (int) – the healpix order of the pixel
directory_number (int) – directory number
pixel_number (int) – the healpix pixel
- Returns:
FilePointer directory name
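The "one of pixel_number or directory_number" requirement can be sketched as follows. This is an illustration with plain strings, not the library's code: `pixel_directory_path` is a hypothetical name, and letting an explicit `directory_number` take precedence when both arguments are given is an assumption.

```python
from typing import Optional


def pixel_directory_path(
    catalog_base_dir: str,
    pixel_order: int,
    pixel_number: Optional[int] = None,
    directory_number: Optional[int] = None,
) -> str:
    """Build <base>/Norder=<order>/Dir=<directory number> from either argument."""
    if directory_number is None:
        if pixel_number is None:
            raise ValueError("One of pixel_number or directory_number is required")
        # Derive the directory from the pixel using integer division.
        directory_number = (pixel_number // 10000) * 10000
    # Assumption: an explicit directory_number is used as-is, even if pixel_number is also set.
    return f"{catalog_base_dir}/Norder={pixel_order}/Dir={directory_number}"


from_pixel = pixel_directory_path("/data/my_catalog", 7, pixel_number=123456)
explicit = pixel_directory_path("/data/my_catalog", 7, directory_number=0)
```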
- write_catalog_info(catalog_base_dir, dataset_info, storage_options: Dict[Any, Any] | None = None)
Write a catalog_info.json file with catalog metadata
- Parameters:
catalog_base_dir (str) – base directory for catalog, where file will be written
dataset_info (BaseCatalogInfo) – catalog metadata
storage_options – dictionary that contains abstract filesystem credentials
- write_parquet_metadata(catalog_path, storage_options: Dict[Any, Any] | None = None)
Generate parquet metadata, using the already-partitioned parquet files for this catalog
- Parameters:
catalog_path (str) – base path for the catalog
storage_options – dictionary that contains abstract filesystem credentials
- write_partition_info(catalog_base_dir: hipscat.io.file_io.FilePointer, destination_healpix_pixel_map: dict, storage_options: Dict[Any, Any] | None = None)
Write all partition data to CSV file.
- Parameters:
catalog_base_dir (str) – base directory for catalog, where file will be written
destination_healpix_pixel_map (dict) –
dictionary that maps the HealpixPixel to a tuple of origin pixel information:
0 - the total number of rows found in this destination pixel
1 - the set of indexes in histogram for the pixels at the original healpix order
storage_options – dictionary that contains abstract filesystem credentials
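To make the shape of the destination map concrete, the sketch below builds a small map and writes one CSV row per destination pixel. It is an illustration only: `(order, pixel)` tuples stand in for HealpixPixel keys, the column names are hypothetical, the numbers are made up, and the output goes to an in-memory buffer rather than to the catalog's partition_info.csv.

```python
import csv
import io

# Stand-in for the destination map: (order, pixel) -> (row count, origin pixel indexes).
# The origin indexes here are pixels at an assumed original order of 2.
destination_pixel_map = {
    (1, 40): (28, {160, 161}),
    (0, 11): (131, {176, 180, 191}),
}

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Norder", "Npix", "num_rows"])  # hypothetical column names
for (order, pixel), (num_rows, _origin_indexes) in sorted(destination_pixel_map.items()):
    # Element 0 of each tuple is the row count; element 1 (the origin set) is not written here.
    writer.writerow([order, pixel, num_rows])

csv_text = buffer.getvalue()
```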
- write_provenance_info(catalog_base_dir: hipscat.io.file_io.FilePointer, dataset_info, tool_args: dict, storage_options: Dict[Any, Any] | None = None)
Write a provenance_info.json file with all assorted catalog creation metadata
- Parameters:
catalog_base_dir (str) – base directory for catalog, where file will be written
dataset_info (BaseCatalogInfo) – catalog metadata
tool_args (dict) – dictionary of additional arguments provided by the tool creating this catalog.
storage_options – dictionary that contains abstract filesystem credentials