hipscat.catalog.partition_info
Container class to hold per-partition metadata
Module Contents
Classes
PartitionInfo: Container class for per-partition info.
- class PartitionInfo(pixel_list: List[hipscat.pixel_math.HealpixPixel], catalog_base_dir: str = None)
Container class for per-partition info.
- get_healpix_pixels() → List[hipscat.pixel_math.HealpixPixel]
Get healpix pixel objects for all pixels represented as partitions.
- Returns:
List of HealpixPixel
- get_highest_order() → int
Get the highest healpix order for the dataset.
- Returns:
int representing highest order.
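As a rough illustration of what "highest order" means here, the sketch below takes the maximum order across a set of partitions. `PixelStandIn` is a hypothetical stand-in for `HealpixPixel` (its `order` and `pixel` fields are an assumption), used so the example runs without hipscat installed; it is not the library's implementation.

```python
from typing import List, NamedTuple


class PixelStandIn(NamedTuple):
    """Hypothetical stand-in for hipscat.pixel_math.HealpixPixel."""

    order: int
    pixel: int


def highest_order(pixels: List[PixelStandIn]) -> int:
    """Return the largest HEALPix order among the given partitions."""
    return max(p.order for p in pixels)


# Three partitions at orders 0, 1 and 3.
partitions = [PixelStandIn(0, 11), PixelStandIn(1, 44), PixelStandIn(3, 707)]
print(highest_order(partitions))
```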
- write_to_file(partition_info_file: hipscat.io.FilePointer = None, catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Write all partition data to CSV file.
If no paths are provided, the catalog base directory from the read_from_dir call is used.
- Parameters:
partition_info_file – FilePointer to where the partition_info.csv file will be written.
catalog_path – base directory for a catalog where the partition_info.csv file will be written.
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided, and could not be inferred.
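The path fallback described above could be sketched as follows. This is a minimal illustration, not the library's code: the helper name `resolve_partition_info_path`, the `stored_base_dir` argument (standing in for the directory remembered from read_from_dir), and the precedence of `partition_info_file` over `catalog_path` are all assumptions.

```python
from pathlib import Path
from typing import Optional


def resolve_partition_info_path(
    partition_info_file: Optional[str] = None,
    catalog_path: Optional[str] = None,
    stored_base_dir: Optional[str] = None,
) -> Path:
    """Hypothetical helper: decide where partition_info.csv should be written."""
    if partition_info_file is not None:
        # An explicit file path wins (assumed precedence).
        return Path(partition_info_file)
    # Otherwise use the given catalog directory, then the remembered one.
    base = catalog_path if catalog_path is not None else stored_base_dir
    if base is None:
        raise ValueError("no path provided, and none could be inferred")
    return Path(base) / "partition_info.csv"


print(resolve_partition_info_path(catalog_path="/data/my_catalog"))
```

Calling the helper with no arguments raises ValueError, mirroring the documented behavior when no path can be inferred.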
- write_to_metadata_files(catalog_path: hipscat.io.FilePointer = None, storage_options: dict = None)
Generate parquet metadata, using the known partitions.
If no catalog_path is provided, the catalog base directory from the read_from_dir call is used.
- Parameters:
catalog_path (FilePointer) – base path for the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Raises:
ValueError – if no path is provided, and could not be inferred.
- classmethod read_from_dir(catalog_base_dir: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo
Read partition info from a file within a hipscat directory.
This will look for a partition_info.csv file and, if it is not found, for a _metadata file. The second approach is typically slower for large catalogs, so a warning is issued to the user. In internal testing with large catalogs, the first approach takes less than a second, while the second can take 10-20 seconds.
- Parameters:
catalog_base_dir – path to the root directory of the catalog
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionInfo object with the data from the file
- Raises:
FileNotFoundError – if neither desired file is found in the catalog_base_dir
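The lookup order described above (CSV first, then _metadata with a warning, then an error) can be sketched with the standard library alone. The function name `choose_partition_source` and the warning text are hypothetical; the real method also parses the chosen file, which this sketch does not attempt.

```python
import tempfile
import warnings
from pathlib import Path


def choose_partition_source(catalog_base_dir: str) -> Path:
    """Hypothetical helper: pick the file to read partitions from."""
    base = Path(catalog_base_dir)
    csv_file = base / "partition_info.csv"
    if csv_file.exists():
        return csv_file  # fast path
    metadata_file = base / "_metadata"
    if metadata_file.exists():
        # Slower fallback, so let the user know.
        warnings.warn(
            "partition_info.csv not found; reading _metadata, "
            "which is slower for large catalogs"
        )
        return metadata_file
    raise FileNotFoundError(f"no partition_info.csv or _metadata file in {base}")


# Demonstrate the fallback on a directory that only holds a _metadata file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "_metadata").touch()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        chosen = choose_partition_source(tmp)
    print(chosen.name, len(caught))
```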
- classmethod read_from_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → PartitionInfo
Read partition info from a _metadata file to create an object.
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
A PartitionInfo object with the data from the file
- classmethod _read_from_metadata_file(metadata_file: hipscat.io.FilePointer, strict: bool = False, storage_options: dict = None) → List[hipscat.pixel_math.HealpixPixel]
Read partition info list from a _metadata file.
- Parameters:
metadata_file (FilePointer) – FilePointer to the _metadata file
storage_options (dict) – dictionary that contains abstract filesystem credentials
strict (bool) – use strict parsing of the _metadata file. This is slower, but gives more helpful error messages in the case of invalid data.
- Returns:
List of HealpixPixel read from the file
- classmethod read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo
Read partition info from a partition_info.csv file to create an object.
- Parameters:
partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionInfo object with the data from the file
- classmethod _read_from_csv(partition_info_file: hipscat.io.FilePointer, storage_options: dict = None) → PartitionInfo
Read partition info from a partition_info.csv file to create an object.
- Parameters:
partition_info_file (FilePointer) – FilePointer to the partition_info.csv file
storage_options (dict) – dictionary that contains abstract filesystem credentials
- Returns:
A PartitionInfo object with the data from the file
- as_dataframe()
Construct a pandas dataframe for the partition info pixels.
- Returns:
Dataframe with order, directory, and pixel info.
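To make the "order, directory, and pixel" columns concrete, here is a minimal sketch that builds equivalent rows with plain Python and prints them as CSV. The column names `Norder`, `Dir`, and `Npix`, and the rule that the directory is the pixel number rounded down to a multiple of 10000, are assumptions about the catalog layout that this page does not confirm. The real method returns a pandas DataFrame rather than a list of dicts.

```python
import csv
import io
from typing import Dict, List, Tuple


def partition_rows(pixels: List[Tuple[int, int]]) -> List[Dict[str, int]]:
    """Build one row per (order, pixel) pair, with an assumed directory number."""
    rows = []
    for order, pixel in pixels:
        # Assumed grouping: pixels are bucketed into directories of 10000.
        directory = (pixel // 10000) * 10000
        rows.append({"Norder": order, "Dir": directory, "Npix": pixel})
    return rows


rows = partition_rows([(0, 11), (5, 12287)])

# Render the rows as CSV text, similar in spirit to partition_info.csv.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Norder", "Dir", "Npix"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With pandas available, `pandas.DataFrame(rows)` would turn the same rows into a dataframe.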
- classmethod from_healpix(healpix_pixels: List[hipscat.pixel_math.HealpixPixel]) → PartitionInfo
Create a partition info object from a list of constituent healpix pixels.
- Parameters:
healpix_pixels – list of healpix pixels
- Returns:
A PartitionInfo object with the same healpix pixels