hats.io
=======

.. py:module:: hats.io

.. autoapi-nested-parse::

   
   Utilities for reading and writing catalog files.
















   ..
       !! processed by numpydoc !!


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/hats/io/file_io/index
   /autoapi/hats/io/parquet_metadata/index
   /autoapi/hats/io/paths/index
   /autoapi/hats/io/show_versions/index
   /autoapi/hats/io/size_estimates/index
   /autoapi/hats/io/skymap/index
   /autoapi/hats/io/summary_file/index
   /autoapi/hats/io/templates/index
   /autoapi/hats/io/validation/index


Functions
---------

.. autoapisummary::

   hats.io.write_parquet_metadata
   hats.io.get_common_metadata_pointer
   hats.io.get_parquet_metadata_pointer
   hats.io.get_partition_info_pointer
   hats.io.get_point_map_file_pointer
   hats.io.get_skymap_file_pointer
   hats.io.pixel_catalog_file
   hats.io.pixel_directory
   hats.io.skymap_coverage


Package Contents
----------------

.. py:function:: write_parquet_metadata(catalog_path: str | pathlib.Path | upath.UPath, *, order_by_healpix=True, output_path: str | pathlib.Path | upath.UPath | None = None, create_thumbnail: bool = False, thumbnail_threshold: int = 1000000, create_metadata: bool = True, create_per_partition_stats: bool = False)

   
   Write Parquet dataset-level metadata files (and optional thumbnail) for a catalog.

   Creates files::

       catalog/
       ├── data_thumbnail.parquet           (only if create_thumbnail=True)
       ├── per_partition_statistics.parquet (only if create_per_partition_stats=True)
       ├── ...
       └── dataset/
           ├── _common_metadata             (always written)
           ├── _metadata                    (only if create_metadata=True)
           └──  ...

   ``dataset/_common_metadata`` contains the full schema of the dataset. This file
   records all of the columns and their types, as well as any file-level key-value
   metadata associated with the full Parquet dataset.

   ``dataset/_metadata`` contains the combined row group footers from all Parquet files
   in the dataset, which allows readers to read the entire dataset without having
   to open each individual Parquet file. This file can be large for datasets with
   many files, so users may choose to omit it by setting ``create_metadata=False``.

   ``data_thumbnail.parquet`` gives the user a quick overview of the whole dataset.
   It is a compact file containing one row from each data partition, up to a maximum
   of ``thumbnail_threshold`` rows.

   ``per_partition_statistics.parquet`` contains summary statistics from all columns
   in data partition files, e.g. column min/max values, count of null values, etc.
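
   The flag-to-file mapping described above can be summarized in a small,
   library-independent sketch. The helper ``expected_metadata_files`` is
   hypothetical (it is not part of ``hats.io``); it only mirrors the layout
   documented here and does not touch the file system.

   .. code-block:: python

       from pathlib import PurePosixPath


       def expected_metadata_files(
           catalog_path,
           create_thumbnail=False,
           create_metadata=True,
           create_per_partition_stats=False,
       ):
           """Hypothetical helper: list the files the documented flags would produce."""
           root = PurePosixPath(catalog_path)
           # ``dataset/_common_metadata`` is always written.
           files = [root / "dataset" / "_common_metadata"]
           if create_metadata:
               files.append(root / "dataset" / "_metadata")
           if create_thumbnail:
               files.append(root / "data_thumbnail.parquet")
           if create_per_partition_stats:
               files.append(root / "per_partition_statistics.parquet")
           return [str(path) for path in files]


       # With the default flags, only the two ``dataset/`` metadata files are listed.
       default_files = expected_metadata_files("my_catalog")
       all_files = expected_metadata_files(
           "my_catalog",
           create_thumbnail=True,
           create_per_partition_stats=True,
       )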

   :Parameters:

       **catalog_path** : str | Path | UPath
           Base path for the catalog root.

       **order_by_healpix** : bool, default=True
           If True, reorder the combined metadata by breadth-first HEALPix pixel ordering.
           Set to False for datasets that should not be reordered (e.g., secondary indexes).
           This only affects the metadata files; dataset files on disk are not modified.

       **output_path** : str | Path | UPath | None, default=None
           Base path to write metadata files. If None, uses ``catalog_path``.

       **create_thumbnail** : bool, default=False
           If True, writes a compact ``data_thumbnail.parquet`` containing one row per
           sampled file.

       **thumbnail_threshold** : int, default=1_000_000
           Maximum number of rows in the thumbnail. The thumbnail holds one row per
           partition, so it has fewer rows when the dataset has fewer partition files
           than ``thumbnail_threshold``.

       **create_metadata** : bool, default=True
           If True, writes ``dataset/_metadata`` combining row group footers.

       **create_per_partition_stats** : bool, default=False
           If True, writes ``per_partition_statistics.parquet`` containing summary
           statistics from all columns in data partition files.



   :Returns:

       int
           Total number of rows across all parquet files in the dataset.








   .. rubric:: Notes

   For more information on the general Parquet metadata files, and why we write them, see
   https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files

   For more information on HATS-specific metadata files and conventions, see
   https://www.ivoa.net/documents/Notes/HATS/



   ..
       !! processed by numpydoc !!

.. py:function:: get_common_metadata_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath

   
   Get file pointer to the ``_common_metadata`` Parquet metadata file.


   :Parameters:

       **catalog_base_dir** : str | Path | UPath
           Base directory of the catalog (includes catalog name).



   :Returns:

       UPath
           File pointer to the catalog's ``_common_metadata`` file.











   ..
       !! processed by numpydoc !!

.. py:function:: get_parquet_metadata_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath

   
   Get file pointer to the ``_metadata`` Parquet metadata file.


   :Parameters:

       **catalog_base_dir** : str | Path | UPath
           Base directory of the catalog (includes catalog name).



   :Returns:

       UPath
           File pointer to the catalog's ``_metadata`` file.











   ..
       !! processed by numpydoc !!

.. py:function:: get_partition_info_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath

   
   Get file pointer to the ``partition_info.csv`` metadata file.


   :Parameters:

       **catalog_base_dir** : str | Path | UPath
           Base directory of the catalog (includes catalog name).



   :Returns:

       UPath
           File pointer to the catalog's ``partition_info.csv`` file.











   ..
       !! processed by numpydoc !!

.. py:function:: get_point_map_file_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath) -> upath.UPath

   
   Get file pointer to the ``point_map.fits`` FITS image file.


   :Parameters:

       **catalog_base_dir** : str | Path | UPath
           Base directory of the catalog (includes catalog name).



   :Returns:

       UPath
           File pointer to the catalog's ``point_map.fits`` FITS image file.











   ..
       !! processed by numpydoc !!

.. py:function:: get_skymap_file_pointer(catalog_base_dir: str | pathlib.Path | upath.UPath, order: int | None = None) -> upath.UPath

   
   Get file pointer to the ``skymap.fits`` or ``skymap.K.fits`` FITS image file.


   :Parameters:

       **catalog_base_dir** : str | Path | UPath
           Base directory of the catalog (includes catalog name).

       **order** : int | None, default=None
           Desired order for the map, if looking for a down-sampled map.



   :Returns:

       UPath
           File pointer to the FITS image file.
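
   A minimal sketch of the naming rule described above, assuming that ``K`` in
   ``skymap.K.fits`` stands for the requested order. The helper
   ``skymap_file_name`` is hypothetical and only builds the file name; it does
   not reproduce how the library resolves the full path.

   .. code-block:: python

       def skymap_file_name(order=None):
           """Hypothetical helper: pick the skymap file name for an optional order."""
           if order is None:
               # No order requested: use the default skymap.
               return "skymap.fits"
           # Assumption: down-sampled maps embed the order in the file name.
           return f"skymap.{order}.fits"


       default_name = skymap_file_name()
       downsampled_name = skymap_file_name(order=4)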











   ..
       !! processed by numpydoc !!

.. py:function:: pixel_catalog_file(catalog_base_dir: str | pathlib.Path | upath.UPath | None, pixel: hats.pixel_math.healpix_pixel.HealpixPixel, query_params: dict | None = None, npix_suffix: str = '.parquet') -> upath.UPath

   
   Create path *pointer* for a pixel catalog file. This will not create the directory
   or file.

   The catalog file name will take the HiPS standard form of::

       <catalog_base_dir>/dataset/Norder=<pixel_order>/Dir=<directory number>/Npix=<pixel_number>.parquet

   Where the directory number is calculated using integer division as::

       (pixel_number // 10000) * 10000

   :Parameters:

       **catalog_base_dir** : str | Path | UPath | None
           base directory of the catalog (includes catalog name)

       **pixel** : HealpixPixel
           the healpix pixel to create path to

       **query_params** : dict | None, default=None
           Parameters to append to the URL. For example::

               {'cols': ['ra', 'dec'], 'fltrs': ['r>=10', 'g<18']}

       **npix_suffix** : str, default=".parquet"
           Extension for the Parquet file (or ``/`` if it is a directory).



   :Returns:

       UPath
           catalog file name











   ..
       !! processed by numpydoc !!

.. py:function:: pixel_directory(catalog_base_dir: str | pathlib.Path | upath.UPath | None, pixel_order: int, pixel_number: int | None = None, directory_number: int | None = None) -> upath.UPath

   
   Create path pointer for a pixel directory. This will not create the directory.

   One of ``pixel_number`` or ``directory_number`` is required. The directory name will
   take the HiPS standard form of::

       <catalog_base_dir>/dataset/Norder=<pixel_order>/Dir=<directory number>

   Where the directory number is calculated using integer division as::

       (pixel_number // 10000) * 10000
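
   The integer-division rule can be illustrated in plain Python. This is a sketch
   of the documented formula and path form, not the library's implementation; the
   helpers ``directory_number_for`` and ``sketch_pixel_directory`` are hypothetical
   names introduced here.

   .. code-block:: python

       from pathlib import PurePosixPath


       def directory_number_for(pixel_number):
           """Round the pixel number down to the nearest multiple of 10000."""
           return (pixel_number // 10000) * 10000


       def sketch_pixel_directory(catalog_base_dir, pixel_order, pixel_number):
           """Hypothetical helper: assemble the documented directory form."""
           return str(
               PurePosixPath(catalog_base_dir)
               / "dataset"
               / f"Norder={pixel_order}"
               / f"Dir={directory_number_for(pixel_number)}"
           )


       # Pixel 34567 at order 6 falls in the directory numbered 30000.
       example_dir = sketch_pixel_directory("my_catalog", 6, 34567)

   Grouping pixels in blocks of 10000 bounds the number of entries in any single
   directory, which is why the pixel number is rounded down rather than used
   directly.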

   :Parameters:

       **catalog_base_dir** : str | Path | UPath | None
           base directory of the catalog (includes catalog name)

       **pixel_order** : int
           the healpix order of the pixel

       **pixel_number** : int | None
           the number of the healpix pixel at ``pixel_order``

       **directory_number** : int | None
           Directory number. If None, it is inferred from ``pixel_number``.



   :Returns:

       UPath
           directory name











   ..
       !! processed by numpydoc !!

.. py:function:: skymap_coverage(catalog, order=None)

   
   Compute the fractional sky coverage of a catalog from its HEALPix skymap file.


   :Parameters:

       **catalog** : Catalog
           Catalog object corresponding to an on-disk catalog with skymap files.

       **order** : int, optional
           HEALPix order at which to read the skymap. If None, the order of the
           default skymap is used.



   :Returns:

       float
           fractional sky coverage between 0.0 and 1.0.
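
   As a rough illustration of what a fractional coverage means, the sketch below
   assumes the skymap holds one count per HEALPix pixel and treats a pixel as
   covered when its count is non-zero. Both points are assumptions about the
   computation, and ``sketch_sky_coverage`` is a hypothetical helper, not the
   function documented here.

   .. code-block:: python

       def sketch_sky_coverage(pixel_counts):
           """Hypothetical helper: fraction of pixels with a non-zero count."""
           if len(pixel_counts) == 0:
               raise ValueError("pixel_counts must not be empty")
           covered = sum(1 for count in pixel_counts if count > 0)
           return covered / len(pixel_counts)


       # A HEALPix map at a given order has 12 * 4**order equal-area pixels.
       order = 0
       n_pixels = 12 * 4**order
       counts = [0] * n_pixels
       counts[0], counts[1], counts[7] = 5, 2, 1

       # Three of twelve pixels hold data.
       coverage = sketch_sky_coverage(counts)

   Because HEALPix pixels are equal-area, the fraction of covered pixels equals
   the fraction of covered sky at that order.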











   ..
       !! processed by numpydoc !!

