hats.catalog.dataset
====================

.. py:module:: hats.catalog.dataset


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/hats/catalog/dataset/collection_properties/index
   /autoapi/hats/catalog/dataset/dataset/index
   /autoapi/hats/catalog/dataset/table_properties/index


Classes
-------

.. autoapisummary::

   hats.catalog.dataset.Dataset


Package Contents
----------------

.. py:class:: Dataset(catalog_info: hats.catalog.dataset.table_properties.TableProperties, catalog_path: str | pathlib.Path | upath.UPath | None = None, schema: pyarrow.Schema | None = None, snapshot: hats.catalog.catalog_snapshot.CatalogSnapshot | None = None, generate_snapshot: bool = False)

   
   A base HATS dataset, consisting of a properties file and data stored in parquet files.


   ..
       !! processed by numpydoc !!

   .. py:attribute:: catalog_info


   .. py:attribute:: catalog_name


   .. py:attribute:: catalog_path
      :value: None


   .. py:attribute:: catalog_base_dir
      :value: None


   .. py:attribute:: schema
      :value: None


   .. py:attribute:: snapshot
      :value: None


   .. py:property:: original_schema
      :type: pyarrow.Schema | None


      The original on-disk schema, before any column selection.


      ..
          !! processed by numpydoc !!


   .. py:property:: on_disk
      :type: bool


      Whether the catalog is stored on disk.


      ..
          !! processed by numpydoc !!


   .. py:property:: unmodified
      :type: bool


      Whether the catalog is unchanged from its original on-disk state.


      ..
          !! processed by numpydoc !!


   .. py:method:: aggregate_column_statistics(exclude_hats_columns: bool = True, exclude_columns: list[str] = None, include_columns: list[str] = None)

      
      Read footer statistics from the parquet metadata and report global min/max values.


      :Parameters:

          **exclude_hats_columns** : bool
              exclude HATS spatial and partitioning fields
              from the statistics. Defaults to True.

          **exclude_columns** : list[str]
              additional columns to exclude from the statistics.

          **include_columns** : list[str]
              if specified, only return statistics for the column
              names provided. Defaults to None, which returns all non-HATS columns.


      :Returns:

          DataFrame
              aggregated statistics.
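
      The roll-up described above can be sketched in plain Python. This is an
      illustration only, not the library's implementation: the ``footer_stats``
      records, the ``aggregate_min_max`` helper, and the column names are
      hypothetical stand-ins for the min/max entries that each parquet footer
      stores.

      .. code-block:: python

         # Hypothetical footer statistics: one dict per parquet file, mapping
         # column name -> (min_value, max_value). Illustrative data only.
         footer_stats = [
             {"ra": (10.0, 20.0), "dec": (-5.0, 5.0), "spatial_index": (0, 100)},
             {"ra": (15.0, 40.0), "dec": (-8.0, 2.0), "spatial_index": (101, 200)},
         ]


         def aggregate_min_max(stats, exclude_columns=None, include_columns=None):
             """Combine per-file (min, max) pairs into one global pair per column."""
             excluded = set(exclude_columns or [])
             result = {}
             for file_stats in stats:
                 for column, (low, high) in file_stats.items():
                     # Skip columns that were explicitly excluded.
                     if column in excluded:
                         continue
                     # When an include list is given, keep only those columns.
                     if include_columns is not None and column not in include_columns:
                         continue
                     if column in result:
                         current_low, current_high = result[column]
                         result[column] = (min(current_low, low), max(current_high, high))
                     else:
                         result[column] = (low, high)
             return result


         # Excluding the (hypothetical) spatial column leaves only "ra" and "dec".
         global_stats = aggregate_min_max(footer_stats, exclude_columns=["spatial_index"])

      In this sketch, ``global_stats`` holds one ``(min, max)`` pair per remaining
      column, taken across all files. The exclusion list is applied before the
      inclusion list, which is an assumption of the sketch rather than documented
      behavior.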


      ..
          !! processed by numpydoc !!


   .. py:method:: per_pixel_statistics(*, exclude_hats_columns: bool = True, exclude_columns: list[str] | None = None, include_columns: list[str] | None = None, only_numeric_columns: bool = False, include_stats: list[str] | None = None, multi_index=False, per_row_group: bool = False)

      
      Read footer statistics from the parquet metadata and report statistics for
      each pixel partition.


      ..
          !! processed by numpydoc !!


   .. py:method:: per_partition_statistics(*, exclude_hats_columns: bool = True, exclude_columns: list[str] = None, include_columns: list[str] = None, only_numeric_columns: bool = False, include_stats: list[str] = None, multi_index=False, per_row_group: bool = False)

      
      Read footer statistics from the parquet metadata and report statistics for
      each pixel partition.


      :Parameters:

          **exclude_hats_columns** : bool
              exclude HATS spatial and partitioning fields from the statistics. Defaults to True.

          **exclude_columns** : list[str]
              additional columns to exclude from the statistics.

          **include_columns** : list[str]
              if specified, only return statistics for the column
              names provided. Defaults to None, which returns all non-HATS columns.

          **include_stats** : list[str]
              if specified, only return the listed kinds of statistics
              (min_value, max_value, null_count, row_count). Defaults to None, which returns all of them.

          **multi_index** : bool
              whether to create the returned frame with a multi-index, first on
              pixel, then on column name. Defaults to False, in which case the frame is
              indexed on pixel, with a separate column for each combination of data
              column and statistic.


      :Returns:

          DataFrame
              all statistics.
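
      The flat layout used when ``multi_index`` is False can be pictured with
      plain dictionaries. This is a sketch under stated assumptions, not the
      library's code: the ``partition_stats`` data, the ``flatten_by_pixel``
      helper, the pixel labels, and the ``"column: stat"`` key format are all
      hypothetical.

      .. code-block:: python

         # Hypothetical per-partition statistics keyed by (pixel, column), which
         # mirrors a two-level index: first on pixel, then on column name.
         partition_stats = {
             ("pixel_a", "ra"): {"min_value": 10.0, "max_value": 20.0},
             ("pixel_a", "dec"): {"min_value": -5.0, "max_value": 5.0},
             ("pixel_b", "ra"): {"min_value": 15.0, "max_value": 40.0},
         }


         def flatten_by_pixel(nested, include_stats=None):
             """Build one row per pixel, with one entry per column/stat combination."""
             rows = {}
             for (pixel, column), values in nested.items():
                 row = rows.setdefault(pixel, {})
                 for stat_name, value in values.items():
                     # Keep every statistic unless a subset was requested.
                     if include_stats is None or stat_name in include_stats:
                         row[f"{column}: {stat_name}"] = value
             return rows


         # Request only the minimum values, one row per pixel.
         flat = flatten_by_pixel(partition_stats, include_stats=["min_value"])

      The nested ``partition_stats`` mapping corresponds to the multi-index form,
      while ``flat`` corresponds to the default form, where each pixel has a single
      row and each data column and statistic pair becomes its own entry.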


      ..
          !! processed by numpydoc !!