hats.io.file_io
===============

.. py:module:: hats.io.file_io


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/hats/io/file_io/file_io/index
   /autoapi/hats/io/file_io/file_pointer/index


Functions
---------

.. autoapisummary::

   hats.io.file_io.delete_file
   hats.io.file_io.load_csv_to_pandas
   hats.io.file_io.load_csv_to_pandas_generator
   hats.io.file_io.load_text_file
   hats.io.file_io.make_directory
   hats.io.file_io.read_fits_image
   hats.io.file_io.read_parquet_dataset
   hats.io.file_io.read_parquet_file
   hats.io.file_io.read_parquet_file_to_pandas
   hats.io.file_io.read_parquet_metadata
   hats.io.file_io.remove_directory
   hats.io.file_io.write_dataframe_to_csv
   hats.io.file_io.write_dataframe_to_parquet
   hats.io.file_io.write_fits_image
   hats.io.file_io.write_parquet_metadata
   hats.io.file_io.write_string_to_file
   hats.io.file_io.directory_has_contents
   hats.io.file_io.get_upath
   hats.io.file_io.get_upath_for_protocol


Package Contents
----------------

.. py:function:: delete_file(file_handle: str | pathlib.Path | upath.UPath)

   
   Deletes file from filesystem.


   :Parameters:

       **file_handle** : str | Path | UPath
           location of the file to delete


   ..
       !! processed by numpydoc !!

.. py:function:: load_csv_to_pandas(file_pointer: str | pathlib.Path | upath.UPath, **kwargs) -> pandas.DataFrame

   
   Load a csv file to a pandas dataframe


   :Parameters:

       **file_pointer** : str | Path | UPath
           location of csv file to load

       **\*\*kwargs**
           arguments to pass to pandas ``read_csv`` loading method


   :Returns:

       pd.DataFrame
           contents of the CSV file, as a dataframe.


   ..
       !! processed by numpydoc !!

.. py:function:: load_csv_to_pandas_generator(file_pointer: str | pathlib.Path | upath.UPath, *, chunksize=10000, compression=None, **kwargs) -> collections.abc.Generator[pandas.DataFrame]

   
   Load a CSV file in chunks, yielding one pandas DataFrame per chunk.


   :Parameters:

       **file_pointer** : str | Path | UPath
           location of csv file to load

       **chunksize** : int
           (Default value = 10_000) number of rows to load per chunk

       **compression** : str
           (Default value = None) for compressed CSVs, the manner of compression. e.g. 'gz', 'bzip'.

       **\*\*kwargs**
           arguments to pass to pandas ``read_csv`` loading method


   :Yields:

       pd.DataFrame
           chunked contents of the CSV file, as a dataframe.


   ..
       !! processed by numpydoc !!
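
   The chunking behavior described above can be sketched with the standard library alone. This is an illustration of the pattern, not this function's implementation: ``iter_csv_chunks`` is a hypothetical helper, and it yields lists of row dictionaries rather than DataFrames.

   .. code-block:: python

      import csv
      import io
      from itertools import islice


      def iter_csv_chunks(text_stream, chunksize=10_000):
          """Hypothetical helper: yield lists of at most ``chunksize`` CSV rows."""
          reader = csv.DictReader(text_stream)
          while True:
              # Pull up to ``chunksize`` rows without reading the whole file.
              chunk = list(islice(reader, chunksize))
              if not chunk:
                  return
              yield chunk


      # Three data rows read two at a time give one full chunk and one partial chunk.
      sample = io.StringIO("id,ra\n1,10.5\n2,20.5\n3,30.5\n")
      chunk_sizes = [len(chunk) for chunk in iter_csv_chunks(sample, chunksize=2)]

   Because the rows are produced lazily, only one chunk needs to be held in memory at a time, which is the usual reason to prefer a generator over loading the full file.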

.. py:function:: load_text_file(file_pointer: str | pathlib.Path | upath.UPath, encoding: str = 'utf-8')

   
   Load the contents of a text file as a list of strings.


   :Parameters:

       **file_pointer** : str | Path | UPath
           location of file to read

       **encoding** : str
           (Default value = "utf-8") string encoding method used by the file


   :Returns:

       list[str]
           contents of the file as a list of strings, one per line.


   ..
       !! processed by numpydoc !!

.. py:function:: make_directory(file_pointer: str | pathlib.Path | upath.UPath, exist_ok: bool = False)

   
   Make a directory at a given file pointer

   Raises an error if the directory already exists, unless `exist_ok` is True, in which case
   any existing directories are left unmodified.

   :Parameters:

       **file_pointer** : str | Path | UPath
           location in file system to make directory

       **exist_ok** : bool
           (Default value = False)
           If False, an error is raised if the directory exists. If True, existing
           directories are ignored and not modified


   ..
       !! processed by numpydoc !!
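
   The standard library's ``pathlib.Path.mkdir`` has a flag with the same name, so the ``exist_ok`` semantics can be sketched by analogy. This sketch does not call ``make_directory``. The exception type that ``make_directory`` raises is not stated here; ``FileExistsError`` is what ``pathlib`` raises.

   .. code-block:: python

      import tempfile
      from pathlib import Path

      with tempfile.TemporaryDirectory() as tmp:
          target = Path(tmp) / "catalog"
          target.mkdir()

          # With exist_ok=True, an existing directory is left as-is and no error is raised.
          target.mkdir(exist_ok=True)
          still_there = target.is_dir()

          # With exist_ok=False, creating an existing directory fails.
          try:
              target.mkdir(exist_ok=False)
              raised = False
          except FileExistsError:
              raised = True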

.. py:function:: read_fits_image(map_file_pointer: str | pathlib.Path | upath.UPath) -> numpy.ndarray

   
   Read the object spatial distribution information from a healpix FITS file.


   :Parameters:

       **map_file_pointer** : str | Path | UPath
           location of file to be read


   :Returns:

       np.ndarray
           one-dimensional numpy array of integers where the
           value at each index corresponds to the number of objects found at the healpix pixel.


   ..
       !! processed by numpydoc !!

.. py:function:: read_parquet_dataset(source: str | pathlib.Path | upath.UPath | list[str | pathlib.Path | upath.UPath], **kwargs) -> tuple[str | list[str], pyarrow.dataset.Dataset]

   
   Read parquet dataset from directory pointer or list of files.

   Note that pyarrow.dataset reads require that directory pointers don't contain a
   leading slash, and the protocol prefix may additionally be removed. As such, we also return
   the directory path that is formatted for pyarrow ingestion for follow-up.

   See more info on source specification and possible kwargs at
   https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html

   :Parameters:

       **source** : str | Path | UPath | list[str | Path | UPath]
           directory, path, or list of paths to read data from

       **\*\*kwargs**
           additional arguments passed to ``pyarrow.dataset.dataset``


   :Returns:

       tuple[str | list[str], Dataset]
           Tuple containing a path to the dataset (that is formatted for pyarrow ingestion)
           and the dataset read from disk.


   ..
       !! processed by numpydoc !!

.. py:function:: read_parquet_file(file_pointer: str | pathlib.Path | upath.UPath, **kwargs) -> pyarrow.parquet.ParquetFile

   
   Read single parquet file.


   :Parameters:

       **file_pointer** : str | Path | UPath
           location of parquet file

       **\*\*kwargs**
           additional arguments to be passed to pyarrow.parquet.ParquetFile


   :Returns:

       pq.ParquetFile
           full contents of parquet file


   ..
       !! processed by numpydoc !!

.. py:function:: read_parquet_file_to_pandas(file_pointer: str | pathlib.Path | upath.UPath, is_dir: bool | None = None, **kwargs) -> nested_pandas.NestedFrame

   
   Reads parquet file(s) to a pandas DataFrame


   :Parameters:

       **file_pointer** : str | Path | UPath
           File pointer to a parquet file or a directory containing parquet files

       **is_dir** : bool | None
           If True, the pointer represents a pixel directory; if False, it
           represents a file. In both cases there is no need to check the pointer's
           content type. If `is_dir` is None (default), this method falls back to
           `upath.is_dir()` to identify the type of pointer. Inferring the type over
           HTTP is particularly expensive because it requires downloading the contents
           of the pointer in its entirety.

       **\*\*kwargs**
           Additional arguments to pass to pandas read_parquet method


   :Returns:

       NestedFrame
           NestedFrame (a pandas DataFrame subclass) with the data from the parquet file(s)


   ..
       !! processed by numpydoc !!

.. py:function:: read_parquet_metadata(file_pointer: str | pathlib.Path | upath.UPath, **kwargs) -> pyarrow.parquet.FileMetaData

   
   Read FileMetaData from footer of a single Parquet file.


   :Parameters:

       **file_pointer** : str | Path | UPath
           location of file to read metadata from

       **\*\*kwargs**
           additional arguments to be passed to pyarrow.parquet.read_metadata


   :Returns:

       pq.FileMetaData
           parquet file metadata (includes schema)


   ..
       !! processed by numpydoc !!

.. py:function:: remove_directory(file_pointer: str | pathlib.Path | upath.UPath, ignore_errors=False)

   
   Remove a directory, and all contents, recursively.


   :Parameters:

       **file_pointer** : str | Path | UPath
           directory in file system to remove

       **ignore_errors** : bool
           (Default value = False)
           if True, errors resulting from failed removals will be ignored


   ..
       !! processed by numpydoc !!
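
   As a rough analogy for the recursive removal and the ``ignore_errors`` flag, the standard library's ``shutil.rmtree`` takes a flag with the same name. This sketch does not call ``remove_directory``, and whether that function behaves identically on every filesystem is an assumption.

   .. code-block:: python

      import shutil
      import tempfile
      from pathlib import Path

      with tempfile.TemporaryDirectory() as tmp:
          root = Path(tmp) / "to_remove"
          (root / "nested").mkdir(parents=True)
          (root / "nested" / "part.txt").write_text("data", encoding="utf-8")

          # Removes the directory and everything beneath it.
          shutil.rmtree(root)
          removed = not root.exists()

          # The directory is now missing: ignore_errors=True suppresses the failure.
          shutil.rmtree(root, ignore_errors=True)

          # With ignore_errors=False the same call raises.
          try:
              shutil.rmtree(root, ignore_errors=False)
              raised = False
          except FileNotFoundError:
              raised = True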

.. py:function:: write_dataframe_to_csv(dataframe: pandas.DataFrame, file_pointer: str | pathlib.Path | upath.UPath, **kwargs)

   
   Write a pandas DataFrame to a CSV file


   :Parameters:

       **dataframe** : pd.DataFrame
           DataFrame to write

       **file_pointer** : str | Path | UPath
           location of file to write to

       **\*\*kwargs**
           args to pass to pandas ``to_csv`` method


   ..
       !! processed by numpydoc !!

.. py:function:: write_dataframe_to_parquet(dataframe: pandas.DataFrame, file_pointer)

   
   Write a pandas DataFrame to a parquet file


   :Parameters:

       **dataframe** : pd.DataFrame
           DataFrame to write

       **file_pointer** : str | Path | UPath
           location of file to write to


   ..
       !! processed by numpydoc !!

.. py:function:: write_fits_image(histogram: numpy.ndarray, map_file_pointer: str | pathlib.Path | upath.UPath)

   
   Write the object spatial distribution information to a healpix FITS file.


   :Parameters:

       **histogram** : np.ndarray
           one-dimensional numpy array of long integers where the
           value at each index corresponds to the number of objects found at the healpix pixel.

       **map_file_pointer** : str | Path | UPath
           location of file to be written


   ..
       !! processed by numpydoc !!

.. py:function:: write_parquet_metadata(schema, file_pointer: str | pathlib.Path | upath.UPath, metadata_collector: list | None = None, **kwargs)

   
   Write a metadata-only parquet file from a schema


   :Parameters:

       **schema** : pa.Schema
           pyarrow schema to be written

       **file_pointer** : str | Path | UPath
           location of file to be written to

       **metadata_collector** : list | None
           (Default value = None) where to collect metadata information

       **\*\*kwargs**
           additional arguments to be passed to pyarrow.parquet.write_metadata


   ..
       !! processed by numpydoc !!

.. py:function:: write_string_to_file(file_pointer: str | pathlib.Path | upath.UPath, string: str, encoding: str = 'utf-8')

   
   Write a string to a text file


   :Parameters:

       **file_pointer** : str | Path | UPath
           location of file to write to

       **string** : str
           string to write to file

       **encoding** : str
           (Default value = "utf-8") encoding method to write to file with


   ..
       !! processed by numpydoc !!

.. py:function:: directory_has_contents(pointer: str | pathlib.Path | upath.UPath) -> bool

   
   Checks if a directory already has some contents (any files or subdirectories)


   :Parameters:

       **pointer** : str | Path | UPath
           File Pointer to check for existing contents


   :Returns:

       bool
           True if there are any files or subdirectories below this directory.


   ..
       !! processed by numpydoc !!
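
   A minimal sketch of the check described above for a local directory, using only the standard library. The helper ``has_contents`` is hypothetical and is not this function's implementation; it only shows that any entry, including an empty subdirectory, counts as contents.

   .. code-block:: python

      import tempfile
      from pathlib import Path


      def has_contents(directory):
          """Hypothetical helper: True if the directory holds any file or subdirectory."""
          # any() stops at the first entry, so the full listing is never built.
          return any(Path(directory).iterdir())


      with tempfile.TemporaryDirectory() as tmp:
          empty_result = has_contents(tmp)
          (Path(tmp) / "sub").mkdir()
          nonempty_result = has_contents(tmp)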

.. py:function:: get_upath(path: str | pathlib.Path | upath.UPath) -> upath.UPath

   
   Returns a UPath file pointer from a path string or other path-like type.


   :Parameters:

       **path** : str | Path | UPath
           base file path to be normalized to UPath


   :Returns:

       UPath
           Instance of UPath.


   ..
       !! processed by numpydoc !!

.. py:function:: get_upath_for_protocol(path: str | pathlib.Path) -> upath.UPath

   
   Create UPath with protocol-specific configurations.

   If we access pointers on S3 and credentials are not found, we assume
   anonymous access, i.e., that the bucket is public.

   :Parameters:

       **path** : str | Path | UPath
           base file path to be normalized to UPath


   :Returns:

       UPath
           Instance of UPath.


   ..
       !! processed by numpydoc !!