hipscat.io.file_io.file_io#

Module Contents#

Functions#

make_directory(file_pointer[, exist_ok, storage_options])

Make a directory at a given file pointer

remove_directory(file_pointer[, ignore_errors, ...])

Remove a directory, and all contents, recursively.

write_string_to_file(file_pointer, string[, encoding, ...])

Write a string to a text file

load_text_file(file_pointer[, encoding, storage_options])

Load a text file content to a list of strings.

load_json_file(→ dict)

Load a json file to a dictionary

load_csv_to_pandas(→ pandas.DataFrame)

Load a csv file to a pandas dataframe

load_parquet_to_pandas(→ pandas.DataFrame)

Load a parquet file to a pandas dataframe

write_dataframe_to_csv(dataframe, file_pointer[, ...])

Write a pandas DataFrame to a CSV file

write_dataframe_to_parquet(dataframe, file_pointer[, ...])

Write a pandas DataFrame to a parquet file

read_parquet_metadata(→ pyarrow.parquet.FileMetaData)

Read FileMetaData from footer of a single Parquet file.

read_parquet_dataset(dir_pointer, storage_options, ...)

Read parquet dataset from directory pointer.

read_parquet_file(file_pointer[, storage_options])

Read parquet file from file pointer.

write_parquet_metadata(schema, file_pointer[, ...])

Write a metadata-only parquet file from a schema

read_fits_image(map_file_pointer[, storage_options])

Read the object spatial distribution information from a HEALPix FITS file.

write_fits_image(histogram, map_file_pointer[, ...])

Write the object spatial distribution information to a HEALPix FITS file.

read_yaml(file_handle[, storage_options])

Reads yaml file from filesystem.

delete_file(file_handle[, storage_options])

Deletes file from filesystem.

read_parquet_file_to_pandas(→ pandas.DataFrame)

Reads a parquet file to a pandas DataFrame

make_directory(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, exist_ok: bool = False, storage_options: Dict[Any, Any] | None = None)[source]#

Make a directory at a given file pointer

Will raise an error if the directory already exists, unless exist_ok is True, in which case any existing directory is left unmodified

Parameters:
  • file_pointer – location in file system to make directory

  • exist_ok – Default: False. If False, raises an error when the directory already exists; if True, existing directories are ignored and left unmodified

  • storage_options – dictionary that contains abstract filesystem credentials

Raises:

OSError
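
For local paths this contract mirrors the standard library's os.makedirs. A minimal stdlib-only sketch of the exist_ok semantics (no hipscat import assumed; the wrapper additionally supports remote filesystems via storage_options):

```python
import os
import tempfile

# Create a directory, then call again with exist_ok=True: no error is
# raised and existing contents are left unmodified, matching the
# make_directory contract described above.
base = tempfile.mkdtemp()
target = os.path.join(base, "subdir")

os.makedirs(target)                 # first call succeeds
os.makedirs(target, exist_ok=True)  # second call is a no-op

try:
    os.makedirs(target)             # exist_ok defaults to False
except OSError:
    print("raised OSError, as make_directory would")
```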

remove_directory(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, ignore_errors=False, storage_options: Dict[Any, Any] | None = None)[source]#

Remove a directory, and all contents, recursively.

Parameters:
  • file_pointer – directory in file system to remove

  • ignore_errors – if True, errors resulting from failed removals are ignored

  • storage_options – dictionary that contains abstract filesystem credentials
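
For local paths the behavior matches the standard library's shutil.rmtree; a small sketch of the ignore_errors semantics (stdlib only, hipscat not assumed):

```python
import os
import shutil
import tempfile

# remove_directory deletes a directory tree recursively. With
# ignore_errors=True, failures (e.g. the directory no longer exists)
# are suppressed instead of raised.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "nested"))

shutil.rmtree(base)                      # removes base and all contents
shutil.rmtree(base, ignore_errors=True)  # already gone: no error raised
print(os.path.exists(base))
```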

write_string_to_file(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, string: str, encoding: str = 'utf-8', storage_options: Dict[Any, Any] | None = None)[source]#

Write a string to a text file

Parameters:
  • file_pointer – file location to write file to

  • string – string to write to file

  • encoding – encoding used when writing the file. Default: ‘utf-8’

  • storage_options – dictionary that contains abstract filesystem credentials

load_text_file(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, encoding: str = 'utf-8', storage_options: Dict[Any, Any] | None = None)[source]#

Load a text file content to a list of strings.

Parameters:
  • file_pointer – location of file to read

  • encoding – string encoding method used by the file

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

text contents of file.
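
A local-filesystem round trip through write and read, sketched with the stdlib (the hipscat wrappers add abstract-filesystem support on top of this): write a string, then read the file back as a list of lines, matching load_text_file's list-of-strings return.

```python
import os
import tempfile

# Write with an explicit encoding, then read the contents back as a
# list of line strings. File name is illustrative.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
with open(path, "w", encoding="utf-8") as handle:
    handle.write("first line\nsecond line\n")

with open(path, encoding="utf-8") as handle:
    lines = handle.readlines()
print(lines)
```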

load_json_file(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, encoding: str = 'utf-8', storage_options: Dict[Any, Any] | None = None) dict[source]#

Load a json file to a dictionary

Parameters:
  • file_pointer – location of file to read

  • encoding – string encoding method used by the file

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

dictionary of key value pairs loaded from the JSON file
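
Locally this amounts to json.load on an open text handle; a stdlib sketch (the file name and keys are illustrative, not a guaranteed hipscat layout):

```python
import json
import os
import tempfile

# Write a small JSON file, then parse it back into a dict of
# key-value pairs, as load_json_file returns.
path = os.path.join(tempfile.mkdtemp(), "catalog_info.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump({"catalog_name": "example", "total_rows": 42}, handle)

with open(path, encoding="utf-8") as handle:
    info = json.load(handle)
print(info["total_rows"])
```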

load_csv_to_pandas(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs) pandas.DataFrame[source]#

Load a csv file to a pandas dataframe

Parameters:
  • file_pointer – location of csv file to load

  • storage_options – dictionary that contains abstract filesystem credentials

  • **kwargs – arguments to pass to pandas read_csv loading method

Returns:

pandas dataframe loaded from CSV
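
Because **kwargs are forwarded to pandas read_csv, any read_csv option (usecols, dtype, sep, ...) can be passed through the wrapper. A sketch of that pass-through using an in-memory buffer in place of a FilePointer (assumes pandas is installed; column names are made up):

```python
import io
import pandas as pd

# Options like usecols and dtype flow straight through to read_csv.
csv_text = "id,ra,dec\n1,10.5,-3.2\n2,11.0,-3.9\n"
frame = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["id", "ra"],
    dtype={"id": "int64"},
)
print(frame.shape)
```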

load_parquet_to_pandas(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs) pandas.DataFrame[source]#

Load a parquet file to a pandas dataframe

Parameters:
  • file_pointer – location of parquet file to load

  • storage_options – dictionary that contains abstract filesystem credentials

  • **kwargs – arguments to pass to pandas read_parquet loading method

Returns:

pandas dataframe loaded from parquet

write_dataframe_to_csv(dataframe: pandas.DataFrame, file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs)[source]#

Write a pandas DataFrame to a CSV file

Parameters:
  • dataframe – DataFrame to write

  • file_pointer – location of file to write to

  • storage_options – dictionary that contains abstract filesystem credentials

  • **kwargs – args to pass to pandas to_csv method

write_dataframe_to_parquet(dataframe: pandas.DataFrame, file_pointer, storage_options: Dict[Any, Any] | None = None)[source]#

Write a pandas DataFrame to a parquet file

Parameters:
  • dataframe – DataFrame to write

  • file_pointer – location of file to write to

  • storage_options – dictionary that contains abstract filesystem credentials

read_parquet_metadata(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs) pyarrow.parquet.FileMetaData[source]#

Read FileMetaData from footer of a single Parquet file.

Parameters:
  • file_pointer – location of file to read metadata from

  • storage_options – dictionary that contains abstract filesystem credentials

  • **kwargs – additional arguments to be passed to pyarrow.parquet.read_metadata

read_parquet_dataset(dir_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs)[source]#

Read parquet dataset from directory pointer.

Note that pyarrow.dataset reads require that directory pointers have no leading slash, and the protocol prefix may additionally be removed. As such, we also return the directory path formatted for pyarrow ingestion, for use in follow-up reads.

Parameters:
  • dir_pointer – location of the dataset directory to read

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

Tuple containing a path to the dataset (that is formatted for pyarrow ingestion) and the dataset read from disk.
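
The path normalization described in the note above can be sketched as a small pure-Python helper. This is a hypothetical illustration of the transformation, not the actual hipscat implementation:

```python
def format_for_pyarrow(path: str) -> str:
    """Hypothetical helper: strip a protocol prefix (e.g. "s3://") and
    any leading slash before handing the directory to pyarrow.dataset."""
    _, _, remainder = path.partition("://")
    trimmed = remainder if remainder else path
    return trimmed.lstrip("/")

print(format_for_pyarrow("s3://bucket/catalog"))  # protocol removed
print(format_for_pyarrow("/local/dir/catalog"))   # leading slash removed
```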

read_parquet_file(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None)[source]#

Read parquet file from file pointer.

Parameters:
  • file_pointer – location of parquet file to read

  • storage_options – dictionary that contains abstract filesystem credentials

write_parquet_metadata(schema: Any, file_pointer: hipscat.io.file_io.file_pointer.FilePointer, metadata_collector: list | None = None, storage_options: Dict[Any, Any] | None = None, **kwargs)[source]#

Write a metadata-only parquet file from a schema

Parameters:
  • schema – schema to be written

  • file_pointer – location of file to be written to

  • metadata_collector – where to collect metadata information

  • storage_options – dictionary that contains abstract filesystem credentials

  • **kwargs – additional arguments to be passed to pyarrow.parquet.write_metadata

read_fits_image(map_file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None)[source]#

Read the object spatial distribution information from a HEALPix FITS file.

Parameters:
  • map_file_pointer – location of FITS file to be read

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

one-dimensional numpy array of long integers where the value at each index corresponds to the number of objects found at that HEALPix pixel.

write_fits_image(histogram: numpy.ndarray, map_file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None)[source]#

Write the object spatial distribution information to a HEALPix FITS file.

Parameters:
  • histogram (np.ndarray) – one-dimensional numpy array of long integers where the value at each index corresponds to the number of objects found at that HEALPix pixel.

  • map_file_pointer – location of FITS file to be written

  • storage_options – dictionary that contains abstract filesystem credentials
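
A small numpy sketch of how such a histogram is interpreted (assumes numpy; the counts below are made up): the array index is the HEALPix pixel number, so at order k the array has 12 * 4**k entries, and the value is the object count in that pixel.

```python
import numpy as np

# Order-0 HEALPix map: 12 pixels. Index = pixel number, value = count.
order = 0
histogram = np.zeros(12 * 4**order, dtype=np.int64)
histogram[3] = 100   # 100 objects fall in pixel 3
histogram[7] = 250   # 250 objects fall in pixel 7

print(int(histogram.sum()))     # total object count across the sky
print(int(histogram.argmax()))  # densest pixel
```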

read_yaml(file_handle: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None)[source]#

Reads yaml file from filesystem.

Parameters:
  • file_handle – location of yaml file

  • storage_options – dictionary that contains abstract filesystem credentials

delete_file(file_handle: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None)[source]#

Deletes file from filesystem.

Parameters:
  • file_handle – location of file to delete

  • storage_options – dictionary that contains abstract filesystem credentials

read_parquet_file_to_pandas(file_pointer: hipscat.io.file_io.file_pointer.FilePointer, storage_options: Dict[Any, Any] | None = None, **kwargs) pandas.DataFrame[source]#

Reads a parquet file to a pandas DataFrame

Parameters:
  • file_pointer (FilePointer) – File Pointer to a parquet file

  • storage_options – dictionary that contains abstract filesystem credentials

  • **kwargs – Additional arguments to pass to pandas read_parquet method

Returns:

Pandas DataFrame with the data from the parquet file