hipscat.io.file_io.file_pointer

Module Contents

Functions

get_file_protocol(→ str)

Parses a file pointer for its filesystem protocol.

get_fs(→ Tuple[fsspec.filesystem, FilePointer])

Create the abstract filesystem

get_file_pointer_for_fs(→ FilePointer)

Creates the file pathway from the file_pointer.

get_full_file_pointer(→ FilePointer)

Rebuilds the file_pointer with the protocol and account name if required

get_file_pointer_from_path(→ FilePointer)

Returns a file pointer from a path string

get_basename_from_filepointer(→ str)

Returns the base name of a regular file. May return empty string if the file is a directory.

strip_leading_slash_for_pyarrow(→ FilePointer)

Strips the leading slash for pyarrow read/write functions.

append_paths_to_pointer(→ FilePointer)

Append directories and/or a file name to a specified file pointer.

does_file_or_directory_exist(→ bool)

Checks if a file or directory exists for a given file pointer

is_regular_file(→ bool)

Checks if a regular file (NOT a directory) exists for a given file pointer.

find_files_matching_path(→ List[FilePointer])

Find files or directories matching the provided path parts.

directory_has_contents(→ bool)

Checks if a directory already has some contents (any files or subdirectories)

get_directory_contents(→ List[FilePointer])

Finds all files and directories in the specified directory.

Attributes

FilePointer

Unified type for references to files.

FilePointer

Unified type for references to files.

get_file_protocol(pointer: FilePointer) → str

Parses a file pointer for its filesystem protocol. If the pointer does not follow the pattern protocol://pathway/to/file, it is assumed to refer to the local filesystem.

Parameters:

pointer – filesystem pathway pointer
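The parsing rule described above can be illustrated with a minimal standalone sketch. It is not hipscat's implementation: the helper name sketch_get_file_protocol is hypothetical, and returning "file" for local paths is an assumption based on the protocol names listed for get_file_pointer_for_fs.

```python
def sketch_get_file_protocol(pointer: str) -> str:
    """Return the protocol prefix of a pointer, or "file" when none is present.

    Hypothetical helper for illustration only; not part of hipscat.
    """
    text = str(pointer)
    if "://" not in text:
        # No protocol://pathway/to/file pattern, so assume the local filesystem.
        return "file"
    # Everything before the first "://" is treated as the protocol.
    return text.split("://", 1)[0]


print(sketch_get_file_protocol("s3://bucket/catalog"))  # s3
print(sketch_get_file_protocol("/data/catalog"))        # file
```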

get_fs(file_pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) → Tuple[fsspec.filesystem, FilePointer]

Create the abstract filesystem

Parameters:
  • file_pointer – filesystem pathway

  • storage_options – dictionary that contains abstract filesystem credentials

Raises:

ImportError – if environment cannot import necessary libraries for fsspec filesystems.

get_file_pointer_for_fs(protocol: str, file_pointer: FilePointer) → FilePointer

Creates the file pathway from the file_pointer.

This will strip the protocol so that the file_pointer can be accessed from the filesystem:

  • abfs filesystems DO NOT require the account_name in the pathway

  • s3 filesystems DO require the account_name/container name in the pathway

Parameters:
  • protocol – str filesystem protocol: file, abfs, or s3

  • file_pointer – filesystem pathway
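As a rough illustration of stripping the protocol, the sketch below removes a leading protocol:// prefix and leaves everything else intact. The helper name sketch_strip_protocol is hypothetical, and the sketch deliberately does not model the abfs account-name handling mentioned above; it only shows the s3-style case where the container name stays in the pathway.

```python
def sketch_strip_protocol(protocol: str, file_pointer: str) -> str:
    """Remove a leading "<protocol>://" from a pointer, if present.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    prefix = f"{protocol}://"
    text = str(file_pointer)
    if text.startswith(prefix):
        # Keep everything after the prefix, including any container name.
        return text[len(prefix):]
    # Pointers without the prefix (e.g. plain local paths) are returned unchanged.
    return text


print(sketch_strip_protocol("s3", "s3://bucket/catalog/part.parquet"))
print(sketch_strip_protocol("file", "/data/catalog/part.parquet"))
```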

get_full_file_pointer(path: str, protocol_path: str) → FilePointer

Rebuilds the file_pointer with the protocol and account name if required

get_file_pointer_from_path(path: str, include_protocol: str = None) → FilePointer

Returns a file pointer from a path string

get_basename_from_filepointer(pointer: FilePointer) → str

Returns the base name of a regular file. May return empty string if the file is a directory.

Parameters:

pointer – FilePointer object to find a basename within

Returns:

string representation of the basename of a file.
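The empty-string case noted above is what one would see from standard basename semantics when a path ends in a separator. The sketch below uses Python's posixpath.basename to show this; whether hipscat relies on the same call is an assumption, and sketch_basename is a hypothetical name.

```python
import posixpath


def sketch_basename(pointer: str) -> str:
    """Return the final path component, using POSIX path rules.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    return posixpath.basename(str(pointer))


print(sketch_basename("/data/catalog/part.parquet"))  # part.parquet
print(repr(sketch_basename("/data/catalog/")))        # ''
```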

strip_leading_slash_for_pyarrow(pointer: FilePointer, protocol: str) → FilePointer

Strips the leading slash for pyarrow read/write functions. This is required for pyarrow’s underlying filesystem abstraction.

Parameters:

pointer – FilePointer object

Returns:

New file pointer with leading slash removed.
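A minimal sketch of the slash-stripping idea follows. The role of the protocol argument is not documented above, so the rule used here (local "file" paths keep their leading slash, other protocols lose one) is an assumption, and sketch_strip_leading_slash is a hypothetical name.

```python
def sketch_strip_leading_slash(pointer: str, protocol: str) -> str:
    """Drop one leading slash from a pointer for non-local protocols.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    text = str(pointer)
    # Assumption: local ("file") paths keep the slash, since it marks an absolute path.
    if protocol != "file" and text.startswith("/"):
        return text[1:]
    return text


print(sketch_strip_leading_slash("/bucket/catalog", "s3"))   # bucket/catalog
print(sketch_strip_leading_slash("/data/catalog", "file"))   # /data/catalog
```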

append_paths_to_pointer(pointer: FilePointer, *paths: str) → FilePointer

Append directories and/or a file name to a specified file pointer.

Parameters:
  • pointer – FilePointer object to add path to

  • paths – any number of directory names optionally followed by a file name to append to the pointer

Returns:

New file pointer to path given by joining given pointer and path names
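The joining behavior can be pictured with posixpath.join, which accepts any number of trailing components. This is a sketch under the assumption that pointers behave like "/"-separated strings; sketch_append_paths and the example directory names are hypothetical.

```python
import posixpath


def sketch_append_paths(pointer: str, *paths: str) -> str:
    """Join a base pointer with directory names and an optional file name.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    return posixpath.join(str(pointer), *paths)


print(sketch_append_paths("/data/catalog", "subdir", "part.parquet"))
print(sketch_append_paths("s3://bucket", "subdir"))
```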

does_file_or_directory_exist(pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) → bool

Checks if a file or directory exists for a given file pointer

Parameters:
  • pointer – file pointer at which to check for a file or directory

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

True if file or directory at pointer exists, False if not

is_regular_file(pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) → bool

Checks if a regular file (NOT a directory) exists for a given file pointer.

Parameters:
  • pointer – file pointer to check for a regular file

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

True if a regular file exists at pointer; False if nothing exists there or if it is a directory
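The difference between "exists" and "is a regular file" can be seen on a local filesystem with the standard library. The sketch below uses os.path.exists and os.path.isfile as stand-ins for the two checks; it does not call hipscat, and the file name data.txt is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create one regular file inside a temporary directory.
    file_path = os.path.join(tmp_dir, "data.txt")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("example")

    missing_path = os.path.join(tmp_dir, "missing.txt")

    # A directory exists but is not a regular file.
    dir_exists = os.path.exists(tmp_dir)
    dir_is_file = os.path.isfile(tmp_dir)

    # A regular file satisfies both checks.
    file_exists = os.path.exists(file_path)
    file_is_file = os.path.isfile(file_path)

    # A missing path satisfies neither.
    missing_exists = os.path.exists(missing_path)
    missing_is_file = os.path.isfile(missing_path)

print(dir_exists, dir_is_file)          # True False
print(file_exists, file_is_file)        # True True
print(missing_exists, missing_is_file)  # False False
```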

find_files_matching_path(pointer: FilePointer, *paths: str, include_protocol=False, storage_options: Dict[Any, Any] | None = None) → List[FilePointer]

Find files or directories matching the provided path parts.

Parameters:
  • pointer – base File Pointer in which to find contents

  • paths – any number of directory names optionally followed by a file name. Any directory or file name may be replaced with * to act as a wildcard.

  • include_protocol – boolean on whether or not to include the filesystem protocol in the returned directory contents

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

New file pointers to files found matching the path
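The wildcard matching described above resembles ordinary glob matching over path parts. The sketch below reproduces the idea on a local temporary directory with the standard glob module; it is not hipscat's implementation, and sketch_find_matching and the directory and file names are hypothetical.

```python
import glob
import os
import tempfile


def sketch_find_matching(base: str, *paths: str) -> list:
    """Join the path parts into one pattern and return sorted glob matches.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    pattern = os.path.join(base, *paths)
    return sorted(glob.glob(pattern))


with tempfile.TemporaryDirectory() as tmp_dir:
    # Build two subdirectories that each hold part.txt, plus one unrelated file.
    for sub in ("a", "b"):
        os.makedirs(os.path.join(tmp_dir, sub))
        with open(os.path.join(tmp_dir, sub, "part.txt"), "w", encoding="utf-8") as handle:
            handle.write("x")
    with open(os.path.join(tmp_dir, "a", "other.csv"), "w", encoding="utf-8") as handle:
        handle.write("x")

    # "*" stands in for any directory name at that level.
    matches = sketch_find_matching(tmp_dir, "*", "part.txt")
    relative = [os.path.relpath(match, tmp_dir) for match in matches]

print(relative)
```

Only the two part.txt files are matched; other.csv is skipped because the final path part is a literal file name.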

directory_has_contents(pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) → bool

Checks if a directory already has some contents (any files or subdirectories)

Parameters:
  • pointer – File Pointer to check for existing contents

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

True if there are any files or subdirectories below this directory.
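For a local directory, the same yes/no question can be answered by checking whether a directory scan yields at least one entry. The sketch below uses os.scandir; it is an illustration under that assumption rather than hipscat's code, and sketch_directory_has_contents is a hypothetical name.

```python
import os
import tempfile


def sketch_directory_has_contents(path: str) -> bool:
    """Return True if the directory holds at least one file or subdirectory.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    with os.scandir(path) as entries:
        # any() stops at the first entry, so large directories are not fully listed.
        return any(True for _ in entries)


with tempfile.TemporaryDirectory() as tmp_dir:
    empty_result = sketch_directory_has_contents(tmp_dir)
    os.makedirs(os.path.join(tmp_dir, "subdir"))
    filled_result = sketch_directory_has_contents(tmp_dir)

print(empty_result, filled_result)  # False True
```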

get_directory_contents(pointer: FilePointer, include_protocol=False, storage_options: Dict[Any, Any] | None = None) → List[FilePointer]

Finds all files and directories in the specified directory.

NB: This is not recursive, and will return only the first level of directory contents.

Parameters:
  • pointer – File Pointer in which to find contents

  • include_protocol – boolean on whether or not to include the filesystem protocol in the returned directory contents

  • storage_options – dictionary that contains abstract filesystem credentials

Returns:

New file pointers to files or subdirectories below this directory.
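The non-recursive behavior noted above matches what os.listdir does on a local filesystem: only the first level is returned. The sketch below demonstrates this with a nested temporary directory; sketch_directory_contents and the file names are hypothetical, and hipscat's own listing may differ in ordering or path form.

```python
import os
import tempfile


def sketch_directory_contents(path: str) -> list:
    """List the immediate children of a directory as sorted full paths.

    Hypothetical helper for illustration only; not hipscat's implementation.
    """
    return sorted(os.path.join(path, name) for name in os.listdir(path))


with tempfile.TemporaryDirectory() as tmp_dir:
    # One file at the top level and one file nested a level deeper.
    with open(os.path.join(tmp_dir, "top.txt"), "w", encoding="utf-8") as handle:
        handle.write("x")
    os.makedirs(os.path.join(tmp_dir, "sub"))
    with open(os.path.join(tmp_dir, "sub", "nested.txt"), "w", encoding="utf-8") as handle:
        handle.write("x")

    contents = sketch_directory_contents(tmp_dir)
    names = [os.path.basename(item) for item in contents]

print(names)  # ['sub', 'top.txt']
```

The nested file does not appear in the result, since only the first level of the directory is listed.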