hipscat.io.file_io.file_pointer
Module Contents
Functions
get_file_protocol | Method to parse filepointer for the filesystem protocol.
get_fs | Create the abstract filesystem
get_file_pointer_for_fs | Creates the filepathway from the file_pointer.
get_full_file_pointer | Rebuilds the file_pointer with the protocol and account name if required
get_file_pointer_from_path | Returns a file pointer from a path string
get_basename_from_filepointer | Returns the base name of a regular file. May return empty string if the file is a directory.
strip_leading_slash_for_pyarrow | Strips the leading slash for pyarrow read/write functions.
append_paths_to_pointer | Append directories and/or a file name to a specified file pointer.
does_file_or_directory_exist | Checks if a file or directory exists for a given file pointer
is_regular_file | Checks if a regular file (NOT a directory) exists for a given file pointer.
find_files_matching_path | Find files or directories matching the provided path parts.
directory_has_contents | Checks if a directory already has some contents (any files or subdirectories)
get_directory_contents | Finds all files and directories in the specified directory.
Attributes
FilePointer | Unified type for references to files.
- get_file_protocol(pointer: FilePointer) -> str
Method to parse a file pointer for the filesystem protocol. If the pointer doesn’t follow the pattern protocol://pathway/to/file, it is assumed to be on the local filesystem.
- Parameters:
pointer – filesystem pathway pointer
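The fallback rule described above can be sketched with plain string handling. This is a minimal illustration under stated assumptions, not hipscat's implementation; the helper name sketch_file_protocol and the example paths are hypothetical.

```python
def sketch_file_protocol(pointer: str) -> str:
    """Return the protocol prefix of a pointer, or "file" when none is present."""
    # Assumption: anything without "protocol://" is treated as a local path,
    # mirroring the documented fallback rather than the library's actual code.
    if "://" not in pointer:
        return "file"
    return pointer.split("://", 1)[0]


local_protocol = sketch_file_protocol("/data/catalog/catalog_info.json")
cloud_protocol = sketch_file_protocol("s3://bucket/catalog/catalog_info.json")
```

A pointer with no protocol prefix falls through to the local filesystem, while a prefixed pointer yields the text before the first "://".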
- get_fs(file_pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) -> Tuple[fsspec.filesystem, FilePointer]
Creates the abstract filesystem.
- Parameters:
file_pointer – filesystem pathway
storage_options – dictionary that contains abstract filesystem credentials
- Raises:
ImportError – if environment cannot import necessary libraries for fsspec filesystems.
- get_file_pointer_for_fs(protocol: str, file_pointer: FilePointer) -> FilePointer
Creates the filepathway from the file_pointer.
This will strip the protocol so that the file_pointer can be accessed from the filesystem:
abfs filesystems DO NOT require the account_name in the pathway
s3 filesystems DO require the account_name/container name in the pathway
- Parameters:
protocol – str filesystem protocol: file, abfs, or s3
file_pointer – filesystem pathway
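The protocol-stripping step can be sketched as follows. This is only an illustration of the idea, not hipscat's code: the helper name sketch_strip_protocol is hypothetical, and the abfs account-name handling mentioned above is deliberately not modelled.

```python
def sketch_strip_protocol(protocol: str, file_pointer: str) -> str:
    """Drop a leading "protocol://" so the rest can be handed to a filesystem object."""
    prefix = f"{protocol}://"
    if file_pointer.startswith(prefix):
        # For s3 the bucket name remains as the first path component.
        return file_pointer[len(prefix):]
    # No matching prefix (e.g. a plain local path): return the pointer unchanged.
    return file_pointer


s3_path = sketch_strip_protocol("s3", "s3://bucket/catalog/Norder=0")
local_path = sketch_strip_protocol("file", "/data/catalog/Norder=0")
```

In this sketch the s3 result still begins with the bucket name, consistent with the note that s3 paths keep the container name in the pathway.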
- get_full_file_pointer(path: str, protocol_path: str) -> FilePointer
Rebuilds the file_pointer with the protocol and account name if required
- get_file_pointer_from_path(path: str, include_protocol: str = None) -> FilePointer
Returns a file pointer from a path string
- get_basename_from_filepointer(pointer: FilePointer) -> str
Returns the base name of a regular file. May return empty string if the file is a directory.
- Parameters:
pointer – FilePointer object to find a basename within
- Returns:
string representation of the basename of a file.
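The empty-string case can be seen with the standard library's basename function. The sketch below assumes POSIX-style separators and uses a hypothetical helper name, sketch_basename; it is not presented as the library's implementation.

```python
import posixpath


def sketch_basename(pointer: str) -> str:
    """Return the final path component; empty when the pointer ends in a separator."""
    return posixpath.basename(pointer)


file_name = sketch_basename("/data/catalog/Norder=0/Dir=0/Npix=5.parquet")
dir_name = sketch_basename("/data/catalog/Norder=0/")
```

With posixpath, a pointer that ends in a trailing slash yields an empty string, while a file path yields the file name.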
- strip_leading_slash_for_pyarrow(pointer: FilePointer, protocol: str) -> FilePointer
Strips the leading slash for pyarrow read/write functions. This is required for pyarrow’s underlying filesystem abstraction.
- Parameters:
pointer – FilePointer object
protocol – str filesystem protocol of the pointer
- Returns:
New file pointer with leading slash removed.
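A minimal sketch of the slash-stripping step is shown below. It assumes, as a labeled guess, that only non-local protocols need the slash removed; the documentation above does not state which protocols are affected, and the helper name sketch_strip_leading_slash is hypothetical.

```python
def sketch_strip_leading_slash(pointer: str, protocol: str) -> str:
    """Remove one leading slash for non-local protocols (assumed behavior)."""
    # Assumption: local ("file") paths keep their leading slash because it
    # marks an absolute path; remote object-store paths do not need it.
    if protocol != "file" and pointer.startswith("/"):
        return pointer[1:]
    return pointer


remote_path = sketch_strip_leading_slash("/bucket/catalog/Norder=0", "s3")
local_path_kept = sketch_strip_leading_slash("/data/catalog/Norder=0", "file")
```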
- append_paths_to_pointer(pointer: FilePointer, *paths: str) -> FilePointer
Append directories and/or a file name to a specified file pointer.
- Parameters:
pointer – FilePointer object to add path to
paths – any number of directory names optionally followed by a file name to append to the pointer
- Returns:
New file pointer to path given by joining given pointer and path names
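Joining path parts onto a pointer can be sketched with posixpath.join. The helper name sketch_append_paths and the partition-style names are hypothetical, and forward-slash separators are assumed.

```python
import posixpath


def sketch_append_paths(pointer: str, *paths: str) -> str:
    """Join any number of directory names (and optionally a file name) onto a pointer."""
    return posixpath.join(pointer, *paths)


partition_file = sketch_append_paths(
    "/data/catalog", "Norder=1", "Dir=0", "Npix=5.parquet"
)
```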
- does_file_or_directory_exist(pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) -> bool
Checks if a file or directory exists for a given file pointer.
- Parameters:
pointer – file pointer at which to check for a file or directory
storage_options – dictionary that contains abstract filesystem credentials
- Returns:
True if file or directory at pointer exists, False if not
- is_regular_file(pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) -> bool
Checks if a regular file (NOT a directory) exists for a given file pointer.
- Parameters:
pointer – file pointer to check for a regular file
storage_options – dictionary that contains abstract filesystem credentials
- Returns:
True if a regular file exists at the pointer; False if nothing exists there or if it is a directory
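The difference between "exists" and "is a regular file" can be shown on the local filesystem with the standard library. This sketch stands in for the abstract-filesystem calls and does not use hipscat; the file names are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "catalog_info.json")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("{}")
    missing_path = os.path.join(tmp_dir, "missing.json")

    # A directory exists but is not a regular file.
    dir_exists = os.path.exists(tmp_dir)
    dir_is_file = os.path.isfile(tmp_dir)
    # A regular file passes both checks.
    file_exists = os.path.exists(file_path)
    file_is_file = os.path.isfile(file_path)
    # A missing path fails both checks.
    missing_exists = os.path.exists(missing_path)
    missing_is_file = os.path.isfile(missing_path)
```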
- find_files_matching_path(pointer: FilePointer, *paths: str, include_protocol=False, storage_options: Dict[Any, Any] | None = None) -> List[FilePointer]
Find files or directories matching the provided path parts.
- Parameters:
pointer – base File Pointer in which to find contents
paths – any number of directory names optionally followed by a file name. Directory or file names may be replaced with * as a wildcard.
include_protocol – whether to include the filesystem protocol in the returned directory contents
storage_options – dictionary that contains abstract filesystem credentials
- Returns:
New file pointers to files found matching the path
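Wildcard matching over path parts can be sketched on a local temporary directory with the glob module. This is an analogy for the behavior described above, not the library's code; the Norder/Dir/Npix names are hypothetical examples.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build two small partition directories, each holding one empty file.
    for order in (0, 1):
        part_dir = os.path.join(tmp_dir, f"Norder={order}", "Dir=0")
        os.makedirs(part_dir)
        with open(os.path.join(part_dir, "Npix=0.parquet"), "w", encoding="utf-8"):
            pass

    # "*" stands in for any directory or file name at that level.
    pattern = os.path.join(glob.escape(tmp_dir), "Norder=*", "Dir=0", "*.parquet")
    matches = sorted(
        os.path.relpath(path, tmp_dir).replace(os.sep, "/")
        for path in glob.glob(pattern)
    )
```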
- directory_has_contents(pointer: FilePointer, storage_options: Dict[Any, Any] | None = None) -> bool
Checks if a directory already has some contents (any files or subdirectories)
- Parameters:
pointer – File Pointer to check for existing contents
storage_options – dictionary that contains abstract filesystem credentials
- Returns:
True if there are any files or subdirectories below this directory.
- get_directory_contents(pointer: FilePointer, include_protocol=False, storage_options: Dict[Any, Any] | None = None) -> List[FilePointer]
Finds all files and directories in the specified directory.
NB: This is not recursive, and will return only the first level of directory contents.
- Parameters:
pointer – File Pointer in which to find contents
include_protocol – whether to include the filesystem protocol in the returned directory contents
storage_options – dictionary that contains abstract filesystem credentials
- Returns:
New file pointers to files or subdirectories below this directory.
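The non-recursive listing, and the related check for whether a directory has any contents, can be sketched for a local directory with os.listdir. Both helper names are hypothetical, and this is an illustration rather than the library's implementation.

```python
import os
import tempfile


def sketch_directory_contents(directory: str) -> list:
    """List first-level entries only; nested entries are not descended into."""
    return sorted(os.path.join(directory, name) for name in os.listdir(directory))


def sketch_directory_has_contents(directory: str) -> bool:
    """Report whether the directory holds any file or subdirectory."""
    return len(os.listdir(directory)) > 0


with tempfile.TemporaryDirectory() as tmp_dir:
    had_contents_before = sketch_directory_has_contents(tmp_dir)
    os.makedirs(os.path.join(tmp_dir, "Norder=0", "Dir=0"))
    with open(os.path.join(tmp_dir, "catalog_info.json"), "w", encoding="utf-8") as handle:
        handle.write("{}")
    top_level = [os.path.basename(path) for path in sketch_directory_contents(tmp_dir)]
    has_contents_after = sketch_directory_has_contents(tmp_dir)
```

The nested Dir=0 directory does not appear in top_level, which matches the note that only the first level of contents is returned.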