Parquet
The target-parquet Meltano loader writes data to Parquet files after it has been pulled from a source by an extractor.
Alternative variants
Multiple variants of target-parquet are available.
This document describes the default estrategiahq variant, which is recommended for new users.
Getting Started
Prerequisites
If you haven't already, follow the initial steps of the Getting Started guide.
Installation and configuration
Using the Command Line Interface
- Add the target-parquet loader to your project using meltano add:
  meltano add loader target-parquet
- Configure the settings below using meltano config.
Next steps
Follow the remaining steps of the Getting Started guide.
If you run into any issues, learn how to get help.
Capabilities
target-parquet does not have any capabilities defined in its metadata.
Please consider adding them by making a pull request to the YAML file that defines the capabilities for this loader.
Settings
The supported settings are documented below.
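Every setting on this page is paired with an environment variable that follows one pattern: the prefix TARGET_PARQUET_ followed by the setting name in upper case. A minimal shell sketch of that mapping follows; the helper name env_var_for is hypothetical and is not part of Meltano.

```shell
# Hypothetical helper (not part of Meltano): derive the environment variable
# name for a target-parquet setting, following the pattern used on this page.
env_var_for() {
  printf 'TARGET_PARQUET_%s\n' "$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
}

# Print the variable names for two of the documented settings.
env_var_for destination_path
env_var_for file_size
```

This can be handy when generating a .env file or CI configuration from a list of setting names.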
Disable Collection (disable_collection)
- Environment variable: TARGET_PARQUET_DISABLE_COLLECTION
A boolean indicating whether to disable Singer anonymous tracking.
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set disable_collection true
export TARGET_PARQUET_DISABLE_COLLECTION=true
Logging Level (logging_level)
- Environment variable: TARGET_PARQUET_LOGGING_LEVEL
The log level (default: INFO). It can also be set using environment variables.
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set logging_level <logging_level>
export TARGET_PARQUET_LOGGING_LEVEL=<logging_level>
Destination Path (destination_path)
- Environment variable: TARGET_PARQUET_DESTINATION_PATH
The path to write files to (default: ‘.’).
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set destination_path <destination_path>
export TARGET_PARQUET_DESTINATION_PATH=<destination_path>
Compression Method (compression_method)
- Environment variable: TARGET_PARQUET_COMPRESSION_METHOD
The compression method must be supported by PyArrow. The currently available methods are snappy (recommended), zstd, brotli, and gzip.
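Because only the methods listed above are accepted, it can help to check a value before handing it to the loader. The following is a minimal sketch; the function name is_supported_compression and the example value are hypothetical, and this check is not something Meltano or target-parquet provides.

```shell
# Hypothetical guard (not part of Meltano or target-parquet): accept only the
# compression methods this page lists as available.
is_supported_compression() {
  case "$1" in
    snappy|zstd|brotli|gzip) return 0 ;;
    *) return 1 ;;
  esac
}

method="zstd"  # example value; snappy is the recommended method
if is_supported_compression "$method"; then
  # Set the documented environment variable only for an accepted value.
  export TARGET_PARQUET_COMPRESSION_METHOD="$method"
else
  echo "unsupported compression method: $method" >&2
fi
```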
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set compression_method <compression_method>
export TARGET_PARQUET_COMPRESSION_METHOD=<compression_method>
Streams In Separate Folder (streams_in_separate_folder)
- Environment variable: TARGET_PARQUET_STREAMS_IN_SEPARATE_FOLDER
Whether to write each stream to its own folder, since streams are expected to have different schemas (default: False).
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set streams_in_separate_folder true
export TARGET_PARQUET_STREAMS_IN_SEPARATE_FOLDER=true
File Size (file_size)
- Environment variable: TARGET_PARQUET_FILE_SIZE
The number of rows to write per file. The default is to write to a single file.
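To get a feel for a given file_size, the arithmetic below estimates how many files a stream would produce. It is only a sketch: it assumes files are split at exactly file_size rows, which this page does not confirm, and the row count is an illustrative value.

```shell
# Assumption: a new file is started after exactly file_size rows, so the
# number of files is the row count divided by file_size, rounded up.
rows=10000      # illustrative number of rows in one stream
file_size=1234  # value of the file_size setting

files=$(( (rows + file_size - 1) / file_size ))      # ceiling division
last_file_rows=$(( rows - (files - 1) * file_size )) # rows left for the final file

echo "$files files, last one holding $last_file_rows rows"
```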
How to use
Manage this setting using meltano config or an environment variable:
meltano config target-parquet set file_size 1234
export TARGET_PARQUET_FILE_SIZE=1234
Looking for help?
If you're having trouble getting the target-parquet loader to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.