The target-parquet Meltano loader writes data to Parquet files after it has been pulled from a source using an extractor.

Alternative variants

Multiple variants of target-parquet are available. This document describes the default estrategiahq variant, which is recommended for new users.

Alternative variants are:

Getting Started

Prerequisites

If you haven't already, follow the initial steps of the Getting Started guide:

  1. Install Meltano
  2. Create your Meltano project
  3. Add an extractor to pull data from a source

Installation and configuration

Using the Command Line Interface

  1. Add the target-parquet loader to your project using meltano add:

    meltano add loader target-parquet
  2. Configure the settings below using meltano config.

Next steps

Follow the remaining steps of the Getting Started guide:

  1. Run a data integration (EL) pipeline

If you run into any issues, learn how to get help.

Capabilities

target-parquet does not have any capabilities defined in its metadata. Please consider adding them by making a pull request to the YAML file that defines the capabilities for this loader.

Settings

The supported settings are documented below.

Disable Collection (disable_collection)

A boolean indicating whether to disable Singer anonymous tracking.

How to use

Manage this setting using meltano config or an environment variable:

meltano config target-parquet set disable_collection true

export TARGET_PARQUET_DISABLE_COLLECTION=true
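Every setting on this page follows the same environment-variable pattern: the plugin name and the setting name, upper-cased, with hyphens turned into underscores. The sketch below encodes that pattern as a small shell function. `setting_env_var` is a hypothetical helper inferred from the examples on this page, not part of Meltano.

```shell
# Hypothetical helper (not part of Meltano): derive the environment variable
# name for a plugin setting, following the pattern used on this page.
setting_env_var() {
  plugin="$1"
  setting="$2"
  # Upper-case letters and turn '-' into '_' (the trailing '-' in 'a-z-' is literal).
  printf '%s_%s\n' "$plugin" "$setting" | tr 'a-z-' 'A-Z_'
}

setting_env_var target-parquet disable_collection
# prints TARGET_PARQUET_DISABLE_COLLECTION
```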

Logging Level (logging_level)

(Default - INFO) The log level. It can also be set through an environment variable.

How to use

Manage this setting using meltano config or an environment variable:

meltano config target-parquet set logging_level <logging_level>

export TARGET_PARQUET_LOGGING_LEVEL=<logging_level>
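A quick pre-flight check on the log level might look like the following. It assumes target-parquet accepts Python's standard logging level names (DEBUG, INFO, WARNING, ERROR, CRITICAL); this page does not list the accepted values, and `is_valid_logging_level` is a hypothetical helper.

```shell
# Hypothetical helper. Assumption: the loader uses Python's standard level names.
is_valid_logging_level() {
  case "$1" in
    DEBUG|INFO|WARNING|ERROR|CRITICAL) return 0 ;;
    *) return 1 ;;
  esac
}

# Fall back to the documented default when the variable is unset.
level="${TARGET_PARQUET_LOGGING_LEVEL:-INFO}"
if is_valid_logging_level "$level"; then
  echo "logging_level=$level"
else
  echo "unexpected logging level: $level" >&2
fi
```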

Destination Path (destination_path)

(Default - '.') The directory path that output files are written to.

How to use

Manage this setting using meltano config or an environment variable:

meltano config target-parquet set destination_path <destination_path>

export TARGET_PARQUET_DESTINATION_PATH=<destination_path>
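For example, to write files to an `output` folder instead of the current directory, you could set the variable and create the folder before running the pipeline. This is a sketch: it assumes pre-creating the folder is harmless whether or not the loader creates it itself, and `tap-example` is a placeholder for your own extractor.

```shell
# Write output files to ./output instead of the default '.'.
export TARGET_PARQUET_DESTINATION_PATH="output"

# Create the folder up front; 'mkdir -p' does nothing if it already exists.
# (Assumption: pre-creating it does not interfere with the loader.)
mkdir -p "$TARGET_PARQUET_DESTINATION_PATH"

# Then run your pipeline as usual ('tap-example' is a placeholder):
# meltano run tap-example target-parquet
```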

Compression Method (compression_method)

The compression codec for the output files. It must be supported by PyArrow; the currently available options are snappy (recommended), zstd, brotli, and gzip.

How to use

Manage this setting using meltano config or an environment variable:

meltano config target-parquet set compression_method <compression_method>

export TARGET_PARQUET_COMPRESSION_METHOD=<compression_method>
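Since only the codecs listed above are documented, a small guard can catch typos before the value reaches the loader. The sketch below is a dry run: `is_supported_compression` is a hypothetical helper, and the script only prints the `meltano config` command instead of executing it.

```shell
# Hypothetical helper: accept only the codecs listed on this page.
is_supported_compression() {
  case "$1" in
    snappy|zstd|brotli|gzip) return 0 ;;
    *) return 1 ;;
  esac
}

method="zstd"
if is_supported_compression "$method"; then
  # Dry run: print the command you would run inside your Meltano project.
  echo "meltano config target-parquet set compression_method $method"
else
  echo "unsupported compression method: $method" >&2
fi
```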

Streams In Separate Folder (streams_in_separate_folder)

(Default - False) Whether to write each stream to its own folder, since different streams are expected to have different schemas.

How to use

Manage this setting using meltano config or an environment variable:

meltano config target-parquet set streams_in_separate_folder true

export TARGET_PARQUET_STREAMS_IN_SEPARATE_FOLDER=true
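After a run with this option enabled, you can inspect the resulting folders. The sketch assumes the per-stream folders are created directly under `destination_path`, which this page does not state explicitly; `list_stream_folders` is a hypothetical helper.

```shell
# Hypothetical helper: list the immediate sub-folders of a directory,
# assuming one folder per stream is created under destination_path.
list_stream_folders() {
  find "${1:-.}" -mindepth 1 -maxdepth 1 -type d | sort
}

list_stream_folders "${TARGET_PARQUET_DESTINATION_PATH:-.}"
```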

File Size (file_size)

The number of rows to write per file. The default is to write to a single file.

How to use

Manage this setting using meltano config or an environment variable:

meltano config target-parquet set file_size 1234

export TARGET_PARQUET_FILE_SIZE=1234
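As a rough sizing aid, the number of files can be estimated with ceiling division. This assumes each file holds at most `file_size` rows, the last file holds the remainder, and the count applies per stream; the page does not describe the exact splitting behavior, and `expected_file_count` is a hypothetical helper.

```shell
# Hypothetical helper: ceil(rows / file_size) using integer arithmetic.
expected_file_count() {
  rows="$1"
  file_size="$2"
  echo $(( (rows + file_size - 1) / file_size ))
}

expected_file_count 5000 1234   # prints 5 (four full files plus one partial file)
```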

Looking for help?

If you're having trouble getting the target-parquet loader to work, look for an existing issue in its repository, file a new issue, or join the Meltano Slack community and ask for help in the #plugins-general channel.

Found an issue on this page?

This page is generated from a YAML file that you can contribute changes to on GitHub.