Dagster (Roadmap)

Dagster Integration Guidelines

Concur's Data Mapping solution now integrates with Dagster, improving tracking and visibility of your data workflows. The integration captures metadata and lineage information in the areas listed below.

Integration Features

  1. Dagster Pipeline Metadata:

    • Concur's Data Mapping solution collects and stores detailed metadata about Dagster pipelines, providing insights into the structure and configuration of each pipeline.

  2. Job and Op Run Information:

    • Detailed run information for each job and operation (Op) within Dagster is captured, including execution statuses and runtime metrics. This data helps in monitoring performance and diagnosing issues.

  3. Lineage Information:

    • When lineage information is available, it is extracted and stored, showing how data flows through and is transformed within Dagster pipelines so that data movements can be traced.

Concur's Dagster Sensor

The integration leverages Concur's Dagster sensor to monitor pipeline executions and emit relevant metadata. Here's how it works:

  • State Change Detection:

    • Dagster sensors are designed to detect specific state changes within pipelines. Concur's Dagster sensor triggers actions based on these changes.

  • Metadata Emission:

    • After each Dagster pipeline run, the sensor emits metadata, capturing both successful and failed executions. This ensures that all relevant information is collected, regardless of the outcome of the pipeline run.

  • Comprehensive Coverage:

    • The sensor captures detailed information about pipeline runs, including execution metrics and statuses, helping users maintain a complete and accurate view of their data workflows.
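The sensor behavior described above can be sketched in plain Python. This is a minimal, dependency-free illustration, not Concur's sensor and not the Dagster API: the RunRecord class, the TERMINAL_STATUSES set, and the emit_run_metadata function are hypothetical names. The sketch reacts only to runs that have reached a terminal state, emits a payload for both successful and failed runs, remembers which runs it has already reported (much like a sensor cursor), and records the run duration as a simple runtime metric.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical: run statuses treated as state changes worth reporting.
TERMINAL_STATUSES = {"SUCCESS", "FAILURE"}


@dataclass
class RunRecord:
    """Hypothetical stand-in for the run information a sensor observes."""
    run_id: str
    job_name: str
    status: str
    start_time: float  # seconds since epoch
    end_time: Optional[float] = None


def emit_run_metadata(runs: List[RunRecord], already_seen: set) -> List[Dict]:
    """Return one metadata payload per newly finished run.

    Successful and failed runs are both emitted; runs still in progress
    and runs reported on an earlier evaluation are skipped.
    """
    payloads = []
    for run in runs:
        if run.status not in TERMINAL_STATUSES or run.run_id in already_seen:
            continue
        duration = None
        if run.end_time is not None:
            duration = run.end_time - run.start_time
        payloads.append(
            {
                "run_id": run.run_id,
                "job_name": run.job_name,
                "status": run.status,
                "duration_seconds": duration,
            }
        )
        # Remember the run so it is not reported twice.
        already_seen.add(run.run_id)
    return payloads


if __name__ == "__main__":
    seen: set = set()
    runs = [
        RunRecord("r1", "daily_etl", "SUCCESS", 100.0, 160.0),
        RunRecord("r2", "daily_etl", "FAILURE", 200.0, 215.5),
        RunRecord("r3", "daily_etl", "STARTED", 300.0),
    ]
    for payload in emit_run_metadata(runs, seen):
        print(payload)
```

Only r1 and r2 are emitted on the first call; a second call with the same runs emits nothing, because both are already in the seen set.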

For more detailed information about Dagster sensors and their capabilities, please refer to the Dagster Sensors Documentation.

Prerequisites

  1. There are two ways to define Dagster definitions before starting the Dagster UI: using the Definitions class (recommended) or using Repositories.

  2. A newly created Dagster project uses the Definitions class by default.

  3. The sensor provided by the DataHub Dagster plugin uses the configuration options below internally. You can set these options using environment variables; if an option is not set, the sensor uses its default value.

Configuration options:

  • datahub_client_config: The DataHub client configuration.

  • dagster_url: The URL of your Dagster webserver.

  • capture_asset_materialization (default: True): Whether to capture asset keys as Datasets on AssetMaterialization events.

  • capture_input_output (default: True): Whether to capture and try to parse inputs and outputs from HANDLED_OUTPUT and LOADED_INPUT events. Currently only PathMetadataValue metadata is supported (EXPERIMENTAL).

  • platform_instance (optional): The instance of the platform that all assets produced by this recipe belong to.

  • asset_lineage_extractor: Lets you implement your own logic to capture asset lineage information. See the example for details.

  4. Once the Dagster UI is up, turn on the provided sensor: open the Overview tab, then the Sensors tab. Each defined sensor has a toggle that turns it on or off.

  5. The sensor provided by the Concur Dagster plugin is then ready to emit metadata after every Dagster pipeline run.
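The configuration options listed above can be set through environment variables, but this guide does not name those variables. The sketch below therefore assumes a hypothetical naming scheme (the option name upper-cased with a DAGSTER_SENSOR_ prefix) and a hypothetical load_sensor_config helper; check the plugin's own documentation for the real variable names. It illustrates the fallback rule from the prerequisites: unset options take their defaults, and the two boolean options default to True. datahub_client_config and asset_lineage_extractor are left out because they are structured objects or code rather than plain strings.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical prefix; the real environment variable names are not documented here.
ENV_PREFIX = "DAGSTER_SENSOR_"

# Defaults taken from the configuration options; None means no default is given.
DEFAULTS: Dict[str, object] = {
    "dagster_url": None,
    "capture_asset_materialization": True,
    "capture_input_output": True,
    "platform_instance": None,
}

BOOLEAN_OPTIONS = {"capture_asset_materialization", "capture_input_output"}


def _parse_bool(raw: str) -> bool:
    """Interpret common true/false spellings; reject anything else."""
    value = raw.strip().lower()
    if value in {"1", "true", "yes", "on"}:
        return True
    if value in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"Not a boolean value: {raw!r}")


def load_sensor_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, object]:
    """Build the sensor configuration, falling back to defaults for unset options."""
    env = os.environ if env is None else env
    config: Dict[str, object] = {}
    for option, default in DEFAULTS.items():
        raw = env.get(ENV_PREFIX + option.upper())
        if raw is None:
            config[option] = default
        elif option in BOOLEAN_OPTIONS:
            config[option] = _parse_bool(raw)
        else:
            config[option] = raw
    return config
```

For example, setting only DAGSTER_SENSOR_DAGSTER_URL and DAGSTER_SENSOR_CAPTURE_INPUT_OUTPUT=false yields a config with that URL, capture_input_output disabled, and capture_asset_materialization still True. Rejecting unrecognized boolean strings makes a typo fail loudly instead of silently disabling a feature.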

Dagster Ins and Out

You can provide inputs and outputs to both assets and ops explicitly, using dictionaries of Ins and Outs that correspond to the decorated function's arguments and outputs. When you provide inputs and outputs explicitly, you can also attach metadata to them. To create upstream and downstream dataset dependencies for assets and ops, use ins and out dictionaries that carry this metadata.
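To illustrate how metadata on ins and outs can describe dataset dependencies, the sketch below models an op's input and output definitions as plain dictionaries and derives upstream and downstream edges from them. It does not use the Dagster API or the plugin: the "dataset" metadata key, the derive_lineage function, and the table names are hypothetical, and this guide does not specify which metadata keys the Concur plugin actually reads. In real Dagster code, the same metadata would be attached through the metadata argument of In, Out, or AssetIn.

```python
from typing import Dict, List, Tuple

# Hypothetical metadata key naming the dataset an input or output maps to.
DATASET_KEY = "dataset"


def derive_lineage(
    op_name: str,
    ins: Dict[str, Dict[str, str]],
    outs: Dict[str, Dict[str, str]],
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Return (upstream_edges, downstream_edges) for one op.

    ins and outs map argument or output names to their metadata
    dictionaries, mirroring the ins= and out= dictionaries passed to a
    Dagster decorator. Entries without a dataset key add no lineage.
    """
    upstream = [
        (meta[DATASET_KEY], op_name)
        for meta in ins.values()
        if DATASET_KEY in meta
    ]
    downstream = [
        (op_name, meta[DATASET_KEY])
        for meta in outs.values()
        if DATASET_KEY in meta
    ]
    return upstream, downstream


if __name__ == "__main__":
    up, down = derive_lineage(
        "clean_orders",
        ins={"raw_orders": {"dataset": "warehouse.raw.orders"}},
        outs={"result": {"dataset": "warehouse.clean.orders"}},
    )
    print(up)    # [('warehouse.raw.orders', 'clean_orders')]
    print(down)  # [('clean_orders', 'warehouse.clean.orders')]
```

Inputs that carry no dataset metadata, such as a configuration argument, simply contribute no edge, which is why lineage is only captured where metadata is provided explicitly.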

Troubleshooting

Connection error for Concur REST URL

If you get ConnectionError: HTTPConnectionPool(host='localhost', port=8080), your Concur Data Mapping service is not running.
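Before digging further, you can check whether anything is listening at the address in the error message. The sketch below uses only the Python standard library; is_service_up is a hypothetical helper, and localhost:8080 is simply the host and port from the error above. A False result is consistent with the service not running, or with it listening on a different host or port.

```python
import socket


def is_service_up(host: str = "localhost", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    if is_service_up():
        print("Something is listening on localhost:8080.")
    else:
        print("Nothing is listening on localhost:8080; start the Concur Data Mapping service.")
```

This only confirms that a TCP port is open; it does not verify that the listener is actually the Data Mapping service.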
