Dagster (Roadmap)
Dagster Integration guidelines
Concur's Data Mapping solution now supports seamless integration with Dagster, enabling enhanced tracking and visibility of data workflows. The integration covers several key aspects, ensuring comprehensive metadata and lineage information is captured.
Integration Features
Dagster Pipeline Metadata:
Concur's Data Mapping solution collects and stores detailed metadata about Dagster pipelines, providing insights into the structure and configuration of each pipeline.
Job and Op Run Information:
Detailed run information for each job and operation (Op) within Dagster is captured, including execution statuses and runtime metrics. This data helps in monitoring performance and diagnosing issues.
Lineage Information:
When available, lineage information is extracted and stored, offering a clear view of data flow and transformations within the Dagster pipelines. This ensures transparency and traceability of data movements and transformations.
Concur's Dagster Sensor
The integration leverages Concur's Dagster sensor to monitor pipeline executions and emit relevant metadata. Here's how it works:
State Change Detection:
Dagster sensors are designed to detect specific state changes within pipelines. Concur's Dagster sensor triggers actions based on these changes.
Metadata Emission:
After each Dagster pipeline run, the sensor emits metadata, capturing both successful and failed executions. This ensures that all relevant information is collected, regardless of the outcome of the pipeline run.
Comprehensive Coverage:
The sensor captures detailed information about pipeline runs, including execution metrics and statuses, helping users maintain a complete and accurate view of their data workflows.
For more detailed information about Dagster sensors and their capabilities, please refer to the Dagster Sensors Documentation.
Prerequisites
You need to create a new Dagster project. See https://docs.dagster.io/getting-started/create-new-project.
There are two ways to declare Dagster definitions before starting the Dagster UI: the Definitions class (recommended) or Repositories.
A newly created Dagster project uses the Definitions class by default.
The sensor provided by the DataHub Dagster plugin uses the configuration options below internally. You can set them through environment variables. If an option is not set, the sensor uses its default value.
Configuration options:
datahub_client_config: The DataHub client config.
dagster_url: The URL of your Dagster webserver.
capture_asset_materialization (default: True): Whether to capture asset keys as Datasets on AssetMaterialization events.
capture_input_output (default: True): Whether to capture and try to parse inputs and outputs from HANDLED_OUTPUT and LOADED_INPUT events. Experimental; currently only PathMetadataValue metadata is supported.
platform_instance: Optional. The instance of the platform that all assets produced by this recipe belong to.
asset_lineage_extractor: A hook where you can implement your own logic to capture asset lineage information. See the example for details.
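The fallback behaviour described above (read an environment variable, otherwise use the default) can be sketched with the standard library alone. The environment variable names below are assumptions made for illustration, since the real names are not listed here. Only the two True defaults come from the option list.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _as_bool(raw: Optional[str], default: bool) -> bool:
    """Interpret common truthy strings; fall back to the default when unset."""
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class SensorSettings:
    dagster_url: Optional[str]
    capture_asset_materialization: bool
    capture_input_output: bool
    platform_instance: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> SensorSettings:
    # Hypothetical variable names; check the plugin for the real ones.
    return SensorSettings(
        dagster_url=env.get("DAGSTER_URL"),
        capture_asset_materialization=_as_bool(
            env.get("CAPTURE_ASSET_MATERIALIZATION"), True
        ),
        capture_input_output=_as_bool(env.get("CAPTURE_INPUT_OUTPUT"), True),
        platform_instance=env.get("PLATFORM_INSTANCE"),
    )
```

Passing a plain dictionary instead of os.environ makes the defaults easy to inspect, for example load_settings({"CAPTURE_INPUT_OUTPUT": "false"}).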
Once the Dagster UI is up, turn on the provided sensor: open the Overview tab, then the Sensors tab. Each defined sensor has a toggle button next to it that turns it on or off.
Once enabled, the sensor provided by the Concur Dagster plugin emits metadata after every Dagster pipeline run.
Dagster Ins and Out
Inputs and outputs can be declared explicitly for both assets and ops by passing dictionaries of In and Out objects whose keys correspond to the arguments of the decorated function. Each In and Out can also carry metadata. To create upstream and downstream dataset dependencies for assets and ops, supply that metadata in the ins and out dictionaries.
Troubleshooting
Connection error for Concur REST URL
If you get ConnectionError: HTTPConnectionPool(host='localhost', port=8080), Concur's Data Mapping service is not up.
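To check whether anything is listening on that address before re-running a pipeline, a small standard-library probe such as the following can help. The function name is hypothetical, and the default host and port are taken from the error message above.

```python
import socket


def is_service_reachable(
    host: str = "localhost", port: int = 8080, timeout: float = 2.0
) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

A False result means no TCP connection could be opened. A True result only shows that something accepts connections on that port, not that the service itself is healthy.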