Set up a Checkpoint
A Checkpoint is a step in a data pipeline that runs an Expectation Suite against a batch of data. New validation results are created each time a Checkpoint validates a batch of data. Checkpoints can also be configured to execute Actions (e.g. send an alert message through Slack or email when a validation fails).
We are going to run our Expectation Suite against the February data in our data folder. To do that, we need to set up a Checkpoint. In your terminal, run the following command from your gx-getting-started directory (make sure to shut down any running Jupyter Notebooks first):
great_expectations checkpoint new getting_started_checkpoint
A new browser tab will open with a Jupyter Notebook showing the edit_checkpoint_getting_started_checkpoint.ipynb file.
File Location
The file to edit your Checkpoint is located at gx-getting-started/uncommitted/edit_checkpoint_getting_started_checkpoint.ipynb.
This file contains the configuration code for a Checkpoint. Scroll down to the second code cell. You should see the following code:
my_checkpoint_name = "getting_started_checkpoint" # This was populated from your CLI command.
yaml_config = f"""
name: {my_checkpoint_name}
config_version: 1.0
class_name: SimpleCheckpoint
run_name_template: "%Y%m%d-%H%M%S-my-run-name-template"
validations:
  - batch_request:
      datasource_name: getting_started_datasource
      data_connector_name: default_inferred_data_connector_name
      data_asset_name: yellow_tripdata_sample_2019-02.csv
      data_connector_query:
        index: -1
    expectation_suite_name: getting_started_expectation_suite_taxi.demo
"""
print(yaml_config)
Make sure that data_asset_name is set to yellow_tripdata_sample_2019-02.csv so that we validate the February data. Now scroll down to the last code cell and uncomment both lines of code. Then run all the cells in the notebook, which will create the Checkpoint configuration in your project.
Running the notebook will also open the Data Docs again in a new tab.
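The run_name_template value in the config above looks like a date format pattern. Assuming it is expanded with the standard strftime codes when the Checkpoint runs, the following plain-Python sketch shows the kind of run name you could expect to see in the Validation Results. The run time used here is a made-up value, fixed only to keep the example reproducible.

```python
from datetime import datetime

# Copied from the Checkpoint config above.
run_name_template = "%Y%m%d-%H%M%S-my-run-name-template"

# Hypothetical run time (an assumption for this sketch, not taken from a real run).
run_time = datetime(2019, 2, 15, 9, 30, 0)

# strftime replaces the % codes and leaves the rest of the text untouched.
run_name = run_time.strftime(run_name_template)
print(run_name)  # 20190215-093000-my-run-name-template
```

Because the timestamp comes first, run names built this way sort chronologically, which makes it easier to find the latest run.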
Inspect the Validation Results for your Checkpoint
You should see that the first result listed under the “Validation Results” tab shows a failed Expectation Suite (i.e. there is a red “X” in the “Status” column). If you click on that row you will see more details about the failure.
You should see that the Expectation for "values must be greater than or equal to 1 and less than or equal to 6" failed. But why?
In the results of that Expectation you will see a small table with the heading "Sampled Unexpected Values" and a 0 below it. That means that the passenger_count column contained the value 0. Since that Expectation was configured to only allow values greater than or equal to 1 and less than or equal to 6 for the passenger_count column, the validation failed. (A taxi ride with 0 passengers does not make sense.)
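To make the failure concrete, here is a minimal plain-Python sketch of the kind of inclusive range check this Expectation describes. It is not the Great Expectations implementation. The function name find_unexpected_values and the sample passenger counts are made up for illustration.

```python
from typing import Iterable, List


def find_unexpected_values(values: Iterable[int], min_value: int, max_value: int) -> List[int]:
    """Return the values that fall outside the inclusive range [min_value, max_value]."""
    return [v for v in values if not (min_value <= v <= max_value)]


# Hypothetical sample of passenger_count values; the real February file is much larger.
passenger_counts = [1, 2, 0, 6, 3, 1]

unexpected = find_unexpected_values(passenger_counts, min_value=1, max_value=6)
success = len(unexpected) == 0  # the check passes only if nothing is out of range

print(f"success={success}, unexpected values={unexpected}")
```

With this sample, the single 0 is enough to make the check fail, which mirrors what the "Sampled Unexpected Values" table reports in the Data Docs.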