7.4. Address ingestion via the ETL process
The background colors of the above image are to be interpreted as:
The ETL process, which ingests data from one or more trusted external Address Data Providers, is a key element of the NRVC. The ETL is responsible for:
- Ingesting new addresses present on the Address Data Providers
- Correcting existing addresses
- Validating addresses ingested by the Data producers
You will find below a high-level representation of the ingestion process. The exact algorithms used by the ETL process are out of scope for this project and are therefore abstracted as a black box in the process description.
The ETL algorithm is out of scope of this project
ETL Process
| Name | Address ingestion via the ETL process |
|---|---|
| Purpose | Ingest and validate address data to maintain the quality of the NRVC’s address database to the highest standards |
| Linked user stories | |
| APIs used | GET /etl/addresses, POST /etl/addresses, GET /etl/addresses/<address-id>, PUT /etl/addresses/<address-id>, PATCH /etl/addresses/<address-id> |
| Scope | This process handles the ingestion of addresses into the NRVC’s address database. It also handles the correction and validation of existing address data. The exact algorithm used by the ETL process is out of scope of this process. |
| Roles | ETL, System |
| Input | - Addresses from the Address Data Providers - Algorithm for the address data consolidation |
| Output | - Consolidated and up-to-date NRVC address database |
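
As a rough illustration of how the endpoints listed under "APIs used" might be addressed by the ETL, the sketch below pairs each endpoint with an operation name. The operation names, the `ETL_ROUTES` table and the `route` helper are hypothetical; this specification lists only the methods and paths, so the meaning given to each endpoint here is an assumption based on common REST usage.

```python
# Hypothetical operation names mapped to the endpoints listed in "APIs used".
# The semantics attached to each endpoint are assumed, not taken from the specification.
ETL_ROUTES = {
    "search": ("GET", "/etl/addresses"),
    "create": ("POST", "/etl/addresses"),
    "read": ("GET", "/etl/addresses/{address_id}"),
    "replace": ("PUT", "/etl/addresses/{address_id}"),
    "patch": ("PATCH", "/etl/addresses/{address_id}"),
}


def route(operation: str, address_id: str = "") -> tuple[str, str]:
    """Return the (HTTP method, path) pair for a hypothetical ETL operation."""
    method, template = ETL_ROUTES[operation]
    # Endpoints that target a single address need its identifier.
    if "{address_id}" in template and not address_id:
        raise ValueError(f"operation {operation!r} requires an address id")
    return method, template.format(address_id=address_id)
```

For example, `route("patch", "42")` yields the method and path for a partial update of address `42`, while `route("create")` needs no identifier.
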
Detailed Process description
Main process
| Step | Description | Actor(s) | Input(s) | Output(s) | Decision points |
|---|---|---|---|---|---|
| 1 | The ETL process is periodically triggered | ETL | - | - | |
| 2 | The ETL process retrieves the data to synchronise from the Address Data Providers | ETL | - | - addresses to synchronise | |
| 3 | The ETL process processes one address from the addresses to synchronise (could be done in parallel) | ETL | - addresses to synchronise | - next address to synchronise | |
| 4 | The ETL process extracts the address information to be stored in the NRVC address database | ETL | - address to synchronise | - address information to be ingested | |
| 5 | The ETL process searches for address matches in the NRVC address database | ETL | - address information to be ingested | - address present in the NRVC address database if any | |
| 6 | The ETL process checks if the address is present in the NRVC address database | ETL | - address present in the NRVC address database if any | - yes / no | If the address is present: go to step 7. Else: go to secondary process S.1. |
| 7 | Correct address information and set the flag “validated = true” | ETL | - NRVC address | - Corrected NRVC address with flag “validated = true” | |
| 8 | System applies the address update | System | - Corrected NRVC address with flag “validated = true” | - Corrected NRVC address with flag “validated = true” | |
| 9 | The ETL process checks if more addresses need to be synchronised | ETL | - addresses that still need to be synchronised | - yes / no | If there are still addresses to be synchronised: go to step 3. Else: go to step 10. |
| 10 | The ETL process terminates successfully | ETL | - | - | |
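
The control flow of steps 3 to 10 can be sketched as a simple loop. This is a minimal illustration only: the record shape (plain dictionaries with `street` and `city` fields), the function `run_main_process` and its callback parameters are all hypothetical, and the matching logic is passed in as a black box because the actual ETL algorithm is out of scope.

```python
from typing import Callable, Iterable, Optional


def run_main_process(
    provider_addresses: Iterable[dict],
    find_match: Callable[[dict], Optional[dict]],
    apply_update: Callable[[dict], None],
    handle_missing: Callable[[dict], None],
) -> int:
    """Run steps 3 to 10 over the addresses retrieved in step 2; return how many were processed."""
    processed = 0
    for raw in provider_addresses:  # steps 3 and 9: take the next address until none remain
        # Step 4: keep only the fields to be stored (field names are assumed).
        info = {key: raw[key] for key in ("street", "city") if key in raw}
        match = find_match(info)  # step 5: black-box search in the NRVC database
        if match is not None:  # step 6
            corrected = {**match, **info, "validated": True}  # step 7
            apply_update(corrected)  # step 8: the system applies the update
        else:
            handle_missing(info)  # secondary process S.1
        processed += 1
    return processed  # step 10: normal termination


# Demo with an in-memory stand-in for the NRVC address database.
store = {"1": {"id": "1", "street": "Main St", "city": "Oldtown", "validated": False}}
to_create: list = []


def find_by_street(info: dict) -> Optional[dict]:
    # Placeholder matching rule; the real matching algorithm is out of scope.
    return next((a for a in store.values() if a["street"] == info.get("street")), None)


count = run_main_process(
    [
        {"street": "Main St", "city": "Newtown", "source": "provider-a"},
        {"street": "New Rd", "city": "Newtown"},
    ],
    find_match=find_by_street,
    apply_update=lambda address: store.__setitem__(address["id"], address),
    handle_missing=to_create.append,
)
```

In the demo, the first provider address matches an existing record and is corrected in place, while the second has no match and is handed to the S.1 callback.
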
Secondary Processes
S.1. Address does not exist in NRVC address database
| Step | Description | Actor(s) | Input(s) | Output(s) | Decision points |
|---|---|---|---|---|---|
| 1 | The ETL process creates a new Address with the flag “validated = true” | ETL | - address information to be ingested | - NRVC address to be created | |
| 2 | The system creates the given address | System | - NRVC address to be created | - created NRVC address | Go to Main process step 9 |
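
The two steps of S.1 can be illustrated as follows. This is a sketch under stated assumptions: the helper names `build_new_address` and `create_address`, the dictionary-based database, and the sequential identifier scheme are all hypothetical and are not defined by this specification.

```python
import itertools

# Hypothetical identifier source; how the system assigns identifiers is not specified.
_next_id = itertools.count(1)


def build_new_address(info: dict) -> dict:
    """S.1 step 1 (ETL): prepare the address to be created, flagged as validated."""
    return {**info, "validated": True}


def create_address(database: dict, new_address: dict) -> dict:
    """S.1 step 2 (System): store the address and return the created record."""
    address_id = str(next(_next_id))
    created = {"id": address_id, **new_address}
    database[address_id] = created
    return created


# Demo: an address with no match in the database is created as validated.
database: dict = {}
created = create_address(
    database, build_new_address({"street": "New Rd", "city": "Newtown"})
)
```

After this step the flow returns to step 9 of the main process.
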
Additional Information
Error processing during the ETL process
If an error occurs during the ETL process (an internal error, or an error while using an external API), the system should log the error and process the next address. An error triggered while processing one address should never interrupt the ETL process for subsequent addresses, unless it is a system-wide error that would prevent all addresses from being processed.
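
One possible way to express this rule in code is shown below. It is a minimal sketch: the `SystemWideError` class, the `process_all` function and the logger name are hypothetical, and how a system-wide error is actually detected is not specified here.

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("etl")


class SystemWideError(Exception):
    """Hypothetical marker for failures that prevent every address from being processed."""


def process_all(addresses: Iterable[dict], process_one: Callable[[dict], None]) -> list:
    """Process each address; log and skip per-address failures, abort on system-wide ones."""
    failed = []
    for address in addresses:
        try:
            process_one(address)
        except SystemWideError:
            # No address can be processed, so the run is stopped.
            logger.exception("System-wide error, stopping the ETL run")
            raise
        except Exception:
            # Per-address failure: log it and carry on with the next address.
            logger.exception("Failed to process address %r, continuing", address)
            failed.append(address)
    return failed


# Demo: the second address fails, the third is still processed.
handled: list = []


def demo_process(address: dict) -> None:
    if address.get("street") is None:
        raise ValueError("missing street")
    handled.append(address["street"])


failed = process_all([{"street": "A"}, {"street": None}, {"street": "B"}], demo_process)
```

Returning the list of failed addresses is a design choice of this sketch; it makes it easy to report or retry them after the run.
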

