6.5. Data quality

Several mechanisms are used to guarantee data quality. Below you will find a brief explanation of each mechanism, together with a link to the documentation where you can see it in action.

Data dictionary

Definition

The data dictionary contains all the terms, data objects and fields that are used in the context of the NRVC. The goal of the data dictionary is to keep communication clear, consistent and meaningful for all involved parties.

Implementation

8. Data architecture

9. API Proposal

Inconsistent formats

Definition

Variability in formats, like dates or numbers, can lead to misinterpretation or processing errors. For instance, a date might be entered as "dd/mm/yyyy" in one place and "mm-dd-yyyy" elsewhere.
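A minimal sketch of how such mixed formats could be normalised before storage (the list of accepted formats and the function name are illustrative assumptions, not part of the NRVC specification):

```python
from datetime import datetime

# Accepted input formats, in priority order (illustrative assumption).
KNOWN_FORMATS = ["%d/%m/%Y", "%m-%d-%Y"]

def normalise_date(raw: str) -> str:
    """Return the date as ISO 8601 'YYYY-MM-DD', or raise ValueError."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {raw!r}")
```

Note that a date such as "05/06/2024" is inherently ambiguous; a priority-ordered format list resolves it deterministically, which is why storing a single canonical format matters in the first place.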

Implementation

8. Data architecture

9. API Proposal

Duplicate data

Definition

Duplicates often occur when data comes from multiple sources or overlapping imports. These duplicates can inflate datasets and introduce biases if not properly managed.
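A minimal sketch of deduplication over records merged from several sources, assuming a normalised key built from hypothetical address fields (the field names are illustrative):

```python
def dedupe(records):
    """Keep the first occurrence of each record, keyed on normalised
    address fields so that case and whitespace variants collapse."""
    seen = set()
    unique = []
    for rec in records:
        key = (rec["street"].strip().lower(),
               str(rec["number"]).strip(),
               rec["municipality"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```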

Implementation

6.2. Address ingestion

6.3. Versioning, approvals and audit logs

6.9. Manual reviews and audits

Missing or incomplete data

Definition

Empty fields or missing essential information reduce the dataset’s completeness. This may result from data entry errors, system limitations, or gaps in data collection processes.

Implementation

8. Data architecture

Inaccurate or incorrect entries

Definition

Errors from manual entry or measurement inaccuracies can lead to faulty data. This includes misspelled names, transposed digits, or incorrect values.

Implementation

Inconsistent data standards

Definition

Lack of adherence to common standards (e.g., different units of measurement, terminology variations) makes it challenging to compare or aggregate data across sources.
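As a sketch of why a common standard matters, the conversion below normalises a quantity reported in mixed units to one canonical unit before aggregation (the units and the cable-length use case are illustrative assumptions):

```python
# Conversion factors to the canonical unit, metres.
TO_METRES = {"m": 1.0, "cm": 0.01, "km": 1000.0}

def to_metres(value: float, unit: str) -> float:
    """Convert a reported length to metres; reject unknown units
    rather than silently passing them through."""
    try:
        return value * TO_METRES[unit]
    except KeyError:
        raise ValueError(f"Unsupported unit: {unit!r}") from None
```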

Implementation

8. Data architecture

9. API Proposal

Poorly defined data

Definition

Ambiguous labels, unclear fields, or undefined variables can limit the data's usability by complicating its interpretation.

Implementation

8. Data architecture

9. API Proposal

Lack of data integrity

Definition

Data integrity refers to the accuracy and consistency of data throughout its lifecycle. When integrity is not enforced, records can reference entries that no longer exist or contradict one another, undermining any analysis built on top of them.

Implementation

8. Data architecture

9. API Proposal

Incorrectly classified data

Definition

Mislabelling categories or misclassifying items within datasets can skew analysis. For example, categorising a purchase as "corporate" instead of "personal" could mislead marketing analysis.

Implementation

Limited number of free text fields

Definition

It was decided to limit the number of free text fields to a minimum. Each free text field is defined in the data model alongside its justification and approval.

Implementation

8. Data architecture

Field validation rules

Definition

Every field has a well-defined type and, where possible, an associated validation rule that limits the values that can be input. All mandatory fields are marked as such in the data model, and the validation process will enforce these rules and return an error if any required fields are missing.
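A minimal sketch of such per-field validation, assuming a hypothetical schema (the field names, types and rules below are illustrative, not the actual NRVC data model):

```python
# Each field declares a type, whether it is mandatory, and an optional rule.
SCHEMA = {
    "street":  {"type": str, "required": True},
    "number":  {"type": int, "required": True, "rule": lambda v: v > 0},
    "comment": {"type": str, "required": False},
}

def validate(record: dict) -> list:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    for name, spec in SCHEMA.items():
        if name not in record:
            if spec["required"]:
                errors.append(f"missing required field: {name}")
            continue
        value = record[name]
        if not isinstance(value, spec["type"]):
            errors.append(f"wrong type for {name}")
        elif "rule" in spec and not spec["rule"](value):
            errors.append(f"invalid value for {name}")
    return errors
```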

Implementation

8. Data architecture

Automatic validation processes

Definition

On top of the validation rules at the field level, automated validation processes are put in place where possible to detect and prevent invalid inputs.

E.g. 1: If two pieces of equipment are too far away from each other to be connected by a cable, that physical link is not allowed by the system.

E.g. 2: If an address is being created that already exists, the system will return an error.

E.g. 3: If an address is created but similar addresses already exist (e.g. because of a typo), the list of similar addresses is returned for validation before proceeding with the creation of the new entry.
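Two of these checks can be sketched as follows; the 500 m cable limit, the planar coordinates and the similarity cutoff are illustrative assumptions, not values from the NRVC specification:

```python
import difflib
import math

MAX_CABLE_DISTANCE_M = 500  # assumed limit for illustration

def link_allowed(pos_a, pos_b) -> bool:
    """Reject a physical link between equipment that is too far apart.
    Positions are (x, y) coordinates in metres (planar approximation)."""
    return math.dist(pos_a, pos_b) <= MAX_CABLE_DISTANCE_M

def similar_addresses(new_addr: str, existing: list, cutoff=0.85) -> list:
    """Return existing addresses that look like typo variants of the new
    one, so they can be shown to the user before a new entry is created."""
    return difflib.get_close_matches(new_addr, existing, n=5, cutoff=cutoff)
```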

Implementation

6.2. Address ingestion

6.3. Versioning, approvals and audit logs

6.8. Automatic data approvals and deletion

Manual Reviews

Definition

Even though automated processes are in place to ensure high quality standards, manual reviews of the data by experts can help identify and correct errors that automated systems might miss.

Implementation

6.2. Address ingestion

6.3. Versioning, approvals and audit logs

6.9. Manual reviews and audits

Cross-referencing

Definition

Comparing data from multiple sources can help identify discrepancies and validate the accuracy of the data. Cross-referencing can be particularly useful for ensuring the consistency and reliability of data.
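A minimal sketch of cross-referencing two sources keyed by a shared identifier (the keying by address id and the dict-of-values shape are illustrative assumptions):

```python
def cross_reference(source_a: dict, source_b: dict) -> dict:
    """Map each key present in both sources to the (a_value, b_value)
    pair wherever the two sources disagree."""
    shared = source_a.keys() & source_b.keys()
    return {k: (source_a[k], source_b[k])
            for k in shared if source_a[k] != source_b[k]}
```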

Implementation

6.2. Address ingestion

Approval processes

Definition

Since the data ingestion process is 100% manual (usually performed by technicians on site), we need to take the human error factor into account. According to the discussions with the operators, the data produced by field technicians is considered to be of high quality and is fully trusted.

Nonetheless, human error can occur. We therefore created an approval process that routes data ingested by field technicians to an Approver from their organisation, who can perform sanity checks on the produced data records.

Implementation

6.3. Versioning, approvals and audit logs

Address database quality monitoring

Definition

Address ingestion into the NRVC database follows a process designed to keep the address database accurate and up to date. This process consolidates data from various data sources and approves addresses submitted by Editors.

Since the addresses submitted by the Editors are not considered valid / approved until they are validated by the ingestion process at a later stage, it could happen that some addresses are invalid and never get validated.

To cover such cases, monitoring will be set up and configured to detect address entries that have not been validated within a reasonable amount of time. When such entries are detected, the Application Administrators and / or Approvers will receive an alert indicating that the entry needs to be manually validated.
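A minimal sketch of that monitoring check; the 30-day grace period, the status values and the record fields are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(days=30)  # assumed "reasonable amount of time"

def stale_entries(addresses, now=None):
    """Return pending address entries older than the grace period, so an
    alert can be raised for each. Each entry is a dict with a 'status'
    and a timezone-aware 'submitted_at' timestamp."""
    now = now or datetime.now(timezone.utc)
    return [a for a in addresses
            if a["status"] == "pending"
            and now - a["submitted_at"] > GRACE_PERIOD]
```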

Implementation

6.2. Address ingestion

Versioning

Definition

Vertical cabling physical links between two pieces of equipment are crucial information whose quality needs to be guaranteed. Once a physical link dataset is produced, it can no longer be deleted; the produced data can then be validated or rejected.

If a problem is detected with a dataset after it has been validated or rejected, this decision can be undone at a later stage by Administrators, effectively reverting the approved version to the previously approved one.
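This append-only history with reverts can be sketched as follows (the class shape and status values are illustrative assumptions; nothing is ever deleted, a revert only changes which version counts as approved):

```python
class VersionedDataset:
    """Append-only version history for one physical link dataset."""

    def __init__(self):
        self.versions = []  # never truncated; each entry keeps its status

    def submit_approved(self, data):
        self.versions.append({"data": data, "status": "approved"})

    def approved(self):
        """The latest still-approved version, or None."""
        for v in reversed(self.versions):
            if v["status"] == "approved":
                return v["data"]
        return None

    def revert_latest(self):
        """Administrator action: undo the latest approval, so the
        previously approved version becomes current again."""
        for v in reversed(self.versions):
            if v["status"] == "approved":
                v["status"] = "reverted"
                return
```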

Implementation

6.3. Versioning, approvals and audit logs

Audit logs

Definition

Audit logs of every action performed on the system are kept. They are mainly kept for accountability, but can also be used to analyse drops in data quality. Furthermore, since all actions performed are stored, the audit logs could be used as a last resort to manually correct unintended or malicious actions.
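A minimal sketch of such an append-only log; the entry fields and the by-actor filter are illustrative assumptions:

```python
import json
from datetime import datetime, timezone

class AuditLog:
    """Append-only log: entries are recorded, never edited or removed."""

    def __init__(self):
        self._entries = []

    def record(self, actor: str, action: str, target: str):
        self._entries.append({
            "actor": actor,
            "action": action,
            "target": target,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def by_actor(self, actor: str):
        """Filter entries, e.g. when investigating a drop in data quality."""
        return [e for e in self._entries if e["actor"] == actor]

    def export(self) -> str:
        return json.dumps(self._entries, indent=2)
```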

Implementation

6.3. Versioning, approvals and audit logs

Systematic reviews

Definition

Audits involving systematic reviews ensure that datasets align with quality benchmarks and organisational standards, for example by checking for duplicates or data not conforming to predefined rules.

Implementation

Mark old data as deleted

Definition

Once data is inserted in the VC database, it is never deleted (sites, blocks, units, equipment, physical links). Instead, it is marked for deletion and goes through a validation process. Once the data deletion is validated by the Approver, the data is marked as Deleted and the system will not allow new links to that data entry.

Note that user objects will also not be deleted from the system, but all personal data will be deleted (first name, last name, email, …).
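The mark-as-deleted flow described above can be sketched as a small state machine (the status names and the standalone link check are illustrative assumptions):

```python
class Entry:
    """A data entry that is only ever soft-deleted."""

    def __init__(self, name):
        self.name = name
        self.status = "active"  # active -> pending_deletion -> deleted

    def request_deletion(self):
        if self.status == "active":
            self.status = "pending_deletion"

    def approve_deletion(self):
        """Performed by the Approver as part of the validation process."""
        if self.status == "pending_deletion":
            self.status = "deleted"

def link_to(entry: Entry) -> bool:
    """New links to entries marked as Deleted are refused."""
    return entry.status != "deleted"
```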

Implementation