10 questions you should ask to check the quality of your data

Spatial data plays an important role within projects, often providing critical evidence used to inform decision making. However, data that is suitable for one application may not be fit for use in another. Here are the 10 questions we pose when assessing the quality of spatial data.

1. How accurate is the data?
Accuracy relies on the notion of truth: how closely the data represents the real phenomenon it describes. One of the best ways to check data accuracy is to compare it against another source. If this isn’t possible, simple analysis of the raw data can highlight anomalies and inaccuracies.
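As one illustration of the kind of simple analysis that can highlight anomalies, the sketch below flags values that sit far from the median using a modified z-score (based on the median absolute deviation). This is only one possible check and not necessarily the method used by the authors; the depth values, the function name and the 3.5 threshold are assumptions chosen for the example.

```python
import statistics


def flag_outliers(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []  # No spread to compare against, so nothing is flagged.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical depth soundings in metres; one looks like a decimal-point slip.
depths = [12.1, 12.4, 11.9, 12.0, 120.3, 12.2]
suspect = flag_outliers(depths)
print(suspect)
```

A flagged value is a prompt for investigation rather than proof of an error; it may be a genuine feature.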

2. Is the data at sufficient resolution and precise enough for its intended use?
Precision refers to the amount of detail that can be discerned, also known as granularity. If the data does not have sufficient resolution, then it is very unlikely to have an acceptable level of precision. A simple check of data resolution is to count the number of significant digits used in the numbers, particularly coordinate values.
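The digit check can be sketched in a few lines. The example below assumes the coordinates are available as text exactly as they appear in the source file (so that trailing zeros are preserved) and that they are finite decimal numbers; the sample longitudes and function names are hypothetical.

```python
from decimal import Decimal


def significant_digits(value: str) -> int:
    """Count the significant digits in a number as written."""
    return len(Decimal(value).as_tuple().digits)


def decimal_places(value: str) -> int:
    """Count the digits after the decimal point in a number as written."""
    return max(0, -Decimal(value).as_tuple().exponent)


# Hypothetical longitude values in decimal degrees, kept as strings.
longitudes = ["-1.40428", "-1.4043", "-1.4"]
sig = [significant_digits(v) for v in longitudes]
places = [decimal_places(v) for v in longitudes]
print(sig, places)
```

A mixture of digit counts within one dataset, as in this example, can suggest that records were captured or rounded in different ways. Note that many recorded digits do not guarantee accuracy; they only show how finely the values were written down.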

3. Is the data fit for purpose?
This represents the potential of the data to fulfil the user’s specific requirements. It is important to identify a project’s data needs and determine the quality required. For example, if the data does not have sufficient resolution for its intended use, then it is not fit for purpose. Undertaking simple analysis before using the data will help to identify whether it is fit for purpose.

4. Is the dataset complete?
Does the data contain errors of commission (extra incorrect features) or omission (missing features)? Does the data conform to the expected spatial and/or temporal extent? It is important to determine what level of completeness you are dealing with before you begin using the data.
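Two of these checks are straightforward to automate. The sketch below, under assumed inputs, lists features that fall outside an expected spatial extent and features with missing attribute values. The records, the extent and the function names are all hypothetical. Detecting features that are missing altogether (errors of omission) generally needs a reference source to compare against, which this example does not attempt.

```python
# Hypothetical records: (id, longitude, latitude, survey_date or None)
records = [
    ("A1", -1.40, 50.90, "2020-05-01"),
    ("A2", -1.35, 50.88, None),
    ("A3", 4.20, 50.91, "2020-05-03"),
]

# Assumed expected extent: (min_lon, min_lat, max_lon, max_lat)
expected_extent = (-2.0, 50.5, -1.0, 51.5)


def outside_extent(rows, extent):
    """Return ids of features located outside the expected extent."""
    min_lon, min_lat, max_lon, max_lat = extent
    return [
        row[0]
        for row in rows
        if not (min_lon <= row[1] <= max_lon and min_lat <= row[2] <= max_lat)
    ]


def missing_values(rows):
    """Return ids of features with at least one empty attribute."""
    return [row[0] for row in rows if any(v is None for v in row)]


out_of_area = outside_extent(records, expected_extent)
incomplete = missing_values(records)
print(out_of_area, incomplete)
```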

5. Is it in a format that can easily be consumed and read by the software available?
The most usable format for data is likely to be the one in which the dataset was first created. Geospatial data is often more complex than simple tabular datasets. When publishing this type of data, formats like GeoJSON (based upon JavaScript Object Notation, JSON) and KML (based upon Extensible Markup Language, XML) should be considered.
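To show why a format such as GeoJSON is easy for software to consume, here is a minimal sketch that builds a one-feature collection and reads it back using only Python's standard json module. The site name and coordinates are invented for the example. GeoJSON positions are written in longitude, latitude order.

```python
import json

# A single hypothetical point feature; coordinates are [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-1.4043, 50.9097]},
    "properties": {"name": "Example site"},
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Serialise to text, as it would be written to a .geojson file.
text = json.dumps(collection, indent=2)

# Any JSON-aware tool can read the text back into the same structure.
round_trip = json.loads(text)
print(round_trip["features"][0]["properties"]["name"])
```

Because the result is plain JSON text, it can be opened in most GIS packages as well as in a text editor.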

6. Does the cost of the data reflect value for money?
Does the data fit the budget? There are many freely available data sources from reputable organisations. However, it may be worth investing in a known source, where the data may be of a higher quality than freely available datasets. Poor data leads to poor outcomes.

7. Is the data accessible and interpretable by authorised users in a specific context of use?
Licensing limitations might restrict reuse or modification of the data, and whether it can be used freely within reports. Do you know of any time limits on the licence?

8. Does the data adhere to known standards, conventions or regulations?
Many industries follow standards that are reflected in a geospatial data model as value domains, data formats, and the topological consistency of how the data is stored. Good examples are the MEDIN Data Guidelines and the standards published by the International Organization for Standardization (ISO) and the Open Geospatial Consortium (OGC).

9. Do you know the lineage of the data?
A thorough description of the history of the dataset allows you to determine its potential use. Lineage provides information such as: the data source, including information on the organisation providing it, coordinate systems, projection systems, associated corrections etc.; methods of acquisition, derivation or compilation of the data; methods of data conversion, such as the stages in digitisation or vectorisation of raster data; and transformations, e.g. coordinate transformations, reclassification etc. Good lineage will be stored within the metadata or an accompanying report.

10. Does the data contain complete metadata to a known standard?
Metadata is ‘data about the data’ and is vital to understanding a dataset’s source, currency, scale and appropriateness for use. Metadata enables the data to be better understood and used to good effect.

Prepared by Oliver Ringwood, ABPmer GIS Specialist

If you have any questions regarding data quality assurance or data management, contact Chris Jackson.