Appendix A Methods for structuring and processing data for bioregional assessment impact and risk analysis purposes

A.1 Context

A very large number of multi-dimensional and multi-scaled datasets are used in the impact analysis for each bioregional assessment (BA), including model outputs and ecological, economic and sociocultural data from a wide range of sources. Part of the approach used to manage these multiple dimensions and produce meaningful results is to adopt a clear spatial framework as an organising principle. While the inherently spatial character of every BA is important and must be addressed, it is also essential that the temporal and other dimensions of the analysis do not lose resolution during data processing. For example, knowing where a potential impact may take place is important, but so is knowing the kind and level of impact and which assets may be affected.

A.2 Overview and purpose

The data are organised into impact and risk analysis databases to enable efficient management. The purpose of the databases is to produce result datasets that integrate the available modelling and other evidence across the assessment extent of the BA. The result datasets are required to support three types of BA analyses: hydrological change analysis, landscape impact profiles and asset impact profiles. These outputs are used in product 3-4 (impact and risk analysis) and displayed on the BA Explorer (a spatial data viewer available at www.bioregionalassessments.gov.au/explorer). They are also available as datasets at data.gov.au.

Given the context and purpose, the impact and risk analysis databases must achieve the following outcomes:

The bulk of analysis queries are run in a professionally managed relational database environment.
The result datasets are delivered in a format suitable for use by the Assessment teams.
Queries are rapidly refined for the Assessment teams.
Queries are automated by pre-running them whenever possible to generate a 'bank' of queries.
Continuity of provenance is maintained from the repository through the impact and risk analysis databases and the BA Explorer (www.bioregionalassessments.gov.au/explorer).
Result datasets are available for rapid viewing across multiple media including via web feature services (WFS) and the BA Explorer web interface.

A.3 Data structures

The spatial framework underpinning the impact and risk analysis databases requires knowledge about the:

structure of the attributes and tables to enable secure, efficient querying in a relational database environment
characteristics of the impact analysis spatial datasets
characteristics of the technical geoprocessing datasets needed to underpin the spatial framework for the impact analysis
standards for the spatial framework (e.g. coordinate system and naming conventions).

Each of these is addressed in turn in the following sections.

A.4 Data structures for efficient geoprocessing

The data are structured to overcome the slow geoprocessing operations typical of complex queries of very large spatial datasets, such as those required for a BA. This structuring is achieved by (i) loading as many attributes as possible into relational tables, including some spatial information such as area and length data, and (ii) simplifying and partitioning the remaining spatial data using assessment units while, importantly, retaining spatial geometries below the resolution of the assessment units.

An assessment unit is a geographic area represented by a square polygon with a unique identifier. The assessment units are non-overlapping and form a grid that completely covers each assessment extent. The spatial resolution of the assessment units is closely related to that of the BA groundwater modelling and is, typically, 1 km x 1 km. Assessment units are used to spatially partition asset and landscape class spatial data for impact analysis purposes. The partitioned data, including the model results, may be combined and recombined into any aggregation supported by the conceptual modelling, causal pathways and model data. The assessment units are also used to summarise and present potential changes in the hydrological response variables and the receptor impact variables.
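The grid of assessment units described above can be illustrated with a minimal sketch. It assumes a rectangular extent in projected metres; the function name `build_assessment_units`, the `au_id` numbering scheme and the example extent are hypothetical and are not taken from the BA databases.

```python
import math


def build_assessment_units(xmin, ymin, xmax, ymax, cell_size=1000.0):
    """Return square, non-overlapping cells that completely cover the extent.

    Each cell is a dict holding a unique identifier and its bounding box
    (xmin, ymin, xmax, ymax) in projected metres.
    """
    # Snap the grid origin down to a multiple of the cell size so that grids
    # built for different extents share the same cell boundaries.
    x0 = math.floor(xmin / cell_size) * cell_size
    y0 = math.floor(ymin / cell_size) * cell_size
    n_cols = math.ceil((xmax - x0) / cell_size)
    n_rows = math.ceil((ymax - y0) / cell_size)

    units = []
    for row in range(n_rows):
        for col in range(n_cols):
            left = x0 + col * cell_size
            bottom = y0 + row * cell_size
            units.append({
                "au_id": row * n_cols + col + 1,  # hypothetical ID scheme
                "bbox": (left, bottom, left + cell_size, bottom + cell_size),
            })
    return units


# Hypothetical extent, nominal 1 km x 1 km cells.
units = build_assessment_units(1500.0, 200.0, 4200.0, 2100.0)
```

Passing `cell_size=500.0` or `cell_size=1500.0` would give grids at the other resolutions mentioned in this appendix.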

The assessment units enable fast querying and display of the spatial data as most of the querying is completed in the relational database rather than through geoprocessing operations.
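As an illustration of why this is fast, the sketch below answers a typical impact question (how much of each asset lies in assessment units where modelled drawdown exceeds a threshold) with a plain relational join, using areas that were calculated once when the data were split. The table names, column names, values and threshold are all hypothetical, and SQLite stands in for the managed relational database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset_piece (       -- one row per 'split' piece of an asset
        asset_id INTEGER, au_id INTEGER, area_m2 REAL);
    CREATE TABLE gw_hrv (            -- one row per assessment unit
        au_id INTEGER PRIMARY KEY, drawdown_m REAL);
""")
conn.executemany("INSERT INTO asset_piece VALUES (?, ?, ?)", [
    (1, 101, 250000.0), (1, 102, 400000.0),
    (2, 102, 100000.0), (2, 103, 900000.0),
])
conn.executemany("INSERT INTO gw_hrv VALUES (?, ?)", [
    (101, 0.1), (102, 0.5), (103, 2.3),
])

THRESHOLD_M = 0.2  # illustrative value only

# No geometry is touched here: the join runs on assessment unit identifiers
# and the pre-computed piece areas.
rows = conn.execute("""
    SELECT p.asset_id, SUM(p.area_m2)
    FROM asset_piece AS p
    JOIN gw_hrv AS h ON h.au_id = p.au_id
    WHERE h.drawdown_m > ?
    GROUP BY p.asset_id
    ORDER BY p.asset_id
""", (THRESHOLD_M,)).fetchall()
```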

A.5 Impact analysis datasets

The impact analysis datasets are outlined in Table 9, which describes their relevant characteristics.

Table 9 Impact analysis datasets

Datasets	Data characteristics
Landscape classes	Usually a dataset of non-overlapping polygons that cover the entire assessment extent. Each landscape class has a unique identifier. The entire layer is 'split'.
Assets (and Elements)	Assets are collected from a wide variety of sources as part of creating product 1.3 (description of the water-dependent asset register) and are maintained in the bioregion or subregion assets database. Not all attribute information in the assets database is required for the impact and risk analysis. The entire layer is ‘split’ against the assessment units of the bioregion or subregion.
Groundwater modelling	A regular assessment unit grid, which is nominally 1000 m x 1000 m (exceptions for GLO 500 m and MBC 1500 m). Groundwater HRV attributes interpolated from source models are 'joined' (linked) to the regular grid cell geometry. Each BA has a regional watertable drawdown layer and some have additional model layers at other depths. The resolution is estimated to incorporate the uncertainties in the modelling.
Surface water modelling	A link-node (line and point) spatial structure interpolated from source models and, typically, based on the Geofabric Network streamlines. SW HRV attributes are ‘joined’ (linked) to the link-node geometry. There are nine ‘standard’ HRVs and six to ten additional HRVs specifically produced to support the receptor impact modelling (RIM).
Coal resource development footprints	The spatial locations of mining activity considered in the bioregional assessment. The entire layer is ‘split’ against the assessment units of the bioregion or subregion.
Boundaries	Assessment boundaries of the bioregion, including the subregion boundary, preliminary assessment extent, assessment extent and analysis domain.
Zone of potential hydrological change	The zone of potential hydrological change allocates a one-to-one mapping between assessment units, reporting regions and surface water modelling links. The mapping contained within the zone provides the assessment connection between all datasets used by the impact and risk analysis database.

BA = bioregional assessment; GLO = Gloucester subregion; HRV = hydrological response variable; MBC = Maranoa-Balonne-Condamine subregion; SW = surface water

A.6 Geoprocessing datasets

The geoprocessing BA datasets, including their relevant characteristics and rationale for inclusion as a geoprocessing dataset, are outlined in Table 10.

Table 10 Geoprocessing datasets

Datasets	Data characteristics	Rationale
Assessment units (AU)	Regular grid cells at nominally 1000 m x 1000 m (exceptions for GLO 500 m and MBC 1500 m). Precisely aligned to the groundwater model grid cells (see below). Unique identifier for each assessment unit.	The assessment units enable a common spatial structure for all datasets at the resolution of the regional-scale groundwater modelling. The assessment units enable the efficient linkage and transfer of data between the analysis datasets.
Blue line links and nodes	A link-node (line and point) spatial structure interpolated from source models and, typically, based on the Geofabric Network streamlines. Precisely aligned to the surface water link-node dataset.	The blue line links and nodes enable linkages between the surface water modelling and the receptor impact modelling with other analysis datasets.

A.7 Spatial framework standards

The standard projected coordinate system for the impact and risk analysis databases is Australian Albers (using the 132°E central meridian, EPSG 3577). Where a geographic coordinate system is required, GDA94 (EPSG 4283) is used. The other BA standard coordinate systems (i.e. the Albers projections based on the 140 and 151 meridians) are for map making and are not affected by this decision.
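To make the relationship between the two coordinate systems concrete, the following is a minimal sketch of the forward Albers equal-area conic projection on the GRS80 ellipsoid. It assumes the published EPSG 3577 parameters as understood here (standard parallels 18°S and 36°S, origin at 0°, 132°E, no false easting or northing); the function name `gda94_to_albers` is hypothetical. In practice an established projection library would normally be used rather than hand-written formulas.

```python
import math

# GRS80 ellipsoid (the ellipsoid of GDA94).
A = 6378137.0
F = 1 / 298.257222101
E2 = 2 * F - F * F
E = math.sqrt(E2)

# Assumed EPSG 3577 parameters (see lead-in).
LAT_1, LAT_2 = math.radians(-18.0), math.radians(-36.0)  # standard parallels
LAT_0, LON_0 = math.radians(0.0), math.radians(132.0)    # projection origin


def _m(lat):
    return math.cos(lat) / math.sqrt(1 - E2 * math.sin(lat) ** 2)


def _q(lat):
    s = math.sin(lat)
    return (1 - E2) * (s / (1 - E2 * s * s)
                       - math.log((1 - E * s) / (1 + E * s)) / (2 * E))


# Projection constants derived from the standard parallels and origin.
N = (_m(LAT_1) ** 2 - _m(LAT_2) ** 2) / (_q(LAT_2) - _q(LAT_1))
C = _m(LAT_1) ** 2 + N * _q(LAT_1)
RHO_0 = A * math.sqrt(C - N * _q(LAT_0)) / N


def gda94_to_albers(lon_deg, lat_deg):
    """Project GDA94 longitude/latitude (degrees) to Albers x, y (metres)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    rho = A * math.sqrt(C - N * _q(lat)) / N
    theta = N * (lon - LON_0)
    return rho * math.sin(theta), RHO_0 - rho * math.cos(theta)


x, y = gda94_to_albers(140.0, -25.0)  # a point east of the central meridian
```

Points on the 132°E meridian map to x = 0, points east of it have positive x, and points south of the equator have negative y.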

Table and field naming conventions are an essential part of achieving efficient automation of geoprocessing and other database transactions. Key information that must be captured in the names is: the futures (baseline or coal resource development pathway (CRDP)), the hydrological response variables, the receptor impact variables, time periods and variable characteristics (absolute, relative). The naming conventions are detailed in Dataset 1 (Bioregional Assessment Programme, Dataset 1).
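The value of a strict convention for automation can be shown with a small sketch. The pattern used here, `<future>_<variable>_<period>_<characteristic>`, along with the codes `abs` and `rel` and the example values, is purely hypothetical; the actual conventions are those defined in Dataset 1.

```python
from typing import NamedTuple

# Hypothetical codes, for illustration only.
FUTURES = {"baseline", "crdp"}
CHARACTERISTICS = {"abs", "rel"}  # absolute or relative


class FieldName(NamedTuple):
    future: str
    variable: str        # hydrological response or receptor impact variable
    period: str          # time period label
    characteristic: str


def _validate(parts):
    if parts.future not in FUTURES:
        raise ValueError(f"unknown future: {parts.future!r}")
    if parts.characteristic not in CHARACTERISTICS:
        raise ValueError(f"unknown characteristic: {parts.characteristic!r}")
    if any(not p or "_" in p for p in parts):
        raise ValueError("name parts must be non-empty and contain no underscore")
    return parts


def compose(parts):
    """Build a field name from its validated components."""
    return "_".join(_validate(parts))


def parse(name):
    """Split a field name back into components, rejecting malformed names."""
    pieces = name.split("_")
    if len(pieces) != 4:
        raise ValueError(f"expected 4 parts, got {len(pieces)}: {name!r}")
    return _validate(FieldName(*pieces))


name = compose(FieldName("crdp", "dmax", "2042", "abs"))
```

Because every name can be parsed back into its components, queries can be generated and checked programmatically rather than written by hand.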

A.8 Geoprocessing workflow

The fundamental and impact analysis datasets are prepared for ingestion into impact and risk analysis databases as follows:
1. The assessment unit (AU) unique identifiers and spatial geometry are stamped through the asset and landscape class datasets, effectively 'splitting' the central impact analysis datasets into pieces of data that are no larger than 1 km x 1 km. The exceptions are the Gloucester (500 m) and Maranoa-Balonne-Condamine (1500 m) subregions, for reasons explained in the relevant methods sections of companion product 3-4 for the Gloucester subregion (Post et al., 2018) and for the Maranoa-Balonne-Condamine subregion (Holland et al., 2017).
2. The area, length or count of individual ‘split’ polygons, lines and points respectively are calculated and added to the datasets.
3. The modelling results are interpolated and summarised, then formatted to a consistent structure including consistent field and table names.
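Steps 1 and 2 can be sketched for the polygon case as follows. This is an illustration only, not the geoprocessing tooling used for the BAs: it clips a polygon against each square cell with the Sutherland-Hodgman algorithm and calculates piece areas with the shoelace formula. The function names, identifiers and coordinates are hypothetical, and lines and points (length and count) are not covered.

```python
def shoelace_area(ring):
    """Unsigned area of a polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _clip_edge(ring, inside, intersect):
    """One Sutherland-Hodgman pass against a single clip edge."""
    out = []
    for prev, curr in zip(ring[-1:] + ring[:-1], ring):
        if inside(curr):
            if not inside(prev):
                out.append(intersect(prev, curr))
            out.append(curr)
        elif inside(prev):
            out.append(intersect(prev, curr))
    return out


def clip_to_cell(ring, xmin, ymin, xmax, ymax):
    """Clip a polygon to one square assessment unit."""
    def at_x(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def at_y(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    passes = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    for inside, intersect in passes:
        if not ring:
            break
        ring = _clip_edge(ring, inside, intersect)
    return ring


def split_by_units(asset_id, ring, cells):
    """Stamp assessment unit IDs through a polygon and add piece areas."""
    pieces = []
    for au_id, bbox in cells.items():
        piece = clip_to_cell(ring, *bbox)
        area = shoelace_area(piece) if len(piece) >= 3 else 0.0
        if area > 0.0:  # drop cells the polygon only touches
            pieces.append({"asset_id": asset_id, "au_id": au_id,
                           "area_m2": area, "ring": piece})
    return pieces


# Four hypothetical 1 km x 1 km assessment units and one triangular asset.
cells = {1: (0, 0, 1000, 1000), 2: (1000, 0, 2000, 1000),
         3: (0, 1000, 1000, 2000), 4: (1000, 1000, 2000, 2000)}
asset = [(500.0, 500.0), (1500.0, 500.0), (500.0, 1500.0)]
pieces = split_by_units(7, asset, cells)
```

The piece areas sum to the area of the original polygon, so totals can be recovered for any aggregation of assessment units without further geoprocessing.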
The data must meet certain requirements before they can be loaded into the impact and risk analysis database:
1. The data must meet database schema requirements.
2. The data must be already registered as datasets in data.gov.au to meet provenance requirements.
The data are loaded into the impact and risk analysis database as follows:
1. A preliminary step is to load the data using a method that allows the data to be reloaded if necessary. The method also maintains a record of the loading procedure for provenance purposes.
2. Queries are run to produce views of the data attributes and geometries that are then loaded into the impact and risk analysis database for each BA bioregion or subregion.
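One way to make a load repeatable while keeping a record of it is sketched below. It assumes that each dataset's rows can be identified by a dataset identifier; the table, column and view names are hypothetical, and SQLite stands in for the managed relational database.

```python
import sqlite3
from datetime import datetime, timezone


def load_dataset(conn, dataset_id, rows):
    """Load (or reload) one dataset's rows and record the load."""
    # One transaction: the delete, the insert and the log entry either all
    # succeed or are all rolled back.
    with conn:
        conn.execute("DELETE FROM asset_piece WHERE dataset_id = ?",
                     (dataset_id,))
        conn.executemany(
            "INSERT INTO asset_piece (dataset_id, asset_id, au_id, area_m2) "
            "VALUES (?, ?, ?, ?)",
            [(dataset_id, *row) for row in rows])
        conn.execute(
            "INSERT INTO load_log (dataset_id, loaded_at, row_count) "
            "VALUES (?, ?, ?)",
            (dataset_id, datetime.now(timezone.utc).isoformat(), len(rows)))


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE asset_piece (
        dataset_id TEXT, asset_id INTEGER, au_id INTEGER, area_m2 REAL);
    CREATE TABLE load_log (
        dataset_id TEXT, loaded_at TEXT, row_count INTEGER);
    -- A view of the loaded attributes, summarised per asset.
    CREATE VIEW asset_area AS
        SELECT asset_id, SUM(area_m2) AS area_m2
        FROM asset_piece GROUP BY asset_id;
""")

rows = [(1, 101, 250000.0), (1, 102, 125000.0)]
load_dataset(conn, "hypothetical-dataset-id", rows)
load_dataset(conn, "hypothetical-dataset-id", rows)  # reload, no duplicates

piece_count = conn.execute("SELECT COUNT(*) FROM asset_piece").fetchone()[0]
log_count = conn.execute("SELECT COUNT(*) FROM load_log").fetchone()[0]
view_rows = conn.execute("SELECT * FROM asset_area").fetchall()
```

Reloading replaces the dataset's rows rather than appending to them, while the log keeps one entry per load for provenance.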
Once loaded into the impact and risk analysis database the data are used as follows:
1. They are tracked and attributed to maintain the chain of provenance.
2. They are served to the Assessment teams so they can conduct the impact analysis in their respective GIS environments.

Refinement is made with further queries as required.

Last updated: 7 December 2018