Geoconnex API

Feedback on requests and responses

Introduction

Geoconnex is a decentralized metadata catalog that incorporates information about data collected by any organization publishing water data, including the spatial, temporal, and topical subjects of that data.

We are developing an API to allow programmatic access to this catalog:

for potential new data publishing organizations to enrich their own metadata
for data analysts and tool developers to discover water metadata, like a specialized Google search for water data.

Thank you for participating. We are asking for feedback on the API for Geoconnex. Geoconnex is a knowledge graph that aims to let data users discover relationships between real-world hydrologic features, cataloging features, organizational monitoring locations with data about them, and reference locations that stand in for monitoring locations common to multiple organizations.

What I’ll do for you today is introduce the main use cases we imagine for the Geoconnex API, and then go through a number of endpoints that are in the early design phase. This is truly a first draft, so your input could have a dramatic effect on the production API. Thus, rather than presenting fully specified HTTP requests and JSON payloads for requests and responses, I will describe the basic structure of the queries we aim to make possible, and the inputs and outputs that we are thinking about implementing.

Introduction

What kind of water data do you work with?

How do you normally find water data?

What, if any, challenges do you have finding water data from particular or unfamiliar sources?

Introduction

Geoconnex itself will be a metadata catalog that includes minimally standardized metadata about all sites (eg gages, wells, dams, public water systems, water quality sample stations, etc.) that all participating organizations (eg federal, state, local, Tribal, NGO, academic) publish data about. In this diagram, we are focusing on the data publisher, reference.geoconnex.us, the data user, and api.geoconnex.us. In the Geoconnex system, data publishers publish metadata about the datasets they collect, each of which is about certain features that are common across many data publishers. We call these common features “reference features”. The Geoconnex system aggregates the metadata that data publishers provide, and provides centralized access to the resulting metadata catalog. The Geoconnex API will allow programmatic access to this metadata catalog. It will thus provide data discovery services, but not necessarily direct access to observation or model data. It will tell users that data on certain subjects exist, and where to find that data, like a library catalog. It will not directly provide the observed values. However, it will point users to relevant datasets with as much useful metadata as possible. First I’ll introduce you to the two high-level use cases we imagine this API meeting, and then show you mockups of the API endpoints, their inputs, and responses.

Outline

User Story - Data Publisher
User Story - Data Analyst/Tool Developer
Endpoints, requests, and responses

User Story - Water Data Publisher

“As a water data publisher…

I want to know all locations from all organizations where data is currently being collected that are near my own monitoring locations

So that I know whether to submit reference features to geoconnex.us, and so I know which reference features to tag my own data with when I publish it.”

User Story - Water Data Analyst/Tool Developer

“As a water information tool/product developer…

I want to know all locations from all organizations where data types relevant to my questions are collected, and where that data can be accessed

So that I can build my tool or conduct my analysis using as much relevant data as possible.”

Top-level Endpoints

https://api.geoconnex.us

Services to discover and filter water datasets by:

Space/Geometry	Measurement/Modeling Method
Organization/data provider	Temporal resolution
Site Type (eg stream, well, dam)	Temporal coverage
Parameter/Variable/Observed Property	Feature of Interest

Top-level Endpoints

https://api.geoconnex.us…

/processes ¹
/locationTypes ²
/FeatureCollections ³
/Features/{catalogingFeatureType} ⁴
/providers ⁵
/observedProperties ⁶
/methodTypes ⁷
/methods ⁸
/timeSpacing ⁹

example codelist: methods

id	provider	name	description	methodType	url
noaa-ahps	ahps	ahps	NOAA Advanced Hydrologic Prediction Service	forecast	https://water.weather.gov/ahps/about/about.php
usgs-streamflow	nwis	WSP-2175	Streamflow measurement and computation	in-situ observation	https://pubs.usgs.gov/wsp/wsp2175/wsp2175.pdf
usgs-groundwater_level-tape	nwis	GWPD1	Measuring water levels by use of a graduated steel tape	in-situ observation	https://pubs.usgs.gov/tm/1a1/pdf/GWPD1.pdf
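
To give a feel for how a client might use a codelist like this, here is a minimal sketch that holds the three example rows as Python dictionaries and filters them by methodType. The response format of /methods is not specified yet, so the list-of-dictionaries structure and the helper name `methods_of_type` are assumptions for illustration only.

```python
# Rows copied from the example methods codelist. The real /methods response
# format is not yet specified, so this structure is an assumption.
METHODS = [
    {"id": "noaa-ahps", "provider": "ahps", "name": "ahps",
     "methodType": "forecast",
     "url": "https://water.weather.gov/ahps/about/about.php"},
    {"id": "usgs-streamflow", "provider": "nwis", "name": "WSP-2175",
     "methodType": "in-situ observation",
     "url": "https://pubs.usgs.gov/wsp/wsp2175/wsp2175.pdf"},
    {"id": "usgs-groundwater_level-tape", "provider": "nwis", "name": "GWPD1",
     "methodType": "in-situ observation",
     "url": "https://pubs.usgs.gov/tm/1a1/pdf/GWPD1.pdf"},
]


def methods_of_type(methods, method_type):
    """Return the ids of all methods whose methodType matches (hypothetical helper)."""
    return [m["id"] for m in methods if m["methodType"] == method_type]


print(methods_of_type(METHODS, "in-situ observation"))
```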

Processes

Data discovery tools will follow the OGC API - Processes standard, and thus we refer to them as processes. The following tools are proposed:

process name	description
referenceMyLocations	POST a geospatial dataset of your own locations, find candidate geoconnex reference locations that may correspond to them already
findFeatures ¹⁰	GET or POST query parameters to find features of relevance
navigateFeatures ¹¹	GET all hydrologically relevant monitoring features for a given longitude and latitude or feature identifier
findDatasets ¹²	POST an array of feature identifiers and an array of query parameters to find relevant dataset metadata

/processes Data publisher use case

As a data publisher, I would use referenceMyLocations to see whether any locations already in the database are likely the same site or real-world object as the sites I want to publish data about. I could then add links in the metadata I publish so that others know I have data about the same site.

referenceMyLocations inputs:

input data: a geospatial dataset with at least an ID field
locationType: from /locationTypes codelist (eg “gage”, “well”, “dam”)

/processes Data publisher use case example

POST to /processes/referenceMyLocations

Input:

locationType: “gage”
input data:

/processes Data publisher use case example

POST to /processes/referenceMyLocations

Output:

spatial dataset with fields: uri, name, input_id, match_distance_m
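
As a rough sketch of how this exchange could look from Python: the code below builds a request body and then picks the closest candidate for each input location from a made-up response. Only the field names locationType, uri, name, input_id, and match_distance_m come from the design above. The `inputs` wrapper, the `inputData` key, the use of GeoJSON, the sample coordinates, and both helper functions are assumptions.

```python
import json


def build_reference_request(features, location_type):
    """Build a hypothetical JSON body for POST /processes/referenceMyLocations."""
    for feature in features:
        if "id" not in feature.get("properties", {}):
            raise ValueError("every input feature needs an ID field")
    return {
        "inputs": {
            "locationType": location_type,
            # Assumption: the spatial dataset is sent as a GeoJSON FeatureCollection.
            "inputData": {"type": "FeatureCollection", "features": features},
        }
    }


def closest_matches(rows):
    """Keep, for each input_id, the candidate with the smallest match distance."""
    best = {}
    for row in rows:
        current = best.get(row["input_id"])
        if current is None or row["match_distance_m"] < current["match_distance_m"]:
            best[row["input_id"]] = row
    return best


my_gages = [{
    "type": "Feature",
    "properties": {"id": "my-gage-1"},
    "geometry": {"type": "Point", "coordinates": [-105.14, 35.45]},
}]
body = build_reference_request(my_gages, "gage")
print(json.dumps(body, indent=2))

# Made-up response rows that use the output fields listed above.
sample_output = [
    {"uri": "https://geoconnex.us/foo", "name": "example gage A",
     "input_id": "my-gage-1", "match_distance_m": 12.5},
    {"uri": "https://geoconnex.us/bar", "name": "example gage B",
     "input_id": "my-gage-1", "match_distance_m": 80.0},
]
print(closest_matches(sample_output)["my-gage-1"]["uri"])
```

A publisher could then review the closest candidate for each of their sites before deciding whether to link to it or to submit a new reference feature.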

/processes Data user imagined workflow

Use findFeatures to discover locations within a bounding box, a radius around a point, or an arbitrary spatial polygon, or locations relevant to a reference feature (eg a river or aquifer) or cataloging feature (eg an administrative boundary or HUC)
- receive a geospatial dataset, including an id attribute populated with geoconnex HTTP URIs (e.g. https://geoconnex.us/usgs/monitoring-location/1000001)
If desired, use navigateFeatures to find all sites downstream and/or upstream of a site identified by a geoconnex HTTP URI or a latitude/longitude
Use findDatasets with a list of URIs and query parameters for provider, observedProperty, period of record, etc., to get links to datasets relevant to your question that are about the locations found by the findFeatures and/or navigateFeatures calls
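
The workflow above can be sketched end to end in a few lines of Python. This is only an illustration of the chaining, not a client for a real service: `execute` is a hypothetical caller-supplied function that runs one process and returns a list of rows, and `fake_execute` returns made-up rows so the sketch runs without a network connection.

```python
def discover_datasets(execute, bbox, observed_properties, navigate=False):
    """Chain findFeatures, optionally navigateFeatures, then findDatasets."""
    sites = execute("findFeatures", {"bbox": bbox})
    uris = [site["uri"] for site in sites]
    if navigate:
        # Optional step: add sites hydrologically related to each site found.
        for uri in list(uris):
            related = execute("navigateFeatures", {"uri": uri, "upstream": "mainstem"})
            for row in related:
                if row["uri"] not in uris:
                    uris.append(row["uri"])
    return execute("findDatasets",
                   {"uri": uris, "observedProperties": observed_properties})


def fake_execute(process, inputs):
    """Stand-in for the API that returns made-up rows for each process."""
    if process == "findFeatures":
        return [{"uri": "https://geoconnex.us/usgs/monitoring-location/1000001"}]
    if process == "navigateFeatures":
        return [{"uri": "https://geoconnex.us/foo"}]
    if process == "findDatasets":
        return [{"about_uri": uri, "url": "https://example.org/data"}
                for uri in inputs["uri"]]
    raise ValueError(f"unknown process: {process}")


# bbox order follows the draft: xmin, xmax, ymin, ymax
datasets = discover_datasets(fake_execute, [-106.0, -105.0, 35.0, 36.0],
                             ["inflow_lake"], navigate=True)
for dataset in datasets:
    print(dataset["about_uri"], dataset["url"])
```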

/processes/findFeatures use case

As a data user, I would use findFeatures to discover all real-world features that organizations of interest have published metadata about, filtered to only my area of interest

/processes/findFeatures inputs

input	number of elements	example or description
locationTypes	min:0, max: inf	`["gages", "wells"]`
providers	min:0, max: inf	NULL ¹³, or `["nwis","rise"]`
catalogingFeatures	min:1, max: inf	`{"hu02":"14", "state":["CA","AZ"]}`
hydrologicFeature	min:0, max: inf	`["https://geoconnex.us/ref/mainstems/1"]`
polygon	min:0, max:1	any multipolygon or polygon geometry
bbox	min:0, max:1	xmin, xmax, ymin, ymax
radius	min: 0, max: 1	lat, lon, distance in km
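
One way to read the "number of elements" column is as a set of cardinality rules that a client could check before sending a query. The sketch below encodes the table that way. How elements are counted is my assumption, not part of the design: a null input counts as zero, polygon, bbox, and radius count as one when present, and everything else is counted by its length.

```python
import math

# (min, max) number of elements per input, copied from the inputs table.
CARDINALITY = {
    "locationTypes": (0, math.inf),
    "providers": (0, math.inf),
    "catalogingFeatures": (1, math.inf),
    "hydrologicFeature": (0, math.inf),
    "polygon": (0, 1),
    "bbox": (0, 1),
    "radius": (0, 1),
}

# Assumption: these inputs hold a single geometry or tuple, not a list of them.
SINGLE_VALUED = {"polygon", "bbox", "radius"}


def count_elements(name, value):
    """Count elements under the assumptions stated above."""
    if value is None:
        return 0
    if name in SINGLE_VALUED:
        return 1
    return len(value)


def validate(inputs):
    """Return a list of problems; an empty list means the inputs pass."""
    problems = [f"unknown input: {name}" for name in inputs if name not in CARDINALITY]
    for name, (low, high) in CARDINALITY.items():
        n = count_elements(name, inputs.get(name))
        if not low <= n <= high:
            problems.append(f"{name}: expected {low} to {high} elements, got {n}")
    return problems


query = {
    "locationTypes": ["gages", "wells"],
    "providers": None,  # null retrieves from all providers
    "catalogingFeatures": {"hu02": "14", "state": ["CA", "AZ"]},
    "hydrologicFeature": ["https://geoconnex.us/ref/mainstems/1"],
}
print(validate(query))  # []
print(validate({"locationTypes": ["gages"]}))
```

The second call reports a problem because the table lists catalogingFeatures with a minimum of one element.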

/processes/findFeatures output

geospatial dataset of all locations meeting input field criteria, with fields:

uri eg https://geoconnex.us/foo
name (by provider) eg
- name: "colorado river at bridge x", provider: "nwis"
- name: "station WQX1234", provider: "storet"
locationType (eg "stream")
Cataloging Features uri for every catalogingFeatureCollection eg:
- hu02: https://geoconnex.us/ref/hu02/14
- county: https://geoconnex.us/ref/counties/06025
all relevant Hydrologic Features eg: mainstem: https://geoconnex.us/ref/mainstems/29559

/processes/navigateFeatures use case

As a data user, I would use navigateFeatures to discover all real-world features that organizations of interest have published metadata about, that are hydrologically relevant (eg upstream or downstream of) to a point that I am interested in.

/processes/navigateFeatures inputs and outputs

inputs

uri or pointLocation (lat/lon)
upstream: null, tributaries, or mainstem
downstream: null, diversions, or mainstem
distance: integer (km)
locationType: text¹⁴
provider: text¹⁵
catalogingFeatures: array of uri¹⁶

outputs

mainstem geospatial feature
tributaries geospatial feature collection
diversions geospatial feature collection
relevant geospatial dataset, with attributes:
- uri: https://geoconnex.us/foo
- hydro_relation:
  - “upstream mainstem/tributary” or “downstream mainstem/diversion”
  - distance_km: integer
- all attributes as from findFeatures

/processes/navigateFeatures example inputs

inputs:

lat: 35.45
lon: -105.14
upstream: tributaries
downstream: mainstem
distance: 3000
catalogingFeatures: https://geoconnex.us/ref/states/35
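
Since navigateFeatures is described as a GET process, these example inputs might travel as URL query parameters. The sketch below shows one possible encoding. The parameter names mirror the inputs list, but the query-string layout, the repeated catalogingFeatures parameter, and the helper name `navigate_url` are assumptions, since the draft does not fix them.

```python
from urllib.parse import urlencode

BASE = "https://api.geoconnex.us/processes/navigateFeatures"

# Allowed values taken from the inputs list; None stands for null.
ALLOWED_UPSTREAM = {None, "tributaries", "mainstem"}
ALLOWED_DOWNSTREAM = {None, "diversions", "mainstem"}


def navigate_url(lat, lon, upstream=None, downstream=None,
                 distance=None, cataloging_features=()):
    """Build a hypothetical GET URL for the navigateFeatures process."""
    if upstream not in ALLOWED_UPSTREAM:
        raise ValueError(f"upstream must be one of {ALLOWED_UPSTREAM}")
    if downstream not in ALLOWED_DOWNSTREAM:
        raise ValueError(f"downstream must be one of {ALLOWED_DOWNSTREAM}")
    params = [("lat", lat), ("lon", lon)]
    if upstream is not None:
        params.append(("upstream", upstream))
    if downstream is not None:
        params.append(("downstream", downstream))
    if distance is not None:
        params.append(("distance", int(distance)))  # integer kilometres
    for uri in cataloging_features:
        params.append(("catalogingFeatures", uri))
    return BASE + "?" + urlencode(params)


url = navigate_url(35.45, -105.14, upstream="tributaries", downstream="mainstem",
                   distance=3000,
                   cataloging_features=["https://geoconnex.us/ref/states/35"])
print(url)
```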

/processes/navigateFeatures example outputs

outputs:

/processes/findDatasets use case

As a data user, I would use findDatasets to discover all datasets about topics I am interested in that are relevant to a set of sites in the geoconnex system that I have already identified.
Inputs: a list of geoconnex URIs from findFeatures and/or navigateFeatures, and query parameters to filter datasets relevant to my topic of interest
Outputs: a table or array of dataset metadata, including the geoconnex URI each dataset is about.

/processes/findDatasets inputs and outputs

inputs

uri: array (many URIs)
providers: null(all) or array of 1 or more providers
observedProperties: null(all) or array of 1 or more observedProperty codes
methodTypes or methods¹⁷
timeSpacing: the maximum (lowest resolution) time spacing desired
minYear: the earliest year you want returned datasets to have coverage over
maxYear: the latest year you want returned datasets to have coverage over

outputs

A table of dataset metadata with the following fields:

url: url for where each dataset lives
about_uri: the URI for the geoconnex location each dataset is about
provider: code from /providers
observedProperty: code from /observedProperties
methodType: code from /methodTypes
method: code from /methods
timeSpacing: code from /timeSpacing
minYear: integer
maxYear: integer
conformsTo: a url to a website describing the API, data model, or data dictionary relevant to the specific dataset
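
To make the inputs concrete, here is a minimal sketch of assembling a findDatasets request in Python. The input names come from the list above. The rules that exactly one of methodTypes or methods is sent, that at least one URI is required, and that minYear may not exceed maxYear are my reading of the design, and the helper name `find_datasets_inputs` is hypothetical.

```python
def find_datasets_inputs(uris, providers=None, observed_properties=None,
                         method_types=None, methods=None,
                         time_spacing=None, min_year=None, max_year=None):
    """Assemble a hypothetical inputs object for POST /processes/findDatasets."""
    if not uris:
        raise ValueError("at least one geoconnex URI is required")
    # Assumption: methodTypes and methods are alternatives, to avoid conflicts
    # such as an in-situ method combined with the methodType "forecast".
    if method_types is not None and methods is not None:
        raise ValueError("give methodTypes or methods, not both")
    if min_year is not None and max_year is not None and min_year > max_year:
        raise ValueError("minYear must not be later than maxYear")
    inputs = {
        "uri": list(uris),
        "providers": providers,                     # None means all providers
        "observedProperties": observed_properties,  # None means all properties
        "timeSpacing": time_spacing,
        "minYear": min_year,
        "maxYear": max_year,
    }
    if methods is not None:
        inputs["methods"] = methods
    else:
        inputs["methodTypes"] = method_types
    return inputs


request = find_datasets_inputs(
    ["https://geoconnex.us/usgs/monitoring-location/1000001"],
    providers=["nwis"],
    observed_properties=["inflow_lake"],
    method_types=["forecast"],
    time_spacing="1 day",
    min_year=2000,
    max_year=2020,
)
print(request)
```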

To reiterate, our idea was that you would use one or both of findFeatures and navigateFeatures to get a list of URIs. You could then pass those URIs and the dataset query parameters to retrieve a list of datasets. Having already gathered information about the locations, you would input desired dataset attributes including observed property, methods, and temporal coverage and resolution. The metadata you retrieve would include that information, including which URI each dataset is about, as well as a special property, conformsTo, which would link out to a web resource that describes how to use that dataset. Since geoconnex is an index, it is open to all source formats. Depending on the dataset, conformsTo might point to an API or data model documentation page, a data dictionary file, or narrative documentation of the dataset.

Do you have any questions or concerns about the inputs and outputs of the findDatasets process?

Thinking back on the whole workflow, now that you’ve seen the design from end to end, any additional questions, concerns or suggestions?

THANK YOU

Feel free to continue to peruse this presentation

Send any additional feedback to konda@lincolninst.edu

Footnotes

1. an array of API endpoints following the OGC API - Processes standard for more specific queries that may use the below as query parameters
2. a nested JSON array of types of locations available and their definition for each (e.g. [{"id":"dam", "name":"dam", "description": "a structure creating an impounded body of water on a stream"},{"name":"stream", "description": "a flowing body of water on the surface"}])
3. types of hydrologic (eg river, aquifer) and cataloging (e.g. HUC, catchment, county, state, municipality) features that data may be about or relevant to
4. an array of the names and identifiers of the reference features of a given type
5. an array of identifiers, names, and URLs for data publication systems and their parent organizations (eg, nwis, National Water Information System, waterdata.usgs.gov, usgs.gov)
6. an array of identifiers, names, and provider identifiers for observed properties (also often known as parameters, variables, data types) eg {"id":"inflow_lake", "name":"Lake/Reservoir Inflow","provider":"nwis"}.
7. an array of identifiers, names, and definitions for broad categories of methods (eg "id":"obs","name":"in-situ observation","definition":"observation from an in-situ sensor or sample from a site visit"). Also include “remote sensing”, “estimation”, “simulation model”, “forecast”, “statistical summary”.
8. an array of ids, names, descriptions, and links to documentation for specific data production methods/sensors eg {"id":"noaa-ahps" ,"provider":"noaa", "name":"ahps", "description":"NOAA Advanced Hydrologic Prediction Service River Forecast Model","methodType":"forecast", "url": "https://water.weather.gov/ahps/about/about.php"}
9. an array of time spacings of datasets available from sites eg ["unknown","intermittent","discrete","event","1 second", "15 minute", "1 day", "1 year"]
10. similar to NWIS Site Service
11. similar to NLDI or upstream/downstream EPA RAD/WATERS
12. similar to whatNWISdata function from dataRetrieval
13. retrieves from all providers
14. to pre-filter results by location type
15. to pre-filter results by provider
16. useful if you want to restrict hydrologic navigation to certain non-hydrologic boundaries
17. to avoid conflicts between specifying a certain in-situ method and the methodType “forecast”, for example