Authentic Data: a published dataset, available for use/consumption by 3rd parties, that is built from "first principles" on data and data governance designed to produce a re-usable dataset
Presentation - Authentic Data - Simple in Principle - Nov 1 DRMWG
Summary
- Authentic Data is data that has been cryptographically signed by an "authority" (a role) using its private key; users of the data can verify the signature using the authority's public key.
- Publishable/shareable data is created through a Data Lifecycle: the data is first captured/collected, then checked for input and consistency errors, cleaned of outliers and duplicates, and finally checked for overall correctness. Each of these stages is part of the data provenance (trust) chain, so each needs to be persisted and linked to the dataset that is published for 3rd party use.
- The Data needs to be designed with respect to structure, metadata, and "fitness for purpose". Governance needs to be designed to ensure accuracy, consistency, and correctness.
- Governance drives the requirements for error checking, consistency, and accuracy as an active part of the data lifecycle. Data Governance is the strategy; Data Stewardship is the oversight of the data lifecycle.
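The lifecycle and governance points in the summary can be sketched as a hash-linked chain of persisted stages. This is a minimal sketch under stated assumptions, not the presenter's method: the stage names, the `GOVERNANCE_RULES` and `VALUE_RANGE` settings, and the functions `run_lifecycle` and `verify_chain` are all hypothetical, and a simple range check stands in for real outlier detection. Each stage record stores the SHA-256 digest of the stage before it, so changing any persisted stage breaks the link to the published dataset, which a steward or 3rd party can detect.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass(frozen=True)
class StageRecord:
    """One persisted lifecycle stage, linked to the stage before it."""
    stage: str
    rows: List[Row]
    prev_digest: Optional[str]

    def digest(self) -> str:
        payload = {"stage": self.stage, "rows": self.rows, "prev": self.prev_digest}
        canonical = json.dumps(payload, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical governance policy: named per-row rules set by Data Governance.
GOVERNANCE_RULES: Dict[str, Callable[[Row], bool]] = {
    "has_id": lambda r: isinstance(r.get("id"), int),
    "numeric_value": lambda r: isinstance(r.get("value"), (int, float)),
}
# Hypothetical plausibility range, a stand-in for real outlier detection.
VALUE_RANGE = (0, 100)


def run_lifecycle(raw: List[Row]) -> List[StageRecord]:
    chain = [StageRecord("captured", raw, None)]

    def add(stage: str, rows: List[Row]) -> None:
        # Persist the stage and link it to the digest of the previous stage
        chain.append(StageRecord(stage, rows, chain[-1].digest()))

    # Input/consistency checks driven by the governance rules
    checked = [r for r in raw if all(rule(r) for rule in GOVERNANCE_RULES.values())]
    add("checked", checked)

    # Cleaning: drop exact duplicates and out-of-range values
    seen, cleaned = set(), []
    lo, hi = VALUE_RANGE
    for r in checked:
        key = json.dumps(r, sort_keys=True)
        if key not in seen and lo <= r["value"] <= hi:
            seen.add(key)
            cleaned.append(r)
    add("cleaned", cleaned)

    # Final correctness check before publication: ids must be unique
    ids = [r["id"] for r in cleaned]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids after cleaning")
    add("published", cleaned)
    return chain


def verify_chain(chain: List[StageRecord]) -> bool:
    """Stewardship check: every stage must link to the digest of its predecessor."""
    if not chain or chain[0].prev_digest is not None:
        return False
    return all(cur.prev_digest == prev.digest() for prev, cur in zip(chain, chain[1:]))


if __name__ == "__main__":
    raw = [
        {"id": 1, "value": 10},
        {"id": 1, "value": 10},     # duplicate
        {"id": 2, "value": "n/a"},  # fails an input check
        {"id": 3, "value": 500},    # outside the plausibility range
        {"id": 4, "value": 55.5},
    ]
    chain = run_lifecycle(raw)
    for rec in chain:
        print(rec.stage, len(rec.rows), rec.digest()[:12])
    print("chain intact:", verify_chain(chain))
```

In practice the digest of the final "published" stage is what an authority would sign, tying the signature on the published dataset back to every earlier stage.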
- Discussion at:
- 14 mins: Kevin: Need a definition of "authentic"
- Burak, Neil, James: Data Transformation (source to target) is going to be core
- 22 mins: Burak: Going to need data transformation
- Neil: Definitely on the roadmap for future discussions
- James: Ontologies are heading to a hub model of data
- Neil: "super schemas" are out there
- James offered to present his perspective
- Burak offered to present his perspective
- James suggested open debate
- 34 mins: Carly: from a researcher's point of view, the data lifecycle is not a linear process
- Data Lifecycle: while the data lifecycle is laid out as a logical, linear set of stages/steps, our researchers see it as a (very) convoluted process of data collection, cleanup, combination, and re-combination with other datasets
- 38 mins: Kevin: The root of trust is the "authentic" part of ACDC (Authentic Chained Data Containers); for more details, see the ACDC (proposed IETF) specification
- Burak, Kevin: Data/metadata (incl. semantics) are independent of containers or exchange "channels". We need to distinguish identity/identifiers for data from those for the SSI entities (e.g., Issuer, Verifier, Holder)
- Neil: (post-meeting) this suggests that the ToIP tech/gov dual stack is really two (dual) stacks:
- Identity tech/gov
- Data tech/gov
- Neil: Data
- 42 mins: Burak: Let's separate the secure data and semantic layers
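The definition of Authentic Data in the summary, Kevin's point about the root of trust, and the Burak/Kevin point that data identifiers are independent of containers and channels can be illustrated together in a toy Python sketch. It is an assumption-laden illustration, not how ACDC works: the identifier scheme (canonical JSON hashed with SHA-256) is not ACDC's identifier format, the signature is textbook RSA over two publicly known Mersenne primes (so anyone could recompute the private key) standing in for a real scheme from a vetted library, and `authority_keypair`, `data_id`, `sign`, and `verify` are hypothetical names. The key pair belongs to an entity (the authority role), while `data_id` identifies the data itself, which mirrors the identity-vs-data distinction raised in the discussion.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple

# Toy textbook RSA using the public Mersenne primes 2**521 - 1 and 2**607 - 1.
# ILLUSTRATION ONLY: anyone can recompute the private key, so this is NOT secure.
_P, _Q, _E = 2**521 - 1, 2**607 - 1, 65537


@dataclass(frozen=True)
class PublicKey:
    n: int
    e: int


@dataclass(frozen=True)
class PrivateKey:
    n: int
    d: int


def authority_keypair() -> Tuple[PrivateKey, PublicKey]:
    """Hypothetical key pair held by the signing "authority" (a role)."""
    n = _P * _Q
    d = pow(_E, -1, (_P - 1) * (_Q - 1))  # modular inverse (Python 3.8+)
    return PrivateKey(n, d), PublicKey(n, _E)


def data_id(record: dict) -> str:
    """Identifier derived from the data content alone (canonical JSON + SHA-256),
    so it is the same whatever container or channel carried the data."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sign(record: dict, key: PrivateKey) -> int:
    """The authority signs the data identifier with its private key."""
    digest = int(data_id(record), 16)  # 256-bit value, smaller than n
    return pow(digest, key.d, key.n)


def verify(record: dict, signature: int, key: PublicKey) -> bool:
    """Any user of the data checks the signature with the authority's public key."""
    return pow(signature, key.e, key.n) == int(data_id(record), 16)


if __name__ == "__main__":
    priv, pub = authority_keypair()
    # The same data, received as two differently formatted JSON payloads
    a = json.loads('{"site": "A", "count": 42}')
    b = json.loads('{ "count": 42,   "site": "A" }')
    sig = sign(a, priv)
    print(data_id(a) == data_id(b))                      # True
    print(verify(b, sig, pub))                           # True
    print(verify({"site": "A", "count": 43}, sig, pub))  # False
```

Because verification depends only on the data's content and the authority's public key, the same signature checks out regardless of which container or channel delivered the data, while any change to the data itself fails verification.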