Concepts & Terminology Working Group (Proposed)

Mission and Scope

The mission of the C&T WG is to address the needs of ToIP stakeholders for conceptual models and terminology that will maximize the understandability, interoperability, and usability of the ToIP stack and of the digital trust infrastructure, applications, and ecosystems built on top of it.

Conveners (add your name if you are interested in becoming one of the conveners)

<we need a convenor-lead name here>
Oskar van Deventer
Rieks Joosten

Interested Members (add your name if you may be interested in joining this proposed WG)

Drummond Reed
Daniel Hardman
Scott Perry
Shashishekhar S
Philippe Page
Paul Knowles
Taylor Kendal
Scott Whitmire
Arjun Govind
Vinod Panicker
sankarshan
Steven Milstein

Description

Context

Unlike most Linux Foundation projects, the ToIP Foundation does not focus only on technology (e.g. cryptography, DIDs and other identifiers, communication protocols, verifiable credentials, etc.). Our focus is just as much on governance, including its business, legal and social aspects. It is a complex and daunting mission to "construct, maintain and improve a global, pervasive, scalable and interoperable infrastructure for the (international) exchange of verified and certified data". This is an engineering task that must not only provide the technology, but also deliver actual business value and the capability to comply with the requirements of different legal contexts and societies. The main difficulty ToIP seeks to overcome is the integration of all of these domains.

C&T WG Mission

A well-known impediment to this integration is "language confusion". Many cultures have stories about it. A more contemporary acknowledgement is the architecture of the EU Parliament building in Strasbourg, which resembles the Tower of Babel as depicted in Pieter Brueghel's famous painting.

The mission of the proposed C&T WG is to identify and address the issues that relate to ways of thinking (mental/conceptual models) and terminology that may be an impediment for the overall mission of the ToIP Foundation.

C&T WG Purpose

Just as the ToIP stack must be fit for purpose, so must the deliverables of this WG. This means the WG needs to understand what stakeholders need to do with the mental/conceptual models and terminology that this WG will govern.

Specifically, the WG shall create and maintain a list of stakeholders, their objectives, the issues they face regarding concepts/conceptual models and terminology, and the products or services they might use for resolving such issues. Understanding these end products or services will provide the requirements that the deliverables of this WG must satisfy. It will help us decide what tasks to undertake, guide us with tool selection, etc.

This is especially important because we envisage that stakeholders will come from very different domains (technical, business, legal, policy, marketing), each of which has different needs. We will need to reconcile the needs of these groups into a minimal set of artifacts to be produced and maintained.

Concepts, Terminology and Scopes

The most basic purpose of conceptual models (i.e. sets of carefully defined concepts, relations between concepts, and constraints that should be satisfied) and terminology (formal labels for such concepts) is to help someone who interprets a term (the interpreter) uttered by someone else (the speaker) to accurately apprehend its intended meaning. This is particularly important in settings where groups of such individuals work together for a specific set of purposes, or to realize common objectives (we will use the term 'scope' to refer to such groups). For these groups to work efficiently and effectively, the ideas they work on together must be aligned. This is particularly valuable in (software) engineering, where 'interpretation errors' (misapprehensions) that go undetected may lead to buggy software and costly repairs.

It is common knowledge that every scope can have its own terminology (jargon). ToIP is about interoperability between people from different domains (hence also different scopes), e.g. legal, business, technical, social and other domains. Therefore, it is crucial to find a way for each scope to use its own terminology, yet be able to determine whether or not a concept referred to by a term in one scope is the same as a concept referred to by another term in another scope.

For example, this situation is quite common in court cases, where arguments are regularly made about whether or not someone or something qualifies as (an instance of) some concept. The outcome of such a discussion matters, because laws assign consequences (duties, rights, ...) to those that do or do not qualify. To be able to refer to such concepts, the first section of many legal documents defines the mapping between terms and the criteria used to determine whether or not someone or something qualifies as an instance of a concept. So even if different laws use different terms for the same concept, this is not a problem as long as the legal documents make these mappings explicit. The definitions are written so that judges and lawyers should all apply them in the same way (in case of disputes, judges settle the intended interpretation). The processes for defining and using such legal terms serve as an inspiration for our work.

Contributions and Example Outputs

For our purposes, we leverage a prior collaboration between Daniel Hardman of Evernym and Rieks Joosten of TNO, as well as large parts of the ToIP Glossary WG proposal from Dan Gisolfi at IBM (see the details in the sections below). One model for some of this WG's deliverables is one or more websites resembling the Legal Dictionary. That site not only provides definitions of various terms, but also brief descriptions of their backgrounds, use cases that exemplify the relevance of (and distinctions made by) the terms, and other useful information.

C&T WG Tasks

The envisaged tasks of this WG consist of:

creating and maintaining a list of stakeholders, their objectives, the issues they face regarding concepts and terminology, and the products or services that they might use for resolving such issues.
specifying, creating and maintaining a product framework, inspired by the Legal Dictionary, that can include content helpful to understanding specific concepts and terminology.
specifying and operating a process for maintaining and improving the contents of that framework (a proposal for which is under construction).
specifying other products/services that the WG will provide, or organize to be provided, so as to help stakeholders address their issues.
specifying and operating processes for maintaining (the contents of) such products/services.

Governance

The C&T WG will have a governing committee (GC) that shall:

oversee the work that is being done to further/fulfill its mission as described above, which consists at least of the following:
1. function, at least initially, as a modeling committee (see below) for the mental model on 'Mental Models', and how to CRUD (create, read, update, delete) them.
2. define the artefacts that constitute the 'body of knowledge' (BoK) of the C&T WG, which document all mental models and terminology that the WG governs.
3. define the artefacts that may be considered for CRUDding the BoK of the C&T WG mental models, as well as the criteria that such artefacts must satisfy in order to be considered.
4. define the process that considers such artefacts (that satisfy the criteria), and produces a decision saying whether or not to update the BoK, and if so, how the BoK will be updated.
5. define concrete artefacts (e.g. web pages, ...) that are to be generated from the BoK. Such artefacts are part of the results/products that the C&T WG aims to produce.
6. ensure that technology is available that automatically generates and updates such artefacts from the BoK (particularly as it is changed).
connect/liaise with the other WGs within ToIP, as well as with groups/organizations outside ToIP (e.g. relevant W3C groups, DIF, etc.), to further the work of both the ToIP C&T WG and those other groups, insofar as this is considered useful.

The C&T WG will have at least one modeling committee (MC) that shall:

oversee the construction and maintenance of mental models that are relevant within ToIP, consisting of carefully defined concepts, relations between them and the constraints involved.
choose labels for these models that are appropriate within the scope of the MC's own work.
associate each mental model with stories (visions, use cases, ...) that explain it, identify pitfalls, etc., in terms that are understandable in other domains (e.g. legal, business, social, and so on), using labels that exist in that context if they map cleanly onto concepts or relations, or otherwise introducing/suggesting other labels.

Working Group Charter

Develop and maintain a high-quality corpus of terminology (CoT) that covers the needs of the ToIP community.
Develop a process whereby this corpus can be:

Curated, based on evidence and using expert opinion, such that concepts, relations between concepts and constraints can be
1. carefully defined,
2. assigned an identifier (name/number/label) to distinguish it from any other concept in the corpus,
3. mapped onto terms that are defined and/or commonly accepted in various relevant domains/contexts,
4. their usage and relevance documented from organic sources,
5. their status adjudicated as working, preferred, accepted, superseded or deprecated.
Enhanced in a collaborative, open, and fair manner by interested community members.
Versioned.
Published in different ways (e.g. as a glossary, concept map, use-case stories, ...), for specific purposes (e.g. education, reference, ...), by different means (e.g. a PDF, a website, presentations/webinars, ...) and as needed by different audiences/stakeholders or domains (e.g. business domains, architectural domains, ...).
Promoted as a valuable public resource and an influence for convergence and excellence.

Train and organize volunteers so the initiative develops sustainable long-term momentum.
Disseminate/promote the work across ToIP WGs and to other relevant audiences.

Requirements

The Corpus of Terminology MUST have:

Source control and build processes managed in GitHub.
A well-defined syntax for contributing concepts/relations, and for each of them an identifier by which it can be identified within the scope of the Corpus.
A well-defined syntax for attributing terms to such (established) concepts/relations for specific contexts/domains.
A well-defined CI/CD process that includes auto sorting of terms and concepts. (??? RJ: I'm not sure what this means.)
A simple process for contributing further content.
A simple publicly accessible website, containing at least the Corpus-identifiers and their definitions, possibly inspired by the 'Legal Dictionary'.
A PDF document for every published version, containing at least the Corpus-identifiers and their definitions.

The Corpus MUST NOT have:

A requirement for programming skills, as that would reduce the number of contributors.

The Corpus SHOULD be:

Reusable and easy to leverage in TIP repos.
Usable for language translation via separate, self-organized, language-specific repos. These repos should be aggregators of the baseline glossary and any TIPs.
Usable for mapping its identifiers/terms to those in use in other contexts/domains.
Consumable at the RAW content level (.md files) by external groups who wish to render content in a different manner.

Solution Approaches

We SHOULD:

Use a GitHub repo to manage the corpus.

Consider using a Creative Commons license instead of an Apache license; it may be more appropriate.
Require DCO/IPR for contributors to the repo. Anybody who complies with the DCO/IPR requirements can submit to the corpus by raising a PR.
No need to manually maintain metadata about who edited what, when. We have commit history and git blame.
Use GitHub issues to debate decisions about term statuses. Anybody can raise an issue.

Use existing, widely used open source documentation tools such as mkdocs, Docusaurus, or GitHub Pages:

Each concept is described in a separate markdown doc that conforms to a simple template (see below). Concepts link to related concepts.
Each term is a separate markdown doc that conforms to a different simple template (see below again). Terms label concepts; links from concepts to terms remain implicit in the markdown version of the data, to avoid redundant editing. Having terms and concepts as separate documents that cross-link allows for synonyms, antonyms, preferred, deprecated and superseded labels for the same concept, localization, and so forth. It also allows for the peaceful coexistence of multiple terminologies (= sets of terms, namespaces, …).
Each context glossary is a separate markdown doc that conforms to another different simple template (see below once again). A glossary is an alphabetic list of terms relating to a specific subject, or for use in a specific domain, with explanations. The markdown document specifies the scope of the glossary, and the selection criteria for terms.
Provide an extensible CI/CD pipeline for the repo, and write unit tests to enforce any process rules, quality checks, and best practices the WG adopts.
The CI/CD process should update the live website and regenerate the PDF document after each approved and merged PR.
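One kind of unit test the CI/CD pipeline could run is a link check between terms and concepts. The Python sketch below assumes the repo has been loaded into a dict mapping filenames to markdown text, and that each term file carries a 'Labels concept: c-NNNNN' line as in the Term Template further down this page; the function name check_term_links is hypothetical, not an agreed WG tool.

```python
import re

# Matches the "Labels concept:" line of the Term Template, e.g. "Labels concept: c-12345".
LABELS_CONCEPT = re.compile(r"^Labels concept:[ \t]*(c-\d{5})\b", re.MULTILINE)
CONCEPT_FILE = re.compile(r"c-\d{5}\.md")


def check_term_links(files: dict[str, str]) -> list[str]:
    """Return error messages for term files whose concept link is missing or dangling.

    `files` is a hypothetical in-memory view of the repo: filename -> markdown text.
    An empty result means the check passes.
    """
    concepts = {name[: -len(".md")] for name in files if CONCEPT_FILE.fullmatch(name)}
    errors = []
    for name, text in sorted(files.items()):
        if not name.startswith("t-"):
            continue  # only term documents are checked here
        match = LABELS_CONCEPT.search(text)
        if match is None:
            errors.append(f"{name}: missing 'Labels concept:' line")
        elif match.group(1) not in concepts:
            errors.append(f"{name}: labels unknown concept {match.group(1)}")
    return errors


if __name__ == "__main__":
    repo = {
        "c-12345.md": "Concept ID: 12345\n",
        "t-12345.1.md": "Term: faster than light\nLabels concept: c-12345\n",
        "t-99999.1.md": "Term: orphan\nLabels concept: c-99999\n",
    }
    print(check_term_links(repo))  # ['t-99999.1.md: labels unknown concept c-99999']
```

In a CI job, a non-empty result would fail the build and list the offending files in the PR.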

Define the criteria for assigning each status to a term: what are the grounds for saying a term is deprecated, superseded, etc.? (The criteria are published in a doc in the repo, so debating changes to them means a PR and a GitHub issue.)
Create release process guidelines.
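Once the status criteria are agreed, a CI check could at least reject status values outside the set named in the charter. This is a minimal Python sketch; the superseded_by parameter and the rule that a superseded term must name its successor are illustrative assumptions, not something the WG has decided.

```python
from enum import Enum
from typing import Optional


class TermStatus(Enum):
    """Statuses named in the charter; the criteria for assigning them are still to be defined."""
    WORKING = "working"
    PREFERRED = "preferred"
    ACCEPTED = "accepted"
    SUPERSEDED = "superseded"
    DEPRECATED = "deprecated"


def validate_status(value: str, superseded_by: Optional[str] = None) -> TermStatus:
    """Parse a status value, raising ValueError if it breaks one of the (hypothetical) rules."""
    status = TermStatus(value.strip().lower())  # ValueError for anything outside the list
    # Hypothetical rule: the templates do not yet define a "superseded by" field.
    if status is TermStatus.SUPERSEDED and not superseded_by:
        raise ValueError("a superseded term should name the term that supersedes it")
    return status
```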

Define the difference between the live glossary and a “blessed version”. We suggest one blessed version per quarter, with names like “2019v1” (where 1 is the quarter). This format is not semver-compatible, because we have no need to wrestle with issues of forward and backward compatibility; but it is easy to understand, parse, and reference in a URI.
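The quarterly naming scheme is simple enough to parse and generate mechanically. The Python sketch below uses hypothetical names (BlessedVersion, parse_version, version_for) and assumes the number after 'v' is the quarter, so only 1 to 4 are accepted.

```python
import re
from typing import NamedTuple

# "2019v1" = the blessed version for the first quarter of 2019.
VERSION_NAME = re.compile(r"(\d{4})v([1-4])")


class BlessedVersion(NamedTuple):
    year: int
    quarter: int

    def __str__(self) -> str:
        return f"{self.year}v{self.quarter}"


def parse_version(name: str) -> BlessedVersion:
    match = VERSION_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not a blessed version name: {name!r}")
    return BlessedVersion(int(match.group(1)), int(match.group(2)))


def version_for(year: int, month: int) -> BlessedVersion:
    """Name of the quarterly blessed version covering the given month (1-12)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    return BlessedVersion(year, (month - 1) // 3 + 1)
```

Because BlessedVersion is a tuple of (year, quarter), versions sort chronologically, and the names contain only characters that are safe in a URI path segment.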

Establish a ToIP website-level access experience:

Access to main Glossary in all language versions
Access to TIP Glossaries

We MAY:

Leverage existing CI/CD approaches (sample code repos) for incorporating mkdocs, Docusaurus, or GitHub Pages.
Suggest to the tech WG that they may write a generator tool that walks the repo, building in memory a semantic network of concepts that are cross-linked to terms, and emitting various incarnations of the content:

Browsable static HTML that’s copied to a website, glossary.decentralized.foundation. The website should be indexed by Google and have search based on Elasticsearch.
A .zip file of the static html that could be copied to other web sites.
An ebook format (e.g., epub).
Possibly, occasionally, a JIT-printed SKU published on kdp.amazon.com.
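A generator of the kind suggested here might look roughly like the Python sketch below. It assumes template fields are stored as 'Name: value' lines (e.g. 'Labels concept: c-12345', 'en text: ...'), derives the implicit concept-to-term links from the term files, and emits only one incarnation, a bare HTML definition list. The function names are hypothetical, and the zip, ebook and print outputs are left out.

```python
import re
from collections import defaultdict
from html import escape

# Assumption: template fields are serialized as "Name: value", one per line.
FIELD = re.compile(r"^([A-Za-z ]+):[ \t]*(.*)$", re.MULTILINE)


def read_fields(text: str) -> dict[str, str]:
    return {key.strip(): value.strip() for key, value in FIELD.findall(text)}


def build_network(files: dict[str, str]):
    """Build concept records plus the implicit concept -> term links from the term files."""
    concepts = {}
    labels = defaultdict(list)
    for name, text in files.items():
        record = read_fields(text)
        if name.startswith("c-"):
            concepts[name[: -len(".md")]] = record
        elif name.startswith("t-"):
            concept_id = record.get("Labels concept", "").split(" ")[0]
            labels[concept_id].append(record.get("Term", ""))
    return concepts, labels


def render_html(concepts, labels) -> str:
    """Emit one incarnation of the content: a minimal HTML definition list."""
    rows = []
    for concept_id in sorted(concepts):
        terms = ", ".join(sorted(labels.get(concept_id, []), key=str.casefold))
        definition = concepts[concept_id].get("en text", "")
        rows.append(f"<dt>{escape(terms)}</dt><dd>{escape(definition)}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"
```

Keeping the network in memory lets each output format be a separate render function over the same data.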

Create a crawler process that collects terminology from various sources (contexts), for the purpose of mapping terminology as it is used and/or defined in each context onto the concepts/relations in our Corpus.
Create a process for pulling new content (terms, concepts) from the MM_WG

A source is declared in a config file that’s committed to the repo. This means anybody can propose a source by submitting a PR and debating its validity in a github issue.
Sources could include W3C Respec docs, IETF RFCs, Aries RFCs, DIDComm specs hosted at DIF, etc. Corporate websites wouldn’t work because A) they’re too partisan; B) they’d require random, browser-style web crawling, which is too hard to automate well.
Crawler pulls docs and scans them, looking for regexes that allow it to isolate term declarations, their associated definitions, and examples that demonstrate their usage.
Output from crawler is a set of candidate terms that must be either admitted to a pipeline, or rejected, by human judgment. Candidates that are already in the corpus are ignored, so this just helps us keep up to date with evolving term usage in our industry.
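The candidate-extraction step of such a crawler could be sketched as follows in Python. It assumes the documents have already been fetched, and it only recognizes ReSpec-style <dfn> elements; IETF RFCs, Aries RFCs and other formats would each need their own patterns. Candidates are lowercased before comparison, so corpus_terms is assumed to hold lowercased labels; candidate_terms is a hypothetical name.

```python
import re

# ReSpec-style specs mark defined terms with <dfn> elements; other sources need other patterns.
DFN = re.compile(r"<dfn\b[^>]*>(.*?)</dfn>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def candidate_terms(documents: dict[str, str], corpus_terms: set[str]) -> list[tuple[str, str]]:
    """Return (source, term) pairs for defined terms that are not yet in the corpus.

    `documents` maps a source name (as it might appear in the sources config file)
    to already-fetched document text; fetching is left out of this sketch.
    """
    found = set()
    for source, text in documents.items():
        for raw in DFN.findall(text):
            # Drop nested markup such as <abbr> and normalize whitespace and case.
            term = " ".join(TAG.sub("", raw).split()).lower()
            if term and term not in corpus_terms:
                found.add((source, term))
    return sorted(found)
```

The output is only a list of candidates; admitting or rejecting each one remains a human decision.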

Content Templates

Concept Template

Concept ID: 12345 (this is a 5-digit number that’s embedded in the filename, such as c-12345.md)

Criterion

en text: <text that allows the reader to evaluate whether or not something qualifies as an instance of the concept in every (yes, every) relevant use-case>

Definition

en text: blah blah blah

<other language code> text: lorem ipsum cu prorat

links to media (diagrams, audio, video)

Links to any discussions in github issues

Notes

history and theory of the concept in its larger mental model

implications

Related Concepts

Term Template

Term: faster than light

Short form:

Acronym: FTL

Language: en

Labels concept: c-12345 (filename for this term would be t-12345.x.md, where 12345 comes from the concept, and x is 1-3 digits that uniquely identify the term in the context of its concept)

Links to any discussions in github issues

Notes

metaphors or mental/conceptual models (or namespaces) that inform the choice of this label for the concept

implications

Examples of usage

Scope: (description of the scope of application)
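The filename conventions in the Concept and Term templates (c-12345.md and t-12345.x.md) can be checked and applied mechanically, for instance when a contributor adds a new term for an existing concept. The Python sketch below picks the next free term filename; it assumes term numbers start at 1, which the templates do not state, and next_term_filename is a hypothetical helper.

```python
import re
from typing import Iterable

# Filename conventions from the Concept and Term templates:
#   c-12345.md    concept with 5-digit ID 12345
#   t-12345.x.md  term x (1-3 digits) labelling concept 12345
TERM_FILE = re.compile(r"t-(\d{5})\.(\d{1,3})\.md")


def next_term_filename(concept_id: str, existing: Iterable[str]) -> str:
    """Pick the lowest unused term number for a concept (assumes numbering starts at 1)."""
    if not re.fullmatch(r"\d{5}", concept_id):
        raise ValueError(f"concept IDs are 5-digit numbers, got {concept_id!r}")
    used = set()
    for name in existing:
        match = TERM_FILE.fullmatch(name)
        if match and match.group(1) == concept_id:
            used.add(int(match.group(2)))
    for number in range(1, 1000):  # 1-3 digits allows at most 999 terms per concept
        if number not in used:
            return f"t-{concept_id}.{number}.md"
    raise ValueError(f"concept {concept_id} has no free term numbers left")
```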

Glossary Template

Name: ToIP Governance Glossary

Scope: (description of the scope of application)

Language: en

Scope: (description of the scope and purpose for which the glossary is supposed to be used.)

Taglist: (any term that has a tag from this list will be included in this glossary)

Links to any discussions in github issues

Notes:
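Selecting and ordering the terms of a context glossary from its Taglist could be as simple as the Python sketch below. It assumes each term carries a set of tags, which the Term Template does not yet define, and sorts labels case-insensitively to produce the alphabetic list; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    definition: str
    # Hypothetical: the Term Template does not yet have a tags field.
    tags: set[str] = field(default_factory=set)


def build_glossary(terms: list[Term], taglist: set[str]) -> list[Term]:
    """Select every term carrying at least one tag from the glossary's Taglist,
    then order the selection alphabetically (case-insensitively)."""
    selected = [term for term in terms if term.tags & taglist]
    return sorted(selected, key=lambda term: term.label.casefold())


def render_text(glossary: list[Term]) -> str:
    return "\n".join(f"{term.label}: {term.definition}" for term in glossary)
```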
