Concepts & Terminology Working Group (Proposed)

Introduction

Contributors/users in ToIP come from various backgrounds. Their culture may not be Western; English may not be their native tongue. Yet most ToIP business will probably take place in ways that assume some of that shared context. Apart from this, ToIP community members are expected to contribute different expertise, and focus on different topic domains, e.g. technology, governance, legal, societal, etc., each of which comes with its own jargon. This makes highly precise, mutual understanding hard.

We expect to see situations of "language confusion", i.e. in which people use words or phrases, the intension (not: intention) of which differs from the interpretation of some listeners/readers. Sometimes a casual glance at a dictionary or glossary is the solution. In other cases, deeper understanding matters, e.g. when drafting specifications or contracts. Then we need more than a set of definitions.

This WG aims to help ToIP community participants understand one another at whatever level of precision they need.

Discussion of the Proposed WG Charter

DanG: My Comments

While I agree with the problem that "Interpretation" of meaning is a pervasive problem in many of our daily lifestyle activities that cross international boundaries, I do not believe this is an achievable problem to solve for the Foundation nor one within our mission. That said, if there are community members desiring to contribute to such an endeavor I for one will not prevent such an activity.

Conversely, the Foundation requires a base level of "words" that need to be "defined" in a default language, "English". This is often referred to as a Glossary. Since the Foundation is not missioned to prescribe any technology solutions, we will need to have a base Glossary and then each ToIP Interoperability Profile (TIP) will need to extend the base.

If there is community interest in "interpreting" the meaning of these Glossaries, that for me is a separate task which BTW is dependent on the establishment of something to "interpret".

Therefore, as per originally proposed – I refer to the original proposal as a starting point for a Glossary WG: ToIP Glossary WG proposal

I am convinced we continue to speak about two disparate efforts.

DanielH: My Comments
Dan Gisolfi, you and I are two of the American voices that may not feel the urgency of this effort quite as intensely. We get to throw out words (or, more difficult, metaphors) from our own culture and they often stick by the willingness of the majority to go along... I speak Spanish fairly well, but if I were trying to do highly technical work in Spanish, I'm certain I would need a specialized glossary, and I'm certain it would feel like slow going to me. My confidence would decrease. Reading and writing would take me longer, and I'd second-guess myself more. I know you get that and have proposed the glossary effort, and you'd like to keep it simple and lightweight. I get it and align with that desire. But I think it's okay to have a deeper agenda too, as long as many of us can stay surface-level and get stuff done fast. I'd feel happiest about an approach that treats English as just another language (calling English a default kind of makes me cringe; I acknowledge the denotation but don't like the connotation). I'd like what we do to be equally capable of generating a glossary in Dutch or Russian or Chinese, for example. My suggestion is to define the success criteria as follows: 1) The group's going to make one or more glossaries, and for people who don't have much more interest than that, getting stuff done in or with the group can feel like simple glossary maintenance; 2) The group's going to approach the underlying data in a way that makes localization and formal terminology work easier, as long as it doesn't get in the way of rapid progress. I think this is quite doable, and it allows the two proposals to be harmonized. If we can't harmonize them, I think both efforts will starve for attention.

Scope Statement (for the JDF Working Group Charter)

The scope of the Concepts and Terminology Working Group is to develop a corpus of shared concepts and terminology available to all stakeholders in the Trust over IP stack. This includes developing artifacts and tools for discovering, documenting, defining, and (deeply) understanding the concepts and terms used within ToIP. Key deliverables include one or more glossaries, and the corpus of data behind them. The data will consist of formally modeled concepts, plus their relations and constraints, and will encompass perspectives from technical, governance, business, legal and other realms.

DanielH note:

The copy editor in me may have gotten carried away a bit by the simple paragraph I wrote just above. (I'm a novelist by hobby, and "less is more" has been drilled into me by professional editors at publishing houses.) Rieks' original paragraph contained more detail than what remains in my version. I tried to stay true to its intent while simplifying as much as I could. I'm preserving the longer form below for comparison, in case we need to add something that I cut too much, or in case we want to go back to that version and start over.

Original version: The mission of the proposed C&T WG is to foster efficient and effective cooperation between ToIP members that each have their own backgrounds (socially, linguistically, expertise, etc.), by creating and maintaining artifacts and tools for specifying, documenting, learning, and (deeply) understanding the concepts and associated terms that are used within ToIP, and to eliminate terminological confusion where possible. The scope of the WG is the creation and maintenance of a corpus of terms and concepts, from which some basic artefacts will be generated and additional ones as the need arises. Other WG activities will include the creation of conceptual (mental) models, that see on the formal specification of concepts, their relations and constraints, such that they can serve as a solid (mental) basis for dealing with issues in the technical, governance, business, legal and other realms. This WG may also organise Task Forces for specialized activities if deemed appropriate by the majority of the WG members and in line with the overall mission of the ToIP Foundation.

Drummond note (2020-06-12):

I made minor wordsmithing changes to the Scope Statement.

Conveners (add your name if you are interested to become one of the conveners)

Rieks Joosten, TNO
Drummond Reed, Evernym

Interested Members (add your name and organization if you may be interested in joining this proposed WG)

Daniel Hardman
Oskar van Deventer
Scott Perry
Shashishekhar S
Philippe Page
Paul Knowles
Taylor Kendal
Scott Whitmire
Arjun Govind
Vinod Panicker
sankarshan
Steven Milstein
Joaquin Salvachua

Description

The primary focus of the ToIP Foundation is not just on technology (e.g. cryptography, DIDs, protocols, VCs, etc.), but also on governance and on business, legal and social aspects. Its mission to construct, maintain and improve a global, pervasive, scalable and interoperable infrastructure for the (international) exchange of verified and certified data is quite complex and daunting. This not only requires technology to be provided (which is, or should be, the same for everyone, i.e. an infrastructure). It also requires that different businesses with their different business models can use it for their specific, subjective purposes, and that each individual business and user is provided with capabilities that facilitate its compliance with the rules, regulations and (internal and external) policies that apply to that entity. The set of such rules, regulations and policies differs for every such entity, and depends on the society, the legal jurisdictions and individual preferences. All this is to be realized by people and organizations from different backgrounds (different cultures, languages, expertise, jurisdictions etc.), all of whom have their own mindset, objectives and interests that they would like to see served.

The aim of this WG is to enable people in the ToIP community to actually understand what someone else means, to the extent and (in-depth) precision that they need, and to facilitate this by producing deliverables/results/products that are fit for the purposes that they pursue.

We expect such results to include a common glossary that lists the basic words we use in the ToIP community and briefly explains/defines them, using existing sources such as NIST, Sovrin, W3C's VC, DID standards, and others. We may be able to leverage the new 'glossary effort' that the W3C CCG has recently initiated. We also expect such results to include additional glossaries that subgroups of the ToIP community (e.g. TIPs) create to serve their needs as they focus on specific objectives (thus facilitating domain/objective-specific jargon). We currently envisage 'technology stack' and 'governance stack' glossaries that serve the specific needs of the associated WGs. We leverage the ToIP Glossary WG proposal from Dan Gisolfi (IBM).

Also, we expect such results to include more precise (theoretical?) specifications of underlying concepts, e.g. in terms of conceptual/mental models. Such models help to obtain a more in-depth understanding of ideas that are worth sharing, and need to be shared, within one or more community sub-groups. They may also facilitate the learning process that (new) community members go through as they try to understand what it is we're actually doing. And they may help to 'spread the word' in specifically targeted (e.g. business and legal) audiences. A specific focus of this WG is to establish relations between the concepts of the mental models and the terms defined in the various glossaries.

A model for some of the deliverables of this WG is one or more websites that would resemble the Legal Dictionary. This site not only provides a definition of various terms, but also a brief description of their backgrounds, various use-cases that exemplify the relevance of (and distinctions made by) the terms, and other useful information.

Finally, we expect to see results that we haven't thought of yet, the construction of which will be initiated as the need arises, by (representatives of) those that need such results for a specific purpose. Perhaps we might produce a method for resolving terminological discussions that can be lengthy and do not always get properly resolved (e.g. as in id-core issues #4, #122). Here, we leverage a prior collaboration between Daniel Hardman (Evernym) and Rieks Joosten (TNO).

Charter

Develop and maintain a high-quality corpus of terminology that covers the needs of the ToIP community.
Develop a process whereby this corpus can be:

Curated, based on evidence and using expert opinion, such that concepts, relations between concepts, and constraints can e.g. be
1. carefully defined,
2. assigned an identifier (name/number/label) that distinguishes each from any other concept in the corpus,
3. mapped onto terms that are defined and/or commonly accepted in various relevant domains/contexts,
4. documented, from organic sources, as to their usage and relevance,
5. adjudicated as to their status, e.g. 'working', 'preferred', 'accepted', 'superseded' and 'deprecated'.
Enhanced in a collaborative, open, and fair manner by interested community members.
Versioned.
Published in different ways (e.g. as a glossary, concept map, use-case stories ...), for specific purposes (e.g. education, reference, ...), by different means (e.g. a PDF, a website, presentations/webinars, ...) and as needed by different audiences/stakeholders or domains (e.g. business domains, architectural domains, ...).
Promoted as a valuable public resource and an influence for convergence and excellence.

Train and organize volunteers so the initiative develops sustainable long-term momentum.
Disseminate/promote the work across ToIP WGs and other relevant audiences.

Requirements

The Corpus of Terminology MUST have:

Source control and build processes managed in github.
A well-defined syntax for contributing concepts/relations, and for each of them an identifier by which it can be identified within the scope of the Corpus.
A well-defined syntax for attributing terms to such (established) concepts/relations for specific contexts/domains.
A well-defined CI/CD process that includes auto sorting of terms and concepts. (??? RJ: I'm not sure what this means.)
A simple process for contributing further content.
A simple publicly accessible website, containing at least the Corpus-identifiers and their definitions, possibly inspired by the 'Legal Dictionary'.
A PDF document for every published version, containing at least the Corpus-identifiers and their definitions.

The Corpus MUST NOT have:

A requirement for programming skills, as that would reduce the number of contributors.

The Corpus SHOULD be:

Reusable and easy to leverage in TIP repos.
Usable for language translation via separate, self-organized, language-specific repos. These repos should be aggregators of the baseline glossary and any TIPs.
Usable for mapping its identifiers/terms to those in use in other contexts/domains.
Consumable at the RAW content level (.md files) by external groups who wish to render content in a different manner.

Solution Approaches

We SHOULD:

Use a github repo to manage the corpus.

Consider using a Creative Commons license instead of an Apache license; it may be more appropriate.
Require DCO/IPR for contributors to the repo. Anybody who complies with the DCO/IPR requirements can submit to the corpus by raising a PR.
No need to manually maintain metadata about who edited what, when. We have commit history and git praise/blame.
Use github issues to debate decisions about term statuses. Anybody can raise an issue.

Use existing pervasive open source documentation tools such as mkdocs, Docusaurus, or GitHub Pages:

Each concept is described in a separate markdown doc that conforms to a simple template (see below). Concepts link to related concepts.
Each term is a separate markdown doc that conforms to a different simple template (see below again). Terms label concepts; links from concepts to terms remain implicit in the markdown version of the data, to avoid redundant editing. Having terms and concepts as separate documents that cross-link allows for synonyms, antonyms, preferred and deprecated and superseded labels for the same concept, localization, and so forth. They also allow for the peaceful co-existence of multiple terminologies (= sets of terms, namespaces, …)
Each context glossary is a separate markdown doc that conforms to another different simple template (see below once again). A glossary is an alphabetic list of terms relating to a specific subject, or for use in a specific domain, with explanations. The markdown document specifies the scope of the glossary, and the selection criteria for terms.
Provide an extendable CI/CD pipeline for the repo, and write unit tests to enforce any process rules, quality checks, and best practices the WG adopts.
The CI/CD process should enable a live website and a refreshed PDF document after each approved and merged PR.
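
As one illustration of a unit test that could enforce such process rules, the Python sketch below checks file names against the naming conventions given in the Content Templates on this page (c-12345.md for concepts, t-12345.x.md for terms). The function name check_filenames, and the rule that every term file must point at an existing concept file, are assumptions made for illustration; they are not decisions of the WG.

```python
import re

# Patterns follow the filename conventions from the Concept and Term templates.
CONCEPT_RE = re.compile(r"^c-(\d{5})\.md$")
TERM_RE = re.compile(r"^t-(\d{5})\.(\d{1,3})\.md$")


def check_filenames(filenames):
    """Return a list of human-readable problems; an empty list means the check passes."""
    problems = []
    concept_ids = set()
    term_files = []
    for name in filenames:
        concept = CONCEPT_RE.match(name)
        term = TERM_RE.match(name)
        if concept:
            concept_ids.add(concept.group(1))
        elif term:
            term_files.append((name, term.group(1)))
        else:
            problems.append(f"{name}: does not match c-NNNNN.md or t-NNNNN.x.md")
    # Assumed rule: a term file must label a concept that exists in the repo.
    for name, concept_id in term_files:
        if concept_id not in concept_ids:
            problems.append(f"{name}: no concept file c-{concept_id}.md")
    return problems
```

A CI job could call such a function with the list of files in the repo and fail the build when the returned list is not empty.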

Define the criteria for assigning each status to a term. What are the grounds for saying it is deprecated, superseded, etc.? (Criteria are published in a doc in the repo, so debating changes to criteria means a PR and github issue.)
Create release process guidelines.

Define the difference between the live glossary and a “blessed version”. Suggest blessing a version once per quarter, with names like “2019v1” (where 1 is the quarter). This format is not semver-compatible, because we have no need to wrestle with issues of forward and backward compatibility--but it is easy to understand, parse, and reference in a URI.
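
To show how easily such a label can be parsed, here is a minimal Python sketch. It assumes that the digit after the "v" is a calendar quarter (1 to 4); the function names parse_version and version_for are hypothetical.

```python
import re
from datetime import date

# Assumption: four-digit year, the letter "v", then a quarter number 1-4.
VERSION_RE = re.compile(r"^(\d{4})v([1-4])$")


def parse_version(label):
    """Split a label such as '2019v1' into (year, quarter)."""
    match = VERSION_RE.match(label)
    if match is None:
        raise ValueError(f"not a valid version label: {label!r}")
    return int(match.group(1)), int(match.group(2))


def version_for(day):
    """Return the label of the calendar quarter that contains the given date."""
    quarter = (day.month - 1) // 3 + 1
    return f"{day.year}v{quarter}"
```

Because parse_version returns a (year, quarter) tuple, sorting the parsed labels puts blessed versions in chronological order.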

Establish a ToIP website level access experience

Access to main Glossary in all language versions
Access to TIP Glossaries

We MAY:

Leverage existing CI/CD approaches (sample code repos) for incorporating mkdocs, Docusaurus, or GitHub Pages.
Suggest to the tech WG that they may write a generator tool that walks the repo, building in memory a semantic network of concepts that are cross-linked to terms, and emitting various incarnations of the content:

Browsable static html that’s copied to a website, glossary.decentralized.foundation. The website should be indexed by Google and have search based on elasticsearch.
A .zip file of the static html that could be copied to other web sites.
An ebook format (e.g., epub).
Possibly, occasionally, a JIT-printed SKU published on kdp.amazon.com.
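
The core of such a generator could look roughly like the Python sketch below. It skips the repo walk and starts from records that are already parsed; the record layout (a dict of concept definitions, and term dicts with "label" and "concept" keys), the function names, and the example definition text are all assumptions made for illustration.

```python
from html import escape


def build_network(concepts, terms):
    """Cross-link each concept to the terms that label it."""
    network = {
        concept_id: {"definition": definition, "terms": []}
        for concept_id, definition in concepts.items()
    }
    for term in terms:
        concept_id = term["concept"]
        if concept_id in network:  # terms for unknown concepts are skipped
            network[concept_id]["terms"].append(term["label"])
    return network


def emit_html(network):
    """Emit one incarnation of the content: a static HTML definition list."""
    rows = []
    for concept_id in sorted(network):
        entry = network[concept_id]
        labels = ", ".join(sorted(entry["terms"])) or "(no terms yet)"
        rows.append(f"<dt>{escape(labels)}</dt><dd>{escape(entry['definition'])}</dd>")
    return "<dl>\n" + "\n".join(rows) + "\n</dl>"


# Hypothetical sample data, loosely based on the Term template on this page.
concepts = {"c-12345": "Travel at a speed greater than that of light."}
terms = [
    {"label": "faster than light", "concept": "c-12345"},
    {"label": "FTL", "concept": "c-12345"},
]
print(emit_html(build_network(concepts, terms)))
```

Other output formats (a .zip, an ebook) would be further emit functions over the same in-memory network.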

Create a crawler process that collects terminology from various sources (contexts), for the purpose of mapping terminology as it is used and/or defined in that context onto the concepts/relations in our Corpus.
Create a process for pulling new content (terms, concepts) from the MM_WG

A source is declared in a config file that’s committed to the repo. This means anybody can propose a source by submitting a PR and debating its validity in a github issue.
Sources could include W3C Respec docs, IETF RFCs, Aries RFCs, DIDComm specs hosted at DIF, etc. Corporate websites wouldn’t work because A) they’re too partisan; B) they’d require random, browser-style web crawling, which is too hard to automate well.
Crawler pulls docs and scans them, looking for regexes that allow it to isolate term declarations, their associated definitions, and examples that demonstrate their usage.
Output from crawler is a set of candidate terms that must be either admitted to a pipeline, or rejected, by human judgment. Candidates that are already in the corpus are ignored, so this just helps us keep up to date with evolving term usage in our industry.
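
The scanning and filtering steps of the crawler can be sketched as follows. This Python example assumes the source document is HTML that marks term declarations with dfn elements, as ReSpec documents commonly do; other sources would need other regexes. The function name candidate_terms is hypothetical, and the sketch isolates term declarations only, not their definitions or usage examples.

```python
import re

# Assumption: term declarations are wrapped in <dfn> ... </dfn>.
DFN_RE = re.compile(r"<dfn[^>]*>(.*?)</dfn>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def candidate_terms(document_text, known_terms):
    """Return terms declared in the document that are not yet in the corpus."""
    known = {term.casefold() for term in known_terms}
    seen = set()
    candidates = []
    for raw in DFN_RE.findall(document_text):
        # Strip nested markup and collapse whitespace.
        term = " ".join(TAG_RE.sub("", raw).split())
        key = term.casefold()
        if term and key not in known and key not in seen:
            seen.add(key)
            candidates.append(term)
    return candidates
```

The returned list is only a set of candidates; admitting or rejecting each one remains a human decision.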

Content Templates

Concept Template (to be further developed on github)

Concept ID: 12345 (this is a 5-digit number that’s embedded in the filename, such as c-12345.md)

Criterion

en text: <text that allows the reader to evaluate whether or not something qualifies as an instance of the concept in every (yes, every) relevant use-case>

Definition

en text: blah blah blah

<other language code> text: lorem ipsum cu prorat

links to media (diagrams, audio, video)

Links to any discussions in github issues

Notes

history and theory of the concept in its larger mental model

implications

Related Concepts

Term Template (to be further developed on github)

Term: faster than light

Short form:

Acronym: FTL

Language: en

Labels concept: c-12345 (filename for this term would be t-12345.x.md, where 12345 comes from the concept, and x is 1-3 digits that uniquely identify the term in the context of its concept)

Links to any discussions in github issues

Notes

metaphors or mental/conceptual models (or namespaces) that inform the choice of this label for the concept

implications

Examples of usage

Scope: (description of the scope of application)
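
A term document that follows this template is easy to read mechanically. The Python sketch below assumes every field sits on its own "Name: value" line, as in the template above; the function names and the choice of required fields are assumptions made for illustration.

```python
# Assumption: these three fields must be filled in for a term to be usable.
REQUIRED_FIELDS = ("term", "language", "labels concept")


def parse_term_doc(text):
    """Parse 'Name: value' lines into a dict keyed by lower-cased field name."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip():
            fields[name.strip().lower()] = value.strip()
    return fields


def missing_fields(fields):
    """List the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


SAMPLE = """Term: faster than light
Short form:
Acronym: FTL
Language: en
Labels concept: c-12345
"""

fields = parse_term_doc(SAMPLE)
print(fields["term"], fields["labels concept"], missing_fields(fields))
```

Only the first colon on a line separates name from value, so values that themselves contain colons (such as links) stay intact.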

Glossary Template (to be further developed on github)

Name: ToIP Governance Glossary

Language: en

Scope: (description of the scope and purpose for which the glossary is supposed to be used.)

Taglist: (any term that has a tag from this list will be included in this glossary)

Links to any discussions in github issues

Notes:
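
The Taglist rule can be sketched in a few lines of Python. The Term template above does not define a tags field, so the "tags" key used here is an assumption, as are the function name build_glossary and the sample terms.

```python
def build_glossary(terms, taglist, language="en"):
    """Select terms in the given language that carry at least one tag from
    the taglist, and return them in alphabetical order."""
    wanted = set(taglist)
    selected = [
        term
        for term in terms
        if term["language"] == language and wanted & set(term["tags"])
    ]
    return sorted(selected, key=lambda term: term["label"].casefold())


# Hypothetical sample terms.
terms = [
    {"label": "steward", "language": "en", "tags": ["governance"]},
    {"label": "agent", "language": "en", "tags": ["technology"]},
    {"label": "Auditor", "language": "en", "tags": ["governance", "legal"]},
]
print([term["label"] for term in build_glossary(terms, ["governance"])])
```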
