Time	Item	Lead	Notes
1 min	Welcome & Antitrust Policy Notice	Chairs
2 mins	Introduction of new members	Chairs
2 mins	Agenda review & open Action Items	Chairs
5 mins	Co-Chair volunteers	Chairs
35 mins	Presentation and discussion on tooling and workflow	Daniel Hardman
12 mins	Integration with Operations Team	David Luchuk
2 mins	Review of Decisions and new Action Items	Chairs
1 min	Next meeting	Chairs

Recording

Link to the file

Presentation(s)

Daniel's slides

Welcome and Linux Foundation antitrust policy
Introduction of new members
Agenda review & open Action Items
Daniel Hardman presented his slides and recommendations about terminology tooling
1. His evaluation of existing tooling (open source and commercial) is that nothing that is still maintained really fits our needs well
2. We need "just enough tooling"
3. Daniel proposes the following data pipeline
  1. Capture: receive raw data from one-off ticket or batch submission PR or script
  2. Scan: human sanity check to triage and catch basic issues
  3. Merge: commit to the repo, convert to the internal data model, assign permalinks; the data becomes publishable
  4. Mature: run (semi-)automated QA, generate tickets, propose "Accepted" status for community = WG, and assign tickets to curators of other communities
  5. Accept: review and adjust ticket statuses in the WG meeting
4. Daniel proposed a basic data model (see the slides)
  1. Rieks noted: "I'm concerned about the relation between concept and term having 1 - n multiplicities rather than n - m multiplicities. To be discussed."
5. Daniel proposed a process by which every stakeholder community can review and decide on the status of a term without having to necessarily agree with other communities
6. Daniel proposed two major requirements for our tooling
  1. Major feature #1: Manage Curation
    1. Anybody can propose content
    2. Tickets are the way to change content status
    3. Anybody can raise a ticket
    4. Review tickets are tied to a community (scope)
    5. Each community has its own status
    6. Each community has its own review process and appoints one or more "curators" <== term proposed by Daniel
    7. Curators directly update status for their community (or admins update per instructions from a curator)
    8. Enforce some data integrity rules and workflows
    9. Track contributors, history
    10. Stats
  2. Major Feature #2: Publish
    1. Emit content per community
    2. Timely updates (realtime desirable)
    3. Artifacts can be styled/customized per community
    4. Static, searchable, indexable HTML
    5. Programmable data (CSV or JSON) and/or API
      1. An example is writing a script to analyze a glossary or a group of terms
    6. Full metadata available
      1. Contribution history
      2. Status change history
7. Data and permalinks
  1. Live data should be in the internal data model
    1. Browsable in internal data model
    2. https://github.com/<repo>/terms/agent-119 (term named by first EN label + concept num)
    3. https://github.com/<repo>/concepts/119-agent (concept named by num + first EN label)
    4. Hyperlinked to issues
    5. Links are stable across changes in terms, definition text <== permalinks are in place, so terms can be deprecated and still resolve
  2. Published data
    1. Browsable in glossary data model format
    2. Published by communities on sites under their control (they put static HTML where they want)
    3. <glossary website>/agent.html (no concept links)
    4. Links are versioned (not guaranteed stable across releases) <== not permalinks
  3. Dan asked if we could use GitHub Actions to publish "live data"
    1. Daniel said yes, that would result in the published data reflecting the live data
  4. Drummond asked about how a version of a glossary can be "frozen" for a specific community, i.e., a spec
    1. Daniel said that the community could fork off a version of their glossary
    2. You can also point off to a specific version of the data at any point in time.
8. Specific tool proposals
  1. Terminology database = GitHub repo
    1. Ingest new data as Markdown documents in GitHub
    2. After processing into the internal data model, still keep each "table" as a Markdown document in GitHub
  2. To do QA for WG review of submitted data: new Python script (Daniel volunteering but inviting others)
  3. To manage internal data model
  4. To emit static HTML: GitHub Action hooked up to #2 in preceding bullet
  5. To emit programmable data (CSV, ...): TBD
9. Configuring a community
  1. Provide official name and #tag
  2. Identify and train curators (github handles, contact info)
  3. Configure artifacts
  4. Configure data import
  5. Train community on curation and publication processes
10. Configuring an artifact
  1. Choose publication mechanism (output template, scripts, targets, collateral)
  2. Set up schedule or triggers for publication
  3. Provide selection criteria (tags, statuses)
  4. Test run
11. Configuring data import
  1. One-time, ad-hoc, ongoing?
  2. Write and/or tune script(s)
  3. Dry runs with cleanup
  4. First import
  5. Trigger for deltas
12. Working Group Duties
  1. Triage tickets
  2. Train communities
  3. Set up communities and artifacts
  4. Liaise with communities
  5. Approve "Accepted" status requests
  6. Propose new data sources
  7. Configure and maintain tool integrations
  8. Develop output templates
  9. Run tools for ingestion
  10. Review data quality
13. Other proposals—for our actions
  1. Publish draft glossaries from our 3 datasets
  2. Designate 1 or more chairs for WG
  3. Divvy up WG work for #1 (tickets)
  4. Figure out collaboration model outside WG meetings
  5. Modify agenda so we spend a chunk of our time working tickets
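Illustrative sketch (not presented in the meeting): the permalink naming and per-community status ideas above could be modeled roughly as follows. This is a minimal Python sketch under assumptions. The names `Concept`, `merge`, `slugify`, and `select_for_artifact`, the community tags `wg` and `community-a`, and the `Proposed` status are all hypothetical; only the `terms/agent-119` and `concepts/119-agent` naming patterns and the "Accepted" status come from the notes. Paths are computed once at the Merge step and stored, so they stay stable if a label later changes.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


def slugify(label: str) -> str:
    """Lowercase a label and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


@dataclass
class Concept:
    num: int
    labels: List[str]            # EN labels; the first one named the permalink
    term_path: str               # frozen at merge time, e.g. terms/agent-119
    concept_path: str            # frozen at merge time, e.g. concepts/119-agent
    # Hypothetical storage: community tag -> that community's own status.
    status: Dict[str, str] = field(default_factory=dict)

    def set_status(self, community: str, new_status: str) -> None:
        # A curator updates only their own community's status.
        self.status[community] = new_status


def merge(num: int, first_en_label: str) -> Concept:
    """Assign permalinks once, from the first EN label and the concept number."""
    slug = slugify(first_en_label)
    return Concept(
        num=num,
        labels=[first_en_label],
        term_path=f"terms/{slug}-{num}",
        concept_path=f"concepts/{num}-{slug}",
    )


def select_for_artifact(
    concepts: Iterable[Concept], community: str, statuses: Set[str]
) -> List[Concept]:
    """Selection criteria for an artifact: keep concepts whose status
    for the given community is one of the requested statuses."""
    return [c for c in concepts if c.status.get(community) in statuses]


agent = merge(119, "Agent")
agent.set_status("wg", "Accepted")            # WG accepts the term
agent.set_status("community-a", "Proposed")   # another community has not yet
agent.labels[0] = "Actor"                     # relabeling leaves the paths alone
wg_glossary = select_for_artifact([agent], "wg", {"Accepted"})
```

Storing the paths rather than deriving them on demand is one way to keep links resolving after a term is renamed or deprecated; the actual mechanism is still to be decided by the WG.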
Review of Decisions and Action Items
Next meeting
