...

Time | Item | Lead | Notes
1 min | Welcome & Antitrust Policy Notice | Chairs |
2 mins | Introduction of new members | Chairs |
2 mins | Agenda review & open Action Items | Chairs |
5 mins | Co-Chair volunteers | Chairs |
35 mins | Presentation and discussion on tooling and workflow | |
12 mins | Integration with Operations Team | |
2 mins | Review of Decisions and new Action Items | Chairs |
1 min | Next meeting | Chairs |

Recording

  • Link to the file

Presentation(s)

...

  1. Welcome and Linux Foundation antitrust policy
  2. Introduction of new members
  3. Agenda review & open Action Items
  4. Daniel Hardman presented his slides and recommendations about terminology tooling
    1. His evaluation of existing tooling (open source and commercial) is that nothing that is still maintained really fits our needs well
    2. We need "just enough tooling"
    3. Daniel proposes the following data pipeline
      1. Capture: receive raw data from a one-off ticket, a batch submission PR, or a script
      2. Scan: human sanity check to triage and catch basic issues
      3. Merge: commit to repo, convert to internal data model, assign permalinks; the data becomes publishable
      4. Mature: run (semi-)automated QA. Generate tickets. Propose "Accepted" status for community = WG. Assign tickets for curators of other communities.
      5. Accept: review and adjust ticket statuses in WG meeting.
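The five pipeline stages above can be pictured as an ordered sequence that each submission moves through. The following is only a minimal sketch of that ordering, not part of Daniel's proposal; the names `Stage` and `next_stage` are made up for illustration.

```python
from enum import Enum


class Stage(Enum):
    """Pipeline stages in the order listed in the minutes."""
    CAPTURE = 1
    SCAN = 2
    MERGE = 3
    MATURE = 4
    ACCEPT = 5


def next_stage(stage):
    """Return the stage that follows `stage`, or None after ACCEPT."""
    members = list(Stage)  # Enum iteration preserves definition order
    idx = members.index(stage)
    return members[idx + 1] if idx + 1 < len(members) else None
```

Modelling the stages as an ordered enum keeps the "what comes next" rule in one place; a real tool would also need to record who moved an item and why.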
    4. Daniel proposed a basic data model (see the slides)
      1. Rieks noted: "I'm concerned about the relation between concept and term having 1 - n multiplicities rather than n - m multiplicities. To be discussed."
    5. Daniel proposed a process by which every stakeholder community can review and decide on the status of a term without having to necessarily agree with other communities
    6. Daniel proposed two major requirements for our tooling
      1. Major feature #1: Manage Curation
        1. Anybody can propose content
        2. Tickets are the way to change content status
        3. Anybody can raise a ticket
        4. Review tickets are tied to a community (scope)
        5. Each community has its own status
        6. Each community has its own review process and appoints one or more "curators" <== term proposed by Daniel
        7. Curators directly update status for their community (or admins update per instructions from a curator)
        8. Enforce some data integrity rules and workflows
        9. Track contributors, history
        10. Stats
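As a rough illustration of the curation rules above (each community has its own status, only its curators change it, and history is tracked), here is a small sketch. Everything in it is an assumption made for the example: the `Entry` class, the `CURATORS` registry, the handles, and the "Proposed" status name are hypothetical; only the "Accepted" status appears in the minutes.

```python
from dataclasses import dataclass, field

# Hypothetical curator registry: community tag -> GitHub handles of its curators.
CURATORS = {"#wg": {"alice"}, "#essif": {"bob"}}


@dataclass
class Entry:
    label: str
    status: dict = field(default_factory=dict)   # community tag -> status
    history: list = field(default_factory=list)  # (community, old, new, who)


def set_status(entry, community, new_status, who):
    """Update one community's status; only that community's curators may do so."""
    if who not in CURATORS.get(community, set()):
        raise PermissionError(f"{who} is not a curator for {community}")
    old = entry.status.get(community)
    entry.status[community] = new_status
    entry.history.append((community, old, new_status, who))


entry = Entry("agent")
set_status(entry, "#wg", "Accepted", "alice")
set_status(entry, "#essif", "Proposed", "bob")
```

The point of the sketch is that statuses are keyed by community, so one community accepting a term never forces a decision on another.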
      2. Major Feature #2: Publish
        1. Emit content per community
        2. Timely updates (realtime desirable)
        3. Artifacts can be styled/customized per community 
        4. Static, searchable, indexable HTML
          • One doc, or one doc per term / concept
          • Stable relative links
        5. Programmable data (CSV or JSON) and/or API
          1. An example is writing a script to analyze a glossary or a group of terms
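To make the "script to analyze a glossary" idea concrete, here is a minimal sketch that counts terms per status in a JSON export. The export format is not defined in the minutes, so the field names (`term`, `concept`, `status`) and the sample records are assumptions for illustration only.

```python
import json
from collections import Counter

# Hypothetical JSON export of a glossary; the real format is still TBD.
SAMPLE = """[
  {"term": "agent", "concept": 119, "status": "Accepted"},
  {"term": "wallet", "concept": 120, "status": "Proposed"},
  {"term": "holder", "concept": 121, "status": "Accepted"}
]"""


def status_counts(glossary_json):
    """Count how many glossary entries carry each status."""
    entries = json.loads(glossary_json)
    return Counter(e["status"] for e in entries)


counts = status_counts(SAMPLE)
```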
        6. Full metadata available
          1. Contribution history
          2. Status change history
    7. Data and permalinks
      1. Live data should be in the internal data model
        1. Browsable in internal data model
        2. https://github.com/<repo>/terms/agent-119 (term named by first EN label + concept num)
        3. https://github.com/<repo>/concepts/119-agent (concept named by num + first EN label)
        4. Hyperlinked to issues
        5. Links are stable across changes in terms, definition text <== permalinks are in place, so terms can be deprecated and still resolve
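The two naming patterns above (term = first EN label + concept number, concept = number + first EN label) could be generated along these lines. This is a sketch under assumptions: the `slug` rule for turning a label into a URL-safe string is invented here, and the functions return only the path portion after the repository, which the minutes leave as `<repo>`.

```python
import re


def slug(label):
    """Lower-case a label and replace runs of non-alphanumerics with '-' (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def term_path(label, concept_num):
    """Path for a term: first EN label followed by the concept number."""
    return f"terms/{slug(label)}-{concept_num}"


def concept_path(label, concept_num):
    """Path for a concept: concept number followed by the first EN label."""
    return f"concepts/{concept_num}-{slug(label)}"
```

For the example in the minutes, `term_path("agent", 119)` gives `terms/agent-119` and `concept_path("agent", 119)` gives `concepts/119-agent`.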
      2. Published data
        1. Browsable in glossary data model format
        2. Published by communities on sites under their control (they put static HTML where they want)
        3. <glossary website>/agent.html (no concept links)
        4. Links are versioned (not guaranteed stable across releases) <== not permalinks
      3. Dan asked if we could use GitHub Actions to publish "live data"
        1. Daniel said yes, that would result in the published data reflecting the live data
      4. Drummond asked about how a version of a glossary can be "frozen" for a specific community, i.e., a spec
        1. Daniel said that the community could fork off a version of their glossary
        2. You can also point off to a specific version of the data at any point in time.
    8. Specific tool proposals
      1. terminology database = GitHub repo
        1. Ingest new data as Markdown documents in GitHub
        2. After processing into the internal data model, still keep each "table" as a Markdown document in GitHub
      2. to do QA for WG review of submitted data: new Python script (Daniel volunteering but inviting others)
      3. to manage internal data model
        1. to convert from submit format to internal data model: new Python script (Daniel volunteering but inviting others)
        2. to edit and browse internal data model: modified ESSIF / GRNet tool (the one Rieks has developed)
        3. to update status, add hyperlinks, propagate tags: new Python script(s)
      4. to emit static HTML: GitHub Action hooked up to #2 in preceding bullet
      5. to emit programmable data (CSV, ...)—TBD
    9. Configuring a community
      1. Provide official name and #tag
      2. Identify and train curators (github handles, contact info)
      3. Configure artifacts
      4. Configure data import
      5. Train community on curation and publication processes
    10. Configuring an artifact
      1. Choose publication mechanism (output template, scripts, targets, collateral) 
      2. Set up a schedule or triggers for publication
      3. Provide selection criteria (tags, statuses)
      4. Test run
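The selection criteria for an artifact (tags and statuses) amount to a filter over the terminology data. The sketch below shows one way that filter might look; the record layout, the tag and status values other than "Accepted", and the function name `select_entries` are all hypothetical.

```python
def select_entries(entries, community, tags, statuses):
    """Return labels of entries whose status for `community` is in `statuses`
    and that carry at least one of the requested `tags`."""
    return [
        e["label"]
        for e in entries
        if e["status"].get(community) in statuses and tags & set(e["tags"])
    ]


# Hypothetical records: each has tags and a per-community status.
ENTRIES = [
    {"label": "agent", "tags": ["#wg"], "status": {"#wg": "Accepted"}},
    {"label": "wallet", "tags": ["#wg", "#essif"], "status": {"#wg": "Proposed"}},
    {"label": "holder", "tags": ["#essif"], "status": {"#essif": "Accepted"}},
]

selected = select_entries(ENTRIES, "#wg", {"#wg"}, {"Accepted"})
```

A test run of an artifact could then simply print the selected labels before any HTML is generated.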
    11. Configuring data import
      1. One-time, ad-hoc, ongoing?
      2. Write and/or tune script(s)
      3. Dry runs with cleanup
      4. First import
      5. Trigger for deltas
    12. Working Group Duties
      1. Triage tickets
      2. Train communities
      3. Set up communities and artifacts
      4. Liaise with communities
      5. Approve "Accepted" status requests
      6. Propose new data sources
      7. Configure and maintain tool integrations
      8. Develop output templates
      9. Run tools for ingestion
      10. Review data quality
    13. Other proposals—for our actions
      1. Publish draft glossaries from our 3 datasets
        1. Data sanity check
        2. Convert to internal data model
        3. Configure artifacts and export
        4. Assign community curators to approve 
        5. Forcing function for tools: first cut by mid-December?
      2. Designate 1 or more chairs for WG
        1. Daniel volunteered to be one
      3. Divvy up WG work for #1 (tickets)
      4. Figure out collaboration model outside WG meetings
      5. Modify agenda so we spend a chunk of our time working tickets
  5. Item 2
  6. Item 3
  7. Review of Decisions and Action Items
  8. Next meeting

...