Page History

...

A source is declared in a config file that’s committed to the repo. This means anybody can propose a source by submitting a PR and debating its validity in a github issue.
Sources could include W3C Respec docs, IETF RFCs, Aries RFCs, DIDComm specs hosted at DIF, etc. Corporate websites wouldn’t work because A) they’re too partisan; B) they’d require random, browser-style web crawling, which is too hard to automate well.
Crawler pulls docs and scans them, looking for regexes that allow it to isolate term declarations, their associated definitions, and examples that demonstrate their usage.
Output from crawler is a set of candidate terms that must be either admitted to a pipeline, or rejected, by human judgment. Candidates that are already in the corpus are ignored, so this just helps us keep up to date with evolving term usage in our industry.

Content Templates: Archived

Space shortcuts