Request for Comments: a repository for formal contexts - fca-list - lists.cs.uni-kassel.de

23 Feb 2024

Dear friends of FCA,
This mail got longer than anticipated. In short: I propose to create a
repository of formal contexts with machine-readable metadata. You can
find an initial draft here: https://github.com/fcatools/contexts
The long story: Sometimes it would be helpful to have exemplary formal
contexts readily available, for example, for illustration,
for testing algorithms, for benchmarking, or for beginners to explore
and learn FCA. However, I have not found a comprehensive repository of
example formal contexts. I know that Uta Priss has some classic
contexts on her web page (https://upriss.github.io/fca/examples.html)
and some FCA tools have contexts for unit tests (e.g.,
https://github.com/tomhanika/conexp-clj/tree/dev/testing-data) but
these are neither comprehensive nor easy to find; moreover, they lack
machine-readable metadata, are not integrated into FCA tools or
libraries, and are sometimes difficult to cite.
What I would like to have for FCA is what popular data science
libraries provide. For example, scikit-learn
(https://scikit-learn.org/) includes some basic datasets
(https://scikit-learn.org/stable/datasets/toy_dataset.html) that can be
accessed with just one line of Python code (after importing the module):
     from sklearn import datasets
     iris = datasets.load_iris()
Similarly, Seaborn's load_dataset() function loads datasets from a git
repository (https://github.com/mwaskom/seaborn-data).
As there are many frequently used exemplary formal contexts, I suggest
creating a git-based repository which contains such contexts together
with machine-readable metadata that describes them. I'd like to follow
the KISS principle and not over-engineer the whole thing, that is,
1. Each context is just a file in a git repository (suitable file
     formats are open for discussion, IMHO at least CTX).
2. The metadata for each context is described in a file that is
     machine-readable and human-editable. My impression is that a
     stripped-down version of YAML would be sufficient (that is, just
     hierarchical key-value pairs).
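To make the "hierarchical key-value pairs" idea a bit more concrete, here is a minimal Python sketch of what such a metadata file and a reader for it might look like. The keys (title, format, size, license) are purely hypothetical and not a proposed schema. Values are kept as strings, and lists, quoting, and multi-line values are deliberately unsupported.

```python
def parse_metadata(text: str) -> dict:
    """Parse 'key: value' lines into nested dicts; indentation means nesting.

    Only a stripped-down subset of YAML is handled: no lists, no quoting,
    no multi-line values. All values are returned as strings.
    """
    root: dict = {}
    stack = [(-1, root)]  # (indentation, dict) pairs; top = current parent
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comment lines
        indent = len(line) - len(line.lstrip(" "))
        key, sep, value = line.strip().partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        # Leave every block that is indented at least as deep as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value.strip():
            parent[key.strip()] = value.strip()
        else:
            # "key:" with no value opens a nested block.
            child: dict = {}
            parent[key.strip()] = child
            stack.append((indent, child))
    return root


# Hypothetical metadata for a toy context; the keys are not a proposed schema.
EXAMPLE = """\
title: Toy example
format: ctx
size:
  objects: 3
  attributes: 4
license: CC-BY-4.0
"""

meta = parse_metadata(EXAMPLE)
print(meta["size"]["objects"])  # prints 3
```

Because this subset is valid YAML, a full parser such as PyYAML's yaml.safe_load() would read the same files. It would, however, turn the sizes into integers.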
An initial draft of such a repository can be found here:
https://github.com/fcatools/contexts
Using Git(Hub) has some benefits, for example, version control, a
workflow for collaboration and contributions (forks, pull requests), a
continuous integration pipeline for the automatic generation of
derivatives (e.g., human-readable documentation, statistics, lattice
diagrams), simple programmatic access using HTTP, etc. (I am aware of
research data repositories but I think they are not the best choice
for what I have in mind. Still, snapshots of the git repo could
regularly be published, e.g., on Zenodo, which supports GitHub.)
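As a rough sketch of the "simple programmatic access using HTTP" point: a client needs nothing beyond the Python standard library to fetch a raw file from GitHub. The directory layout (contexts/<name>.cxt on a branch called main) and the helper names context_url() and fetch_text() are assumptions for illustration. The draft repository may be organized differently.

```python
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: files live under contexts/ on a branch named "main".
# Adjust RAW_BASE and the path scheme to the repository's actual layout.
RAW_BASE = "https://raw.githubusercontent.com/fcatools/contexts/main/"


def context_url(name: str, ext: str = "cxt") -> str:
    """Return the raw-file URL of a context (hypothetical path scheme)."""
    # quote() percent-encodes characters such as spaces in the file name.
    return f"{RAW_BASE}contexts/{quote(name)}.{ext}"


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a text file via HTTP(S) and decode it as UTF-8 (assumed encoding)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With network access, fetch_text(context_url("example")) would then return the file contents, where "example" is a hypothetical context name.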
The repository could easily be integrated into FCA workflows, tools,
and libraries and could simplify the (re)use of FCA (data).
Specifically, I'd like to help maintainers of FCA tools and
libraries integrate access to the repository such that getting a
context is as simple as getting a dataset with scikit-learn or
Seaborn. With a bit more time and effort, even more would be possible,
for example, a browsable repository of contexts like the one
http://konect.cc/ provides for (social) networks.
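For illustration, the core of a load function analogous to load_iris() could be a small reader for the Burmeister format. This is presumably what "CTX" refers to above, and such files usually carry the extension .cxt. The sketch assumes the common layout:

* a line "B";
* a name line, which is often empty;
* the numbers of objects and attributes;
* the object names;
* the attribute names;
* one row per object, with "X" for a cross and "." otherwise.

FormalContext and parse_burmeister are hypothetical names, not the API of any existing FCA library.

```python
from dataclasses import dataclass


@dataclass
class FormalContext:
    objects: list[str]
    attributes: list[str]
    incidence: list[list[bool]]  # incidence[i][j]: object i has attribute j


def parse_burmeister(text: str) -> FormalContext:
    """Parse a context in Burmeister format (assumed layout, see above)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "B":
        raise ValueError("not a Burmeister file: first line must be 'B'")
    # lines[1] is the (often empty) context name; blank separator lines
    # after it are skipped, so the remaining entries are strictly positional.
    body = [line.strip() for line in lines[2:] if line.strip()]
    n_obj, n_att = int(body[0]), int(body[1])
    objects = body[2 : 2 + n_obj]
    attributes = body[2 + n_obj : 2 + n_obj + n_att]
    rows = body[2 + n_obj + n_att : 2 + 2 * n_obj + n_att]
    if len(rows) != n_obj or any(len(row) != n_att for row in rows):
        raise ValueError("incidence rows do not match the declared size")
    # "X" (or "x") marks a cross; anything else, usually ".", means no cross.
    incidence = [[ch in "Xx" for ch in row] for row in rows]
    return FormalContext(objects, attributes, incidence)


TOY = """B

3
4

duck
frog
dog
flies
swims
lays eggs
has fur
XXX.
.XX.
.X.X
"""

ctx = parse_burmeister(TOY)
# Objects having the attribute "swims":
j = ctx.attributes.index("swims")
print([g for g, row in zip(ctx.objects, ctx.incidence) if row[j]])  # prints ['duck', 'frog', 'dog']
```

A library function such as a hypothetical load_context(name) would then only need to download the file and pass its text to a parser like this one. Caching and error handling are left out here.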
Next steps towards the above-mentioned goals would be:
1. gather feedback from the community
2. develop a curation policy and metadata schema
3. collect contexts and metadata
4. reach out to authors of FCA tools and libraries
I'd be very glad to get the discussion started and read your comments.
Best regards,
Robert Jäschke
--
Prof. Dr. Robert Jäschke              Humboldt-Universität zu Berlin
https://hu.berlin/RJ
+49 (0)30 2093-70960
https://weltliteratur.net/
https://dev.bibsonomy.org/