What is Provenance?
You can think of provenance as the history of a data object: for example, how a sample was created and where it has been stored.
Provenance, as defined in the ISO provenance standard ISO 23494 (“Biotechnology – Provenance information model for biological specimen and data – Part 2: Common Provenance Model”), is expressed as a triplet of “Entity – Agent – Activity”: an entity was generated by an activity, which was performed by an agent. Such a triplet is displayed in the image below.
The image below shows the provenance of a tissue sample. The tissue sample was derived from the entity “Patient” and was generated by the activity “Biomaterial collection”. The activity “Biomaterial collection” used the entity “Patient” and was performed by (associated with) the agent “Surgical Department”.
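To make the triplet concrete, the tissue-sample example can be written down as a small set of typed nodes and relations. This is a minimal Python sketch, assuming relation names borrowed from W3C PROV (`wasGeneratedBy`, `used`, `wasAssociatedWith`, `wasDerivedFrom`) as a stand-in for the vocabulary of the ISO model; the class names `Node` and `ProvGraph` are illustrative only.

```python
from dataclasses import dataclass, field

# Node kinds of the provenance triplet. Relation names below follow W3C PROV,
# assumed here as a stand-in for the ISO 23494-2 vocabulary.
ENTITY, ACTIVITY, AGENT = "Entity", "Activity", "Agent"


@dataclass(frozen=True)
class Node:
    kind: str  # ENTITY, ACTIVITY or AGENT
    name: str


@dataclass
class ProvGraph:
    nodes: dict = field(default_factory=dict)      # name -> Node
    relations: list = field(default_factory=list)  # (subject, relation, object)

    def add(self, kind: str, name: str) -> Node:
        node = Node(kind, name)
        self.nodes[name] = node
        return node

    def relate(self, subject: Node, relation: str, obj: Node) -> None:
        self.relations.append((subject.name, relation, obj.name))

    def objects(self, subject: str, relation: str) -> list:
        """Return all objects linked from `subject` by `relation`."""
        return [o for s, r, o in self.relations if s == subject and r == relation]


# The tissue-sample example from the text.
g = ProvGraph()
patient = g.add(ENTITY, "Patient")
tissue = g.add(ENTITY, "Tissue sample")
collection = g.add(ACTIVITY, "Biomaterial collection")
surgery = g.add(AGENT, "Surgical Department")

g.relate(tissue, "wasDerivedFrom", patient)
g.relate(tissue, "wasGeneratedBy", collection)
g.relate(collection, "used", patient)
g.relate(collection, "wasAssociatedWith", surgery)

print(g.objects("Tissue sample", "wasGeneratedBy"))               # ['Biomaterial collection']
print(g.objects("Biomaterial collection", "wasAssociatedWith"))  # ['Surgical Department']
```

Keeping relations as plain (subject, relation, object) tuples mirrors how they end up as edges in a graph database such as Neo4j.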
The Provenance Access Point
The Provenance Access Point (PAP) exposes the provenance information of a single record, such as a biological sample. The information returned by the PAP adheres to the ISO provenance standard 23494 and is provided in a machine-readable format.
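The exact response format of the PAP is not specified here. As an illustration of what a machine-readable provenance record can look like, the sketch below serializes the tissue-sample provenance to PROV-JSON, the JSON serialization of W3C PROV. The choice of PROV-JSON, the `ex:` namespace IRI, and the `_:` relation identifiers are assumptions made for this example.

```python
import json

# Hypothetical example namespace; a real deployment would use its own IRIs.
EX = "http://example.org/"


def tissue_sample_prov_json() -> dict:
    """Build a PROV-JSON document for a single record (the tissue sample)."""
    return {
        "prefix": {"ex": EX},
        "entity": {
            "ex:patient": {"prov:label": "Patient"},
            "ex:tissueSample": {"prov:label": "Tissue sample"},
        },
        "activity": {"ex:collection": {"prov:label": "Biomaterial collection"}},
        "agent": {"ex:surgicalDepartment": {"prov:label": "Surgical Department"}},
        # Relations are keyed by blank-node identifiers, as in PROV-JSON.
        "wasGeneratedBy": {
            "_:g1": {"prov:entity": "ex:tissueSample", "prov:activity": "ex:collection"}
        },
        "used": {
            "_:u1": {"prov:activity": "ex:collection", "prov:entity": "ex:patient"}
        },
        "wasAssociatedWith": {
            "_:a1": {"prov:activity": "ex:collection", "prov:agent": "ex:surgicalDepartment"}
        },
        "wasDerivedFrom": {
            "_:d1": {"prov:generatedEntity": "ex:tissueSample", "prov:usedEntity": "ex:patient"}
        },
    }


if __name__ == "__main__":
    print(json.dumps(tissue_sample_prov_json(), indent=2))
```

Libraries such as the `prov` Python package can read and write this format, so a client does not need to parse the relations by hand.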
The PAP is still under development (GitHub). Currently, data is extracted from OpenSpecimen with the OpenSpecimenAPIconnector (GitHub, Docs), transformed by a Jupyter Notebook (GitHub), and loaded into a Neo4j database using the PROV Database Connector (GitHub, Docs).
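The transform step maps records extracted from OpenSpecimen onto provenance statements. The sketch below assumes a simplified, hypothetical specimen record: the field names `label`, `donor`, `collected_by`, `received_by` and `parent` are made up for illustration and do not reflect the actual OpenSpecimen API or the output of the OpenSpecimenAPIconnector. It shows only the shape of such a mapping, not what the project's notebook does.

```python
from typing import Iterator, Tuple

Statement = Tuple[str, str, str]  # (subject, relation, object)


def specimen_to_prov(record: dict) -> Iterator[Statement]:
    """Map one hypothetical specimen record to provenance statements."""
    sample = f"specimen:{record['label']}"

    # Collection: the sample is generated by a collection activity that
    # used the donor and was performed by the collecting unit.
    if "collected_by" in record:
        activity = f"activity:collection:{record['label']}"
        donor = f"donor:{record['donor']}"
        yield (sample, "wasGeneratedBy", activity)
        yield (activity, "used", donor)
        yield (activity, "wasAssociatedWith", f"agent:{record['collected_by']}")
        yield (sample, "wasDerivedFrom", donor)

    # Derivative: the sample was generated from a parent specimen by processing.
    if "parent" in record:
        activity = f"activity:processing:{record['label']}"
        parent = f"specimen:{record['parent']}"
        yield (sample, "wasGeneratedBy", activity)
        yield (activity, "used", parent)
        yield (sample, "wasDerivedFrom", parent)

    # Reception: the receiving activity used the sample (no new entity).
    if "received_by" in record:
        activity = f"activity:reception:{record['label']}"
        yield (activity, "used", sample)
        yield (activity, "wasAssociatedWith", f"agent:{record['received_by']}")


# Hypothetical records: a primary sample and one derivative of it.
records = [
    {"label": "T-001", "donor": "P-01",
     "collected_by": "Surgical Department", "received_by": "Biobank"},
    {"label": "T-001-D1", "parent": "T-001"},
]
statements = [s for r in records for s in specimen_to_prov(r)]
for s in statements:
    print(s)
```

Emitting flat statements keeps the transform independent of the database, so the same output could be handed to whichever loader is used afterwards.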
As a demonstration, you can access the provenance database here.
Log in to the Neo4j database:
– Admin credentials:
– Username: “neo4j”
– Password: “admin”
Once logged in to the provenance database, click the database icon and then the red number to display all stored entities (see the image below).
You will then see a graphical representation of the provenance information of a biological sample stored in OpenSpecimen. The graph shows the events of sample collection, reception and processing (a derivative is generated from the sample). Click any node to display further information.
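In the graph, the history of the derivative can be read by following its edges back to the patient. The sketch below performs the same traversal in plain Python over a hypothetical list of events; the `Event` type, the timestamps, and the agent names for reception and processing are invented for illustration. In the Neo4j Browser itself the equivalent would be a Cypher path query, whose exact form depends on how the PROV Database Connector labels nodes and relationships.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    time: datetime
    activity: str
    agent: str
    used: str                 # entity the activity used
    generated: Optional[str]  # entity the activity generated, if any


def ancestors(entity: str, events: list) -> set:
    """The entity plus everything it was (transitively) derived from."""
    result, frontier = set(), [entity]
    while frontier:
        current = frontier.pop()
        if current in result:
            continue
        result.add(current)
        frontier.extend(e.used for e in events if e.generated == current)
    return result


def history(entity: str, events: list) -> list:
    """Events in the provenance of `entity`, oldest first.

    Keeps events that generated an ancestor, plus events (such as reception)
    that only used an ancestor; events producing other entities are skipped.
    """
    chain = ancestors(entity, events)
    relevant = [
        e for e in events
        if e.generated in chain or (e.generated is None and e.used in chain)
    ]
    return sorted(relevant, key=lambda e: e.time)


# Hypothetical events with made-up timestamps, mirroring the demo graph:
# collection -> reception -> processing (which generates a derivative).
EVENTS = [
    Event(datetime(2023, 5, 3, 14, 15), "Processing", "Biobank lab", "Sample", "Derivative"),
    Event(datetime(2023, 5, 2, 9, 30), "Collection", "Surgical Department", "Patient", "Sample"),
    Event(datetime(2023, 5, 2, 11, 0), "Reception", "Biobank", "Sample", None),
]

for e in history("Derivative", EVENTS):
    print(e.time.isoformat(), e.activity, "by", e.agent)
```

Sorting by timestamp rather than by list position means the events can arrive from the database in any order.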
This demo provenance database was populated with data from a demo OpenSpecimen instance using the Jupyter Notebook here.
The script that fills the PAP is called fillPAP.ipynb and is located at work/fillPAP.ipynb. The Jupyter access token is “FDP”.
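The internals of fillPAP.ipynb are not shown here, and it loads data through the PROV Database Connector rather than hand-written queries. Purely to illustrate what a load step into Neo4j involves, the following sketch turns (subject, relation, object) provenance statements into parameterized Cypher `MERGE` queries. The `Node` label, the `id` property, and the helper names `merge_statement` and `load_plan` are hypothetical and are not the connector's schema or API.

```python
from typing import Dict, List, Tuple

# Assumed PROV relation names. Relationship types cannot be passed as Cypher
# parameters, so only names from this allow-list are interpolated into queries.
ALLOWED_RELATIONS = {"wasGeneratedBy", "used", "wasAssociatedWith", "wasDerivedFrom"}


def merge_statement(subject: str, relation: str, obj: str) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent Cypher MERGE (query, parameters) for one statement.

    The `Node` label and `id` property are hypothetical placeholders.
    """
    if relation not in ALLOWED_RELATIONS:
        raise ValueError(f"unsupported relation: {relation}")
    query = (
        "MERGE (s:Node {id: $subject}) "
        "MERGE (o:Node {id: $object}) "
        f"MERGE (s)-[:{relation}]->(o)"
    )
    return query, {"subject": subject, "object": obj}


def load_plan(statements: List[Tuple[str, str, str]]) -> List[Tuple[str, Dict[str, str]]]:
    """Turn (subject, relation, object) statements into queries to execute."""
    return [merge_statement(s, r, o) for s, r, o in statements]


plan = load_plan([
    ("Tissue sample", "wasGeneratedBy", "Biomaterial collection"),
    ("Biomaterial collection", "wasAssociatedWith", "Surgical Department"),
])
for query, params in plan:
    print(query, params)
```

With the official neo4j Python driver, each pair could be executed as `session.run(query, params)` inside a session. Because every clause is a `MERGE`, re-running the load for the same record does not create duplicate nodes or edges.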