Deposit, Ingest and Submission
Meeting Participants
- Nigel Ward
- Nick Nicholas
- David Levy
- Judith Pearce
- Matthew Walker
- Monica Berko
- David Pearson
Agenda
- Clarify with Judith the terms “Deposit”, “Ingest” and “Submission”
Deposit workflow
- Business workflow rather than a service
- Initiated by creator/author
- Legislative
- Other Obligation
- Preservation
- broader access
- Is a business term, as distinguished from publishing or workflow activities
To be filled in/checked, examples of deposit in the real world:
- Deposit may be the overall combination of use cases which describe what a preservation agency, like the National Library, performs on digital objects.
Digitisation workflow
- creating a derivative work
- digitisation engine may have special relationship with repository
Publishing workflow
- actor is publishing system
- driver is to publish
- driver is to move to a trusted repository
Ingest
- What a repository does when it accepts digital objects
- May have many actions which are performed “on ingest”:
- Validating checksums
- Validating data
- Unpacking information from METS content packages
- XSLT transforms
- As an end result of Ingest, a document is stored in a repository
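The "on ingest" actions listed above can be sketched as a short pipeline. This is a minimal illustration only, not any particular repository's implementation: the function names, the use of SHA-256, the "non-empty payload" validity rule and the dict standing in for a repository are all assumptions made for the example.

```python
import hashlib


def validate_checksum(payload: bytes, expected_sha256: str) -> None:
    """Reject the object if its SHA-256 digest does not match the declared one."""
    actual = hashlib.sha256(payload).hexdigest()
    if actual != expected_sha256:
        raise ValueError("checksum mismatch")


def validate_data(payload: bytes) -> None:
    """Placeholder validity rule (an assumption): the payload must not be empty."""
    if not payload:
        raise ValueError("empty payload")


def ingest(repository: dict, object_id: str, payload: bytes, expected_sha256: str) -> None:
    """Run the on-ingest actions in order, then store the object."""
    validate_checksum(payload, expected_sha256)
    validate_data(payload)
    # End result of ingest: the object is stored in the repository.
    repository[object_id] = payload
```

An object is stored only if every action succeeds, so a failed checksum leaves the repository unchanged.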
Real world examples:
- The APSR RIFF Submission service’s “tasks” which are performed when a document is submitted into the system
- The preservation process the National Archives’ XENA software performs on digital objects as they arrive in the system
- The tasks performed by a Fedora repository when an object arrives resulting in the document being saved in the database.
Submission
- More a “Use Case”
- Judith suggested we use the term Submit instead to get a better match with the OAIS terms and the Submit Use Case
Real world example: OAIS terms and the Submit Use Case
Disseminate/Deliver
- what goes on at the other end
Archiving
- captures something already published
Difference Between Library and Research Focus
In the institutional model, deposit has been the focus.
- In the research field, a researcher wants to share their work
- In library terms, deposit is related to an obligation to preserve the work
Deposit And Publishing
- Different workflows depending on different use cases
- Reasons for deposit are increased access
- The driver for publishing is benefit, which is usually considered to be short term
- In publishing, once the benefit has disappeared (audience too small to justify cost), preservation ends.
- Deposit continues despite the absence of a directly attributable business interest [Editor’s Note: It could be argued that research institutions receive indirect benefits and that a library benefits from preservation in achieving its goals]
Real World Examples
- A real world publishing house of material
- A news aggregator website which publishes stories from other sites but has no interest in long term preservation (or may not have a license to keep stories for long)
Harvest and Deposit
They are both the same in their end result (a digital object in a repository), but their “trigger” or “actor” is different. A Harvest is a “pull” whereas a Deposit is a “push”.
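The push/pull distinction can be sketched as follows. This is a hedged illustration only: the `Repository` class, its method names and the dict used as a remote source are hypothetical and not drawn from any system discussed in the meeting.

```python
class Repository:
    """Toy repository; objects are held in a dict keyed by identifier."""

    def __init__(self):
        self.objects = {}

    def deposit(self, object_id, payload):
        """Push: an external actor (e.g. the creator) hands the object over."""
        self.objects[object_id] = payload

    def harvest(self, source):
        """Pull: the repository itself fetches objects from a source mapping."""
        for object_id, payload in source.items():
            self.objects[object_id] = payload


# Same end result, different trigger.
pushed = Repository()
pushed.deposit("doc-1", b"content")   # the depositor acts

pulled = Repository()
pulled.harvest({"doc-1": b"content"})  # the repository acts
```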
Submission
- Only knows the information about the document that it has been given
- Doesn’t know the level of preservation going into a system
- PREMIS stores preservation level
- Intention (how much effort I will expend supporting)
- Do I support document at all
- Only known upon combination
What goes into submission information package
- Persistent identifier
Intent
- differentiate between a submission & dissemination package
- submission can only know about the object, not necessarily the policies that the repository can enforce, but these might still be expressed, e.g. preservation intent
NLA lessons learned from METS profiling activity
- repositories may know about objects that have not yet been ingested
- submit and ingest were tightly coupled, but should not have been
- e.g. business rules based on PID, had own schemas and called DB directly
- now: have SIP and express relationships in the package
Same thing at delivery end:
- e.g. PID with suffix 1A means it is in a tile
National Library Digital Management System Case Study
When the NLA built its digital management system, submission and ingest were too tightly coupled:
- Persistent identifier had meaning between systems
- Broken service contract; delivery end made decisions based on information extracted from persistent identifiers – this information was not part of the service contract and meant the system could not be reused. (e.g. PID with suffix 1A means it is in a tile)
- Now NLA is using Service Orientated Architecture and moving to “packages”
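The coupling problem described above can be contrasted with the package approach in a small sketch. Everything here is hypothetical: the `Package` class, the `tiled` property name and the example identifiers are invented for illustration and do not reflect the NLA's actual schemas or PID formats.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical package: an opaque identifier plus explicit properties."""
    pid: str
    properties: dict = field(default_factory=dict)


def is_tiled_coupled(pid: str) -> bool:
    """Anti-pattern: the delivery end infers meaning from the PID suffix,
    which is not part of the service contract."""
    return pid.endswith("1A")


def is_tiled(package: Package) -> bool:
    """Decoupled: the delivery end reads a property stated in the package."""
    return bool(package.properties.get("tiled", False))


pkg = Package(pid="example-0001", properties={"tiled": True})
```

With the second function the identifier can change format without breaking delivery, because the decision depends only on what the package states.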
See diagram from METS profile from APSR project
- deals with input & output from repositories
- need an OpenURL profile for METS, capturing e.g. dissemination options
- cf. FRED Obtain Genre
National Library watching OAI-ORE & WOCK as an alternative to METS content packaging
Canadian concept: virtual loading dock
- Architecture uses same ingest point irrespective of content type
- Special treatment of content type occurs behind the service contract and single entry point
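The "virtual loading dock" idea can be sketched as a single entry point that dispatches to content-specific handlers behind the service contract. This is an assumption-laden illustration: the handler names, the MIME types chosen and the returned dict shape are all invented for the example.

```python
def _handle_image(payload: bytes) -> dict:
    """Hypothetical special treatment for images."""
    return {"kind": "image", "size": len(payload)}


def _handle_text(payload: bytes) -> dict:
    """Hypothetical special treatment for text."""
    return {"kind": "text", "size": len(payload)}


def _handle_default(payload: bytes) -> dict:
    """Fallback for content types with no special treatment."""
    return {"kind": "generic", "size": len(payload)}


# Content-specific handling lives behind the single entry point.
_HANDLERS = {
    "image/tiff": _handle_image,
    "text/plain": _handle_text,
}


def loading_dock_ingest(content_type: str, payload: bytes) -> dict:
    """Single ingest point: callers use the same call for every content type."""
    handler = _HANDLERS.get(content_type, _handle_default)
    return handler(payload)
```

Callers never choose a handler themselves, so new content types can be supported by extending the table without changing the contract.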
