The Curation object
Pycaprio uses the Curation object to model INCEpTION's curated documents. It has the following properties:
project_id: Id of the project in which the curated document is located (integer).
document_id: Id of the curated document (integer).
document_state: State of the document (string, possible values in pycaprio.core.mappings.DocumentState).
timestamp: Curation's creation date.
List curated documents
Lists the curated documents in an INCEpTION project whose document state matches the one specified with document_state.
You can provide a Project instance instead of a project_id as well.
You can provide a Document instance instead of a document_id as well.
Example:
from pycaprio.core.mappings import DocumentState
documents = client.api.curations(1, document_state=DocumentState.CURATION_IN_PROGRESS)  # Documents under curation in project #1
print(documents)  # [<Document #4: file.xmi (Project: 1)>]
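Before filtering by state, it can help to see which states actually occur in a project. The following is a minimal sketch, not part of Pycaprio: the helper name `count_document_states` is hypothetical, and it assumes only that each document object exposes a `document_state` attribute.

```python
from collections import Counter


def count_document_states(documents):
    """Count how many documents are in each state.

    Hypothetical helper (not part of Pycaprio). Assumes every item in
    `documents` has a `document_state` attribute.
    """
    return Counter(document.document_state for document in documents)
```

The result behaves like a dictionary, so `count_document_states(client.api.documents(1))` would give a quick overview of which value to pass as document_state.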
Download curated annotations
Downloads a curated document's annotation content.
You can specify the curation's format via curation_format (defaults to webanno).
You can provide a Project instance instead of a project_id as well.
You can provide a Document instance instead of a document_id as well.
Example:
from pycaprio.mappings import InceptionFormat
# Download the curated annotations of a specific document (document 4 of project 1)
annotation_content = client.api.curation(1, 4, curation_format=InceptionFormat.WEBANNO)
with open("downloaded_annotation", 'wb') as annotation_file:
    annotation_file.write(annotation_content)
Or, for many files:
WEBANNO example:
# Requesting the curation of a document that has not been curated causes an error,
# so download only the documents whose document_state is associated with curation.
from pycaprio.core.mappings import InceptionFormat, DocumentState
documents = client.api.documents(1)
for document in documents:
    if document.document_state == DocumentState.CURATION_IN_PROGRESS:
        curated_content = client.api.curation(1, document, curation_format=InceptionFormat.WEBANNO)
        with open(document.document_name, 'wb') as annotation_file:
            annotation_file.write(curated_content)
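The loop above can be wrapped in a small reusable helper. This is a minimal sketch, not part of Pycaprio: the function name `download_curations` and its parameters `out_dir` and `curated_states` are hypothetical. It assumes only the `client.api.documents` and `client.api.curation` calls used in the example, with `curation` returning bytes.

```python
from pathlib import Path


def download_curations(client, project_id, out_dir, curated_states):
    """Save the curated content of every document whose state is in curated_states.

    Hypothetical helper (not part of Pycaprio). Assumes client.api.documents()
    and client.api.curation() behave as in the example and that curation()
    returns bytes. The default curation format is used.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    saved = []
    for document in client.api.documents(project_id):
        # Skip documents without curation data; requesting them would fail.
        if document.document_state not in curated_states:
            continue
        content = client.api.curation(project_id, document)
        target = out_path / document.document_name
        target.write_bytes(content)
        saved.append(target)
    return saved
```

A call might look like `download_curations(client, 1, "curations", {DocumentState.CURATION_IN_PROGRESS})`. Passing the accepted states as a set keeps the helper usable if more than one state should count as curated.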
XMI example:
# Requesting the curation of a document that has not been curated causes an error,
# so download only the documents whose document_state is associated with curation.
import io
import zipfile
from pycaprio.core.mappings import InceptionFormat, DocumentState
curations = []
documents = client.api.documents(1)
for document in documents:
    if document.document_state == DocumentState.CURATION_IN_PROGRESS:
        curated_content = client.api.curation(1, document, curation_format=InceptionFormat.XMI)
        curations.append(curated_content)
# Each XMI download is a ZIP archive held in memory; extract it to disk
for curation in curations:
    with zipfile.ZipFile(io.BytesIO(curation)) as z:
        z.extractall('/your/path/')
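Extracting every archive into the same folder may overwrite files when several archives contain members with the same name. One way around this is to give each archive its own subfolder. The following is a minimal sketch, assuming each download is a ZIP archive held in memory as bytes; the helper name `extract_archives` and its `names` argument are hypothetical.

```python
import io
import zipfile
from pathlib import Path


def extract_archives(archives, names, base_dir):
    """Extract each in-memory ZIP archive into base_dir/<name>/.

    Hypothetical helper: `archives` is a list of bytes objects and `names`
    a list of folder names of the same length (for example document names).
    Returns a dict mapping each name to the member names of its archive.
    """
    if len(archives) != len(names):
        raise ValueError("archives and names must have the same length")
    extracted = {}
    for name, archive in zip(names, archives):
        target = Path(base_dir) / name
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(io.BytesIO(archive)) as z:
            z.extractall(target)
            extracted[name] = z.namelist()
    return extracted
```

Using the document name as the folder name keeps the extracted files traceable to the document they came from.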
Upload curation
Uploads a curated document to INCEpTION. It requires the Id of the project, the Id of the document and the curation's content (io stream).
You can specify the curation's format via curation_format (defaults to webanno) and its state via annotation_state (defaults to NEW).
You can specify the document's state via document_state.
You can provide a Project instance instead of a project_id as well.
You can provide a Document instance instead of a document_id as well.
You need to specify content, which depends on the annotation format specified in the download:
To curate a document outside of INCEpTION, or simply to move a document into a curation state, you could do the following:
from pycaprio.core.mappings import InceptionFormat, DocumentState
# Get the annotations of a specific document, e.g. as binary CAS
annotations = client.api.annotation(1, 4, 'test-user', annotation_format=InceptionFormat.BIN)
# Upload the content as a curation and set the new document state
client.api.create_curation(1, 4, curation_format=InceptionFormat.BIN, content=annotations, document_state=DocumentState.CURATION_IN_PROGRESS)
The XMI format also works, but you have to unzip the downloaded archive first and upload only the plain XMI file:
import io
import zipfile
from pycaprio.core.mappings import InceptionFormat, DocumentState
annotation_content = client.api.annotation(1, 4, 'test-user', annotation_format=InceptionFormat.XMI)
# The XMI download is a ZIP archive; extract it and read the plain XMI file
with zipfile.ZipFile(io.BytesIO(annotation_content)) as z:
    z.extractall('/path/to/folder')
with open('/path/to/folder/file.xmi', 'rb') as f:
    xmi_content = f.read()
# Upload the plain XMI content as a curation and set the new document state
client.api.create_curation(1, 4, curation_format=InceptionFormat.XMI, content=xmi_content, document_state=DocumentState.CURATION_IN_PROGRESS)
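Writing the archive to disk is not strictly necessary. As a minimal sketch, the XMI bytes can be read straight from memory, assuming the downloaded XMI export is a ZIP archive containing exactly one member ending in `.xmi`. That layout is an assumption about the archive, not something the API is known to guarantee, and the helper name `read_xmi_from_zip` is hypothetical.

```python
import io
import zipfile


def read_xmi_from_zip(archive_bytes):
    """Return the bytes of the single .xmi member of an in-memory ZIP archive.

    Hypothetical helper. Raises ValueError if the archive does not contain
    exactly one member whose name ends in ".xmi".
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as z:
        xmi_members = [n for n in z.namelist() if n.lower().endswith(".xmi")]
        if len(xmi_members) != 1:
            raise ValueError(
                f"expected exactly one .xmi member, found {len(xmi_members)}"
            )
        return z.read(xmi_members[0])
```

The returned bytes could then be passed as the content argument of client.api.create_curation. Failing loudly on zero or several `.xmi` members avoids silently uploading the wrong file.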
Delete curations
Deletes the curated annotations of a document and sets the document's state back to 'ANNOTATION-IN-PROGRESS'.
You can provide a Project instance instead of a project_id as well.
You can provide a Document instance instead of a document_id as well.
Example:
client.api.delete_curation(1, 4)  # Deletes the curation of document #4 in project #1