Last date modified: 2025-Aug-07
hOCR Service (REST)
When a Contracts OCR Set is run, the Contracts OCR Agent creates Contracts Image Results objects for every page of every document in the Contracts OCR Set. Those Contracts Image Results objects contain hOCR data describing the content and structure of the text found in the document image via OCR.
The Contracts hOCR Service exposes an endpoint for reading hOCR data created by the Contracts OCR Agent.
Guidelines for the hOCR Service
The following general guidelines apply for working with this service.
URLs
The URLs for the hOCR Service's REST endpoints contain path parameters that you need to set before making a call:
- Set the versionNumber placeholder to the version of the REST API that you want to use, using the format of lowercase v followed by the version number (e.g. v1 or v2).
- Set the workspaceId and documentId path parameters to the Artifact ID of the given entity. For example, you'd set workspaceId to the Artifact ID of the Workspace.
For example, you can use the following URL to retrieve the hOCR data for a particular page of a document:
<host>/Relativity.Rest/API/contracts/{versionNumber}/ocr/{workspaceId}/document/{documentId}/page/{pageNumber}
Set the path parameters as follows:
- versionNumber—the version of the API, such as v1.
- workspaceId—the Artifact ID of the Workspace that contains the document.
- documentId—the Artifact ID of the document you want to retrieve hOCR data for.
- pageNumber—the page number for the document image page you want to retrieve hOCR data for. Page numbers begin at 1.
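The path parameters above can be filled in programmatically. The following is a minimal Python sketch, not part of the service itself: the function name build_hocr_url, the host https://relativity.example.com, and the Workspace Artifact ID 1017955 are hypothetical values chosen for illustration.

```python
def build_hocr_url(host, version_number, workspace_id, document_id, page_number):
    """Fill in the path parameters of the hOCR page endpoint."""
    if int(page_number) < 1:
        # The documentation states that page numbers begin at 1.
        raise ValueError("page numbers begin at 1")
    return (
        f"{host.rstrip('/')}/Relativity.Rest/API/contracts/{version_number}"
        f"/ocr/{int(workspace_id)}/document/{int(document_id)}"
        f"/page/{int(page_number)}"
    )


# Hypothetical host and Workspace Artifact ID, for illustration only.
url = build_hocr_url("https://relativity.example.com", "v1", 1017955, 1041438, 2)
print(url)
```

Validating the page number before building the URL avoids a round trip for a request that cannot succeed.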
Client Code Example
To use the hOCR Service, send HTTP requests using the method that each endpoint requires.
You can download a .NET example project here.
Read hOCR Data
To get hOCR data for a document page, send a GET request with a URL in this format:
<host>/Relativity.Rest/API/contracts/{versionNumber}/ocr/{workspaceId}/document/{documentId}/page/{pageNumber}
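As a rough illustration of the GET call, the sketch below uses only the Python standard library. The helper names make_read_request and read_hocr_page are hypothetical, and the authentication headers are an assumption: this page does not describe how the service authenticates callers, so pass whatever headers your environment requires.

```python
import json
from urllib.request import Request, urlopen


def make_read_request(url, auth_headers=None):
    """Build (but do not send) a GET request for one page of hOCR data."""
    headers = {"Accept": "application/json"}
    # Assumption: authentication is supplied by the caller as extra headers.
    headers.update(auth_headers or {})
    return Request(url, headers=headers, method="GET")


def read_hocr_page(url, auth_headers=None, timeout=30):
    """Send the GET request and decode the JSON response body."""
    request = make_read_request(url, auth_headers)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a reachable server; the host and IDs are placeholders):
# page = read_hocr_page(
#     "https://relativity.example.com/Relativity.Rest/API/contracts/v1"
#     "/ocr/1017955/document/1041438/page/2",
#     auth_headers={"Authorization": "Bearer <token>"},
# )
```

Separating request construction from sending makes it possible to inspect the URL and headers without contacting a server.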
Response field descriptions
- DocumentId—the Artifact ID of the document the hOCR data is associated with.
- PageNumber—the page number of the page in the document image that the hOCR data represents. Page numbers begin at 1.
- Text—an array of objects representing the terms found in the document image by Contracts OCR. Each object has the following fields:
  - Confidence—the hOCR confidence rating (as a percentage) for the text. The value is always a non-negative integer between 0 and 100.
  - Offset—the position of the term (in number of characters) from the start of the document.
  - Length—the number of characters in the term's text, as outputted by the OCR engine.
  - Text—a string containing the term's text, as outputted by the OCR engine.
  - BoundingBox—an object representing a rectangular box that "bounds" the term in the document image. It's used to define the term's position and size in the document page image, and it has these fields:
    - Left—the bounding box's distance (in pixels) from the left edge of the document page image.
    - Top—the bounding box's distance (in pixels) from the top edge of the document page image.
    - Width—the width (in pixels) of the bounding box.
    - Height—the height (in pixels) of the bounding box.
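The response fields described above map naturally onto small typed objects. The following Python sketch shows one possible way to parse a decoded response. The class and function names (BoundingBox, Term, parse_terms, low_confidence) and the threshold of 90 are illustrative assumptions, not part of the service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    left: int
    top: int
    width: int
    height: int

    @property
    def right(self):
        # Derived edge: distance of the box's right side from the left edge.
        return self.left + self.width

    @property
    def bottom(self):
        # Derived edge: distance of the box's bottom side from the top edge.
        return self.top + self.height


@dataclass(frozen=True)
class Term:
    confidence: int
    offset: int
    length: int
    text: str
    box: BoundingBox


def parse_terms(page):
    """Convert the "Text" array of a decoded response into Term objects."""
    terms = []
    for item in page["Text"]:
        b = item["BoundingBox"]
        box = BoundingBox(b["Left"], b["Top"], b["Width"], b["Height"])
        terms.append(Term(item["Confidence"], item["Offset"],
                          item["Length"], item["Text"], box))
    return terms


def low_confidence(terms, threshold=90):
    """Return the terms whose confidence is below a chosen threshold."""
    return [t for t in terms if t.confidence < threshold]


page = {
    "DocumentId": 1041438,
    "PageNumber": 2,
    "Text": [{
        "Confidence": 96, "Offset": 2, "Length": 7, "Text": "Exhibit",
        "BoundingBox": {"Left": 1760, "Top": 144, "Width": 94, "Height": 23},
    }],
}
first = parse_terms(page)[0]
print(first.text, first.box.right, first.box.bottom)  # Exhibit 1854 167
```

The right and bottom properties are derived from Left + Width and Top + Height, which is often convenient when drawing highlights over the page image.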
Sample JSON response
{
"DocumentId": 1041438,
"PageNumber": 2,
"Text": [
{
"Confidence": 96,
"Offset": 2,
"Length": 7,
"Text": "Exhibit",
"BoundingBox": {
"Left": 1760,
"Top": 144,
"Width": 94,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 10,
"Length": 4,
"Text": "10.5",
"BoundingBox": {
"Left": 1865,
"Top": 144,
"Width": 54,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 17,
"Length": 7,
"Text": "Summary",
"BoundingBox": {
"Left": 521,
"Top": 219,
"Width": 131,
"Height": 29
}
},
{
"Confidence": 95,
"Offset": 25,
"Length": 2,
"Text": "of",
"BoundingBox": {
"Left": 661,
"Top": 219,
"Width": 21,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 28,
"Length": 6,
"Text": "Fiscal",
"BoundingBox": {
"Left": 684,
"Top": 219,
"Width": 89,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 35,
"Length": 4,
"Text": "2008",
"BoundingBox": {
"Left": 784,
"Top": 219,
"Width": 66,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 40,
"Length": 6,
"Text": "Target",
"BoundingBox": {
"Left": 861,
"Top": 220,
"Width": 89,
"Height": 28
}
},
{
"Confidence": 95,
"Offset": 47,
"Length": 10,
"Text": "Short-Term",
"BoundingBox": {
"Left": 959,
"Top": 219,
"Width": 160,
"Height": 23
}
},
{
"Confidence": 95,
"Offset": 58,
"Length": 9,
"Text": "Incentive",
"BoundingBox": {
"Left": 1126,
"Top": 219,
"Width": 121,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 68,
"Length": 11,
"Text": "Percentages",
"BoundingBox": {
"Left": 1256,
"Top": 220,
"Width": 165,
"Height": 28
}
},
{
"Confidence": 96,
"Offset": 80,
"Length": 3,
"Text": "for",
"BoundingBox": {
"Left": 1434,
"Top": 219,
"Width": 38,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 84,
"Length": 3,
"Text": "the",
"BoundingBox": {
"Left": 1482,
"Top": 220,
"Width": 40,
"Height": 22
}
},
{
"Confidence": 96,
"Offset": 88,
"Length": 5,
"Text": "Named",
"BoundingBox": {
"Left": 656,
"Top": 260,
"Width": 94,
"Height": 22
}
},
{
"Confidence": 96,
"Offset": 94,
"Length": 9,
"Text": "Executive",
"BoundingBox": {
"Left": 756,
"Top": 259,
"Width": 126,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 104,
"Length": 8,
"Text": "Officers",
"BoundingBox": {
"Left": 891,
"Top": 259,
"Width": 110,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 113,
"Length": 2,
"Text": "of",
"BoundingBox": {
"Left": 1014,
"Top": 259,
"Width": 27,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 116,
"Length": 6,
"Text": "Lennox",
"BoundingBox": {
"Left": 1046,
"Top": 260,
"Width": 99,
"Height": 22
}
},
{
"Confidence": 96,
"Offset": 123,
"Length": 13,
"Text": "International",
"BoundingBox": {
"Left": 1154,
"Top": 259,
"Width": 174,
"Height": 23
}
},
{
"Confidence": 96,
"Offset": 137,
"Length": 4,
"Text": "Inc.",
"BoundingBox": {
"Left": 1339,
"Top": 260,
"Width": 48,
"Height": 22
}
}
]
}
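One common use of the bounding boxes in a response like the sample above is to rebuild visual lines of text. The sketch below groups terms whose Top values lie within a small tolerance of each other. The function name group_into_lines and the 5-pixel tolerance are assumptions made for illustration, and the input is a subset of the sample response.

```python
def group_into_lines(terms, tolerance=5):
    """Group terms into visual lines using the Top of each bounding box."""
    lines = []  # each entry is (top of the line's first term, list of terms)
    ordered = sorted(
        terms, key=lambda t: (t["BoundingBox"]["Top"], t["BoundingBox"]["Left"])
    )
    for term in ordered:
        top = term["BoundingBox"]["Top"]
        if lines and abs(top - lines[-1][0]) <= tolerance:
            lines[-1][1].append(term)
        else:
            lines.append((top, [term]))
    # Within each line, read the terms from left to right.
    return [
        " ".join(t["Text"] for t in
                 sorted(members, key=lambda t: t["BoundingBox"]["Left"]))
        for _, members in lines
    ]


def term(text, left, top):
    # Helper that builds a minimal term with only the fields used here.
    return {"Text": text, "BoundingBox": {"Left": left, "Top": top}}


sample_terms = [
    term("Exhibit", 1760, 144), term("10.5", 1865, 144),
    term("Summary", 521, 219), term("of", 661, 219), term("Target", 861, 220),
    term("Named", 656, 260), term("Executive", 756, 259),
]
print(group_into_lines(sample_terms))
# ['Exhibit 10.5', 'Summary of Target', 'Named Executive']
```

The tolerance absorbs the one-pixel differences in Top that appear between terms on the same visual line, such as 219 and 220 in the sample.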
Add or Update hOCR Data
To add or update hOCR data for a specific document page, send a POST request with the body shown below to a URL in this format:
<host>/Relativity.Rest/API/contracts/{versionNumber}/ocr/{workspaceId}/document/{documentId}/page/{pageNumber}
Sample JSON body
{
"hocr": "<insert hOCR here>"
}
Set the path parameters and the request body field as follows:
- versionNumber—the version of the API, such as v1.
- workspaceId—the Artifact ID of the Workspace that contains the document.
- documentId—the Artifact ID of the document you want to add or update hOCR data for.
- pageNumber—the page number of the document image page you want to add or update hOCR data for. Page numbers begin at 1.
- hocr—(request body field) the hOCR for the specified document page.
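The POST request described above could be assembled as in the following Python sketch. The helper name make_write_request, the host, and the authentication header are hypothetical. The hOCR fragment is a placeholder in standard hOCR notation, and whether the service expects a fragment or a complete hOCR document is not stated here.

```python
import json
from urllib.request import Request


def make_write_request(url, hocr, auth_headers=None):
    """Build (but do not send) a POST request that adds or updates hOCR data."""
    # json.dumps escapes the quotes and angle brackets inside the hOCR markup.
    body = json.dumps({"hocr": hocr}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    # Assumption: authentication is supplied by the caller as extra headers.
    headers.update(auth_headers or {})
    return Request(url, data=body, headers=headers, method="POST")


# Placeholder hOCR markup for a single word, for illustration only.
hocr_markup = (
    "<span class='ocrx_word' "
    "title='bbox 1760 144 1810 164; x_wconf 96'>Bike</span>"
)
request = make_write_request(
    "https://relativity.example.com/Relativity.Rest/API/contracts/v1"
    "/ocr/1017955/document/1041438/page/1",
    hocr_markup,
    auth_headers={"Authorization": "Bearer <token>"},
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=30)
```

Building the body with json.dumps rather than string concatenation also avoids malformed JSON such as a trailing comma.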
Sample JSON response
{
"DocumentId": 1041438,
"PageNumber": 1,
"Text": [
{
"Confidence": 96,
"Offset": 0,
"Length": 4,
"Text": "Bike",
"BoundingBox": {
"Left": 1760,
"Top": 144,
"Width": 50,
"Height": 20
}
}
]
}