Extract/Transform Document Structure API
This API was added in Collabora Online version 24.04.5.3
Collabora Online allows you to extract the structure of a Writer or an Impress document, and transform it. The structure is extracted as a JSON document for easy parsing with JavaScript and other languages. The transformation is done following a recipe that is also uploaded as a JSON document.
Currently Collabora Online supports these features only in:
Writer documents:
Content controls
Charts
Document properties
Impress documents (presentations):
Slides
Important
Some of the JSON data is actually not valid JSON. This will be fixed soon.
Data types
Further on we’ll use the following conventions to specify the type of values and data.
<num>
An integer number value.
<string>
A string value. When it is a value in JSON it is surrounded by quotes ".
<text>
A text value.
<boolean>
A boolean value. The literals are true and false.
[<TYPE>,]
An array of <TYPE> of any length. <TYPE> can be any of the known data types.
[<TYPE>,<TYPE>]
A fixed-size array of <TYPE> values. Example: [<num>,<num>] for an array of two integers.
<value>
A JSON value of any type.
NONE
There is no value. To satisfy JSON syntax substitute with "".
|
Indicates an alternative when several types are possible.
Not all types are valid in every situation, but when used as a JSON value, the proper JSON syntax should be used: strings and text are quoted, arrays are JSON arrays, etc.
General Usage
This API is accessed through HTTP POST: the document and the optional transform are sent to the Collabora Online server, and the result is returned. The examples here are shown both with curl, as a simple tool to interact with the server, and as a web form. Instead of curl, any HTTP client can be used.
No specific configuration is necessary.
General Extraction
The extraction API works simply by sending the document, with an optional filter specifying the data you are interested in. In return you receive a JSON document describing the document.
API: HTTP POST to
/cool/extract-document-structure
Example:
curl -k -F "data=@test.docx" -F "filter=contentcontrol" https://localhost:9980/cool/extract-document-structure > out.json
Important
The -k option in this example disables certificate validation, as a test setup is unlikely to have a valid certificate. On a production system, please make sure you have valid certificates and do not disable validation with -k.
Or in HTML:
<form action="https://localhost:9980/cool/extract-document-structure" enctype="multipart/form-data" method="post">
  File: <input type="file" name="data"><br/>
  Filter: <input type="text" name="filter" value="contentcontrol"><br/>
  <input type="submit" value="Extract">
</form>
Query parameter | Description |
---|---|
data | The document file itself in the payload. |
filter | The filter parameter (optional, recommended, see note) sets what to extract. Currently supported filter values are: contentcontrol, charts, docprops, slides. |
lang | The language parameter (optional) sets the default format language, useful for date type cells. If passed, the load language is used and it determines the display/output format. The value has the form xx-XX. |
Attention
Without setting filter, the API will extract everything. Avoid using it without a filter: as the API is expanded, the extracted data will expand too, which may cause unexpected problems.
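For clients that do not use curl, the same multipart POST can be assembled by hand. The following is a minimal sketch using only the Python standard library; the helper name build_extract_request is hypothetical, and the server address is the one used in the curl example. It only builds the request, it does not send it.

```python
import uuid
import urllib.request


def build_extract_request(server, filename, document_bytes, filter_name=None):
    """Build (but do not send) a multipart/form-data POST for the extract API.

    Hypothetical helper: 'data' and 'filter' are the form fields used in the
    curl example; everything else here is an illustration.
    """
    boundary = uuid.uuid4().hex
    parts = []
    if filter_name is not None:
        # Optional (but recommended) filter field.
        parts.append(
            (f"--{boundary}\r\n"
             'Content-Disposition: form-data; name="filter"\r\n\r\n'
             f"{filter_name}\r\n").encode("utf-8")
        )
    # The document itself, sent as the 'data' file field.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="data"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode("utf-8")
        + document_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        f"{server}/cool/extract-document-structure",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen returns the extracted structure in the response body. Always passing a filter_name keeps the result stable as the API grows.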
Items
The extracted document content JSON is structured in the following way. The top-level object contains a DocStructure object, which itself contains a set of objects: the items.
{
"DocStructure": {
"Charts.ByEmbedIndex.0": {
"..."
},
"ContentControls.ByIndex.0": {
"..."
},
"DocumentProperties": {
"..."
}
}
}
Each item is addressed by a selector.
Item selectors are used to represent an item in the document structure. They are specified by a string composed of up to three dot-separated components:
The type of the object. Current possible values are Charts, ContentControls and DocumentProperties. Document properties are a singleton set, so there is no need to address it further.
The addressing method. ByIndex, ById, ByAlias and ByTag are valid for ContentControls; ByEmbedIndex, ByEmbedName, ByTitle and BySubTitle are valid for Charts.
The “index”, the parameter to the addressing method. It is a string or a number depending on the method.
Examples:
Charts.ByTitle.Untitled Chart
Charts.ByEmbedIndex.2
ContentControls.ByTag.machine-readable
ContentControls.ByIndex.4
Each item is described by properties. The property names and their values are described in the corresponding section. They can be numbers, strings of text, arrays, or another object. Transforms involve changing these properties.
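When processing extracted data it can help to split a selector back into its components. The sketch below is one way to do it; the function name and the set of methods treated as numeric are assumptions, not part of the API. Only the first two dots are used as separators, because the last component (a title, for example) may contain spaces or dots.

```python
# Assumption: these addressing methods take a numeric parameter.
NUMERIC_METHODS = {"ByIndex", "ByEmbedIndex", "ById"}


def parse_selector(selector):
    """Split 'Type.Method.Index' into a (type, method, index) tuple.

    Missing components are returned as None, e.g. for 'DocumentProperties'.
    """
    parts = selector.split(".", 2)  # split on the first two dots only
    item_type = parts[0]
    method = parts[1] if len(parts) > 1 else None
    index = parts[2] if len(parts) > 2 else None
    if index is not None and method in NUMERIC_METHODS:
        index = int(index)
    return item_type, method, index
```

For example, parse_selector("Charts.ByTitle.Untitled Chart") keeps "Untitled Chart" intact as the third component.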
General Transform
Much like the extract API, the transform API works simply by sending the document and, in addition, a transform recipe. The result is a new document.
API: HTTP POST to
/cool/transform-document-structure/<format>&<lang=xx-XX>
The format and lang parameters can be omitted:
API: HTTP POST to
/cool/transform-document-structure
Tip
You can provide format as another parameter. If no format is defined, the input file format is used.
Example:
curl -v -k -F "data=@test.docx" -F "format=docx" -F "transform=$(cat transform.JSON)" https://localhost:9980/cool/transform-document-structure > out.docx
curl -k -F "data=@test.docx" -F "format=odt" -F "transform={\"Transforms\":{\"ContentControls.ByIndex.1\":{\"content\":\"Short text\"},\"ContentControls.ByIndex.6\":{\"content\":\"5/14/2024\",\"alias\":\"date\"}}}" https://localhost:9980/cool/transform-document-structure > out.odt
Or in HTML:
<form action="https://localhost:9980/cool/transform-document-structure" enctype="multipart/form-data" method="post">
  File: <input type="file" name="data"><br/>
  Format: <input type="text" name="format"><br/>
  Transform: <input type="text" name="transform"><br/>
  <input type="submit" value="Transform">
</form>
Query parameter | Description |
---|---|
data | The document file itself in the payload. |
format | The format parameter (optional) is the format of the output document file, e.g. docx or odt. |
transform | The transform parameter (required) is a JSON formatted string that contains the transformation commands. |
lang | The language parameter (optional) sets the default format language, useful for date type cells. If passed, the load language is used and it determines the display/output format. The value has the form xx-XX. |
Commands
The JSON structure for transformations is close to the extracted document structure. The top-level object contains a Transforms object, which itself contains a set of objects: the commands to transform.
Each command addresses items by a selector and contains a set of properties describing the actions to take on those items. The transform commands are executed in the order they are listed in the transform JSON. An item selector (for example Charts.ByTitle.CommonName) may match more than one item, and thus may transform more than one item. In that case the items are handled one after the other, based on the item order (ByIndex, ByEmbedIndex, etc.). Once all of the matching items are processed, the next command is run, and so on. This behaviour can be useful in many cases, but beware of side effects in complex cases: a command may not only overwrite the results of previous commands, but also affect later commands.
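Because the order of commands matters, it is worth building the transform JSON from an ordered list rather than editing the string by hand. The sketch below is a minimal illustration, assuming a hypothetical helper named build_transform; Python dictionaries keep insertion order and json.dumps writes keys in that order. Rejecting a repeated selector is a choice made for this sketch, since a repeated key would silently replace the earlier command in a dictionary.

```python
import json


def build_transform(commands):
    """Serialize (selector, properties) pairs into a transform JSON string.

    The pairs are written in the order given, which is the order in which
    the commands are expected to run.
    """
    transforms = {}
    for selector, properties in commands:
        if selector in transforms:
            # A dict cannot hold the same key twice; fail loudly instead
            # of silently dropping the earlier command.
            raise ValueError(f"duplicate selector: {selector}")
        transforms[selector] = properties
    return json.dumps({"Transforms": transforms}, ensure_ascii=False)
```

The resulting string can be sent as the transform form field, in place of the contents of transform.JSON in the curl example.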
Content Controls
Content controls are items in a document that have a special UI optimized for user input, primarily for form filling purposes (e.g. date-picker, checkbox, combo-box, …).
This API can extract or overwrite the values of these content control items.
Extract
Use filter=contentcontrol to extract the content controls.
Example output (pretty printed):
{
"DocStructure": {
"ContentControls.ByIndex.0": {
"id": -428815899,
"tag": "machine-readable",
"alias": "Human Readable",
"content": "some text",
"type": "plain-text"
},
"ContentControls.ByIndex.1": {
"id": -325123259,
"tag": "",
"alias": "cb1",
"content": "☒",
"type": "checkbox",
"Checked": "true"
},
"ContentControls.ByIndex.2": {
"id": -1652436866,
"tag": "",
"alias": "",
"content": "text1",
"type": "drop-down-list"
},
"ContentControls.ByIndex.3": {
"id": 659509202,
"tag": "",
"alias": "date",
"content": "7\/7\/2024",
"type": "date",
"DateFormat": "M\/d\/yyyy",
"DateLanguage": "en-US",
"CurrentDate": "2024-07-07T00:00:00Z"
}
}
}
Each control is defined by a set of properties. id, tag, alias, content and type are present for every control:
Property | Description |
---|---|
id | ID number of the content control; it may not be unique. |
alias | Users may (or may not) set a short alias text to identify the content control. |
tag | Longer text that may (or may not) be generated by some software, or set by users. |
content | Its content, as a string. |
type | Type of the content control. See table below. |
Checked | Only in case of a checkbox: its state, “true” or “false”. |
DateFormat | Only in case of a date: format of the date, like “M/d/yyyy”. |
DateLanguage | Only in case of a date: language of the date, like “en-US”. |
CurrentDate | Only in case of a date: ISO 8601 formatted date value, independent of language. |
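As an illustration of reading these properties, the sketch below collects the state of every checkbox from an extracted structure. The function name is hypothetical and the sample is a shortened version of the example output above. Note that Checked arrives as the string “true” or “false”, not as a JSON boolean.

```python
import json

# Shortened sample, based on the example output above.
SAMPLE = """
{
  "DocStructure": {
    "ContentControls.ByIndex.0": {
      "id": -428815899, "tag": "machine-readable", "alias": "Human Readable",
      "content": "some text", "type": "plain-text"
    },
    "ContentControls.ByIndex.1": {
      "id": -325123259, "tag": "", "alias": "cb1",
      "type": "checkbox", "Checked": "true"
    }
  }
}
"""


def checkbox_states(extracted):
    """Map each checkbox selector to True/False based on its Checked value."""
    states = {}
    for selector, control in extracted["DocStructure"].items():
        if control.get("type") == "checkbox":
            # "Checked" is a string in the extracted data.
            states[selector] = control.get("Checked") == "true"
    return states


states = checkbox_states(json.loads(SAMPLE))
```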
Content control types:
Type |
Content properties |
---|---|
|
|
|
The text |
|
The checked and content properties are in sync |
|
|
|
|
|
|
|
not supported yet |
Transform
Example Transform:
{
"Transforms": {
"ContentControls.ByIndex.1": {
"content":"Short text"
},
"ContentControls.ByTag.datetag": {
"content":"5/14/2024",
"date":"2024-05-14",
"alias":"date"
}
}
}
To select which content controls you want to transform, you can use the following selectors:
Selector | Description |
---|---|
ByIndex | Index of the content control, counted from 0. It is always unique. |
ById | ID number of the content control; it may not be unique. |
ByAlias | Users may (or may not) set a short text alias to identify the content control. |
ByTag | Longer text that may (or may not) be generated by some software, or set by users. |
Note
Most of the selectors may identify more than one content control. In that case, transform will change each of the selected content controls.
To transform a content control you can set the following properties:
Property | Value |
---|---|
content | <text> |
checked | <boolean> |
alias | <string> |
date | <string> Only for date content controls. The date value. Does not change the displayed text. Always use ISO 8601 format like “2024-03-22”. |
content is a <string>; it can be used to set the data of any type of content control. “picture” controls are not yet supported.
In case of a checkbox:
Setting content to “☐” will also set checked to false.
Setting checked to true will also set content to “☒”.
And vice versa.
In case of a date:
Setting content only sets the displayed text, like “1999.dec.12”; the actual date value will not be changed.
Setting date only sets the date value but does not change the displayed text.
In case of a rich-text control, only unformatted text can be set, as if it were plain-text.
Transform makes the changes to the content controls in the same order as the ContentControls entries are listed in the transform JSON string.
In case of the example transformation:
It sets the content of the content control at index 1 to “Short text”.
Then it selects every content control whose tag is equal to datetag and sets their displayed value to “5/14/2024”, their date value to 2024-05-14, and their alias to “date”.
Note
If you set content to an empty string, it may change back to the placeholder value (e.g. “Click here to enter text”, “Choose a date” or “Choose an item”…).
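Since content and date are independent for date controls, a client usually wants to set both from one value so the displayed text and the stored date stay consistent. The sketch below shows one way to do that; the helper name is hypothetical, and the display format is hard-coded to the M/d/yyyy pattern seen in the extract example, which is an assumption about the control being edited.

```python
from datetime import date


def date_control_transform(tag, value):
    """Build a command that sets both the displayed text and the date value.

    Assumes the control displays dates as M/d/yyyy; adapt the display string
    to the DateFormat reported by the extract API.
    """
    display = f"{value.month}/{value.day}/{value.year}"
    return {
        f"ContentControls.ByTag.{tag}": {
            "content": display,         # displayed text only
            "date": value.isoformat(),  # ISO 8601 date value only
        }
    }


command = date_control_transform("datetag", date(2024, 5, 14))
```

The returned dictionary is one entry of the Transforms object.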
Screenshot
Example Files
Command for transform:
curl -v -k -F "data=@contentControlsExampleOriginal.odt" -F "transform=$(cat contentControlsTransform.JSON)" https://localhost:9980/cool/transform-document-structure > contentControlsResult.odt
Command for extract:
curl -k -F "data=@contentControlsExampleOriginal.odt" -F "filter=contentcontrol" https://localhost:9980/cool/extract-document-structure > contentControlsOriginalExtract.JSON
curl -k -F "data=@contentControlsExampleResult.odt" -F "filter=contentcontrol" https://localhost:9980/cool/extract-document-structure > contentControlsResultExtract.JSON
Extracted JSON from result odt
Extracted JSON from result odt, Pretty printed
Charts
Extract
Use filter=charts to extract the charts.
Example output (pretty printed):
{
"DocStructure": {
"Charts.ByEmbedIndex.0": {
"name": "Object1",
"title": "Paid leave days",
"subtitle": "Subtitle2",
"RowDescriptions": [ "James", "Mary", "Patricia", "David"],
"ColumnDescriptions": [ "2022", "2023"],
"DataValues": [
"Row.0": [ "22", "24"],
"Row.1": [ "18", "16"],
"Row.2": [ "32", "32"],
"Row.3": [ "25", "23"]
]
}
}
}
Data it extracts:
Property | Description |
---|---|
name | Name of the embedded object of the chart; can be used as a selector in transform to select the needed chart. |
title | The title of the chart, as a simple string. |
subtitle | The subtitle of the chart, as a simple string. |
RowDescriptions | Array of strings, containing the descriptions of the rows. |
ColumnDescriptions | Array of strings, containing the descriptions of the columns. |
DataValues | Matrix of numbers, containing the data of every cell. |
Note
Some of the data values can be “NaN”; this means they are not set.
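A client that post-processes chart data may want to find the cells that are not set. The sketch below assumes the rows have already been obtained as lists of strings, as they appear in the example output (the DataValues block as printed is not valid JSON, so how the rows are obtained is left open). The function name is hypothetical.

```python
import math


def unset_cells(rows):
    """Return (row, column) positions whose value is "NaN", i.e. not set."""
    positions = []
    for row_index, row in enumerate(rows):
        for column_index, cell in enumerate(row):
            # float("NaN") parses to a NaN value, which math.isnan detects.
            if math.isnan(float(cell)):
                positions.append((row_index, column_index))
    return positions
```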
Transform
Example Transform:
{
"Transforms": {
"Charts.ByEmbedIndex.0": {
"modifyrow.1": [ 19, 15 ],
"datayx.3.1": 37,
"deleterow.0": "",
"insertrow.0": [ 15, 17 ],
"setrowdesc.0": "Paul",
"insertcolumn.1": [ 1,2,3,4,5,6 ],
"setcolumndesc.0": "c0",
"deletecolumn.3": ""
},
"Charts.ByEmbedName.Object3": {
"resize": [ 3, 3 ],
"setrowdesc": [ "a", "b", "c"]
},
"Charts.ByTitle.Fixed issues": {
"data": [ [ 3,1 ],
[ 2,0,1 ],
[ 3 ] ],
"setrowdesc": ["2023.01",".02",".03"],
"setcolumndesc": ["Jennifer", "Charles", "Thomas"]
}
}
}
To select which chart you want to transform, you can use these selectors:
Selector | Description |
---|---|
ByEmbedIndex | Index of the embedded object, counted from 0. It is always unique, but the index may reference embedded objects other than charts. |
ByEmbedName | The unique name of the embedded object of the chart. |
ByTitle | Title of the chart. (The title is optional.) |
BySubTitle | Subtitle of the chart. (The subtitle is optional.) |
To transform a chart you can use these commands:
Command | Value | Description |
---|---|---|
deletecolumn.<num> | NONE | Delete column <num>. |
deleterow.<num> | NONE | Delete row <num>. |
modifycolumn.<num> | [<num>,] | Set the data of column <num> to the values. |
modifyrow.<num> | [<num>,] | Set the data of row <num> to the values. |
insertcolumn.<num> | NONE | Insert an empty column before column <num>. |
insertcolumn.<num> | [<num>,] | Insert a column before column <num>, with values. |
insertrow.<num> | NONE | Insert an empty row before row <num>. |
insertrow.<num> | [<num>,] | Insert a row before row <num>, with values. |
setcolumndesc.<num> | <text> | Set the description of column <num> to <text>. |
setcolumndesc | [<text>,] | Set the column descriptions to the values, starting from the first column. |
setrowdesc.<num> | <text> | Set the description of row <num> to <text>. |
setrowdesc | [<text>,] | Set the row descriptions to the values, starting from the first row. |
resize | [<num>,<num>] | Resize the data table to <num> rows and <num> columns. Both numbers are required, and must be greater than 1. |
datayx.<num>.<num> | <num> | Set the value of the cell at row <num> and column <num> to the specified value. |
data | [[<num>,],] | Set the values of the data table to the values. The table size will grow as needed. |
Note
Commands that need an array of values can be used with fewer values than the destination array holds. In that case only the provided elements are changed and the remaining ones are left untouched.
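Chart command names embed their row and column numbers, so generating them programmatically avoids typing mistakes. The sketch below covers only modifyrow and datayx, the two forms shown in the example transform; the helper name and its parameters are hypothetical.

```python
def chart_commands(selector, *, rows=None, cells=None):
    """Build the command object for one chart selector.

    rows:  {row_index: [values]}      -> "modifyrow.<row>"
    cells: {(row, column): value}     -> "datayx.<row>.<column>"
    Commands are emitted in the order the dictionaries are given.
    """
    commands = {}
    for row_index, values in (rows or {}).items():
        commands[f"modifyrow.{row_index}"] = list(values)
    for (row_index, column_index), value in (cells or {}).items():
        commands[f"datayx.{row_index}.{column_index}"] = value
    return {selector: commands}


example = chart_commands(
    "Charts.ByEmbedIndex.0",
    rows={1: [19, 15]},
    cells={(3, 1): 37},
)
```

A row passed with fewer values than the chart has columns only changes the leading cells, as described in the note above.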
Screenshot
Example Files
Command for transform:
curl -v -k -F "data=@docStructureChartExampleOriginal.odt" -F "transform=$(cat ChartsTransform.JSON)" https://localhost:9980/cool/transform-document-structure > docStructureChartResult.odt
Command for extract:
curl -k -F "data=@docStructureChartExampleOriginal.odt" -F "filter=charts" https://localhost:9980/cool/extract-document-structure > ChartsExtractOriginal.JSON
Document Properties
You can extract and modify document properties. These properties include metadata and statistics. You can add arbitrarily named metadata properties. Most statistics are recalculated when the document is opened and can’t be modified.
Extract
Use filter=docprops to extract the document properties.
Example output (pretty printed):
{
"DocStructure": {
"DocumentProperties": {
"Author": "Author TxT",
"Generator": "Generator TxT",
"CreationDate": "2024-01-21T14:45:00",
"Title": "Title TxT",
"Subject": "Subject TxT",
"Description": "Description TxT",
"Keywords": [ ],
"Language": "en-GB",
"ModifiedBy": "ModifiedBy TxT",
"ModificationDate": "2024-05-23T10:05:50.159530766",
"PrintedBy": "PrintedBy TxT",
"PrintDate": "0000-00-00T00:00:00",
"TemplateName": "TemplateName TxT",
"TemplateURL": "TemplateURL TxT",
"TemplateDate": "0000-00-00T00:00:00",
"AutoloadURL": "",
"AutoloadSecs": 0,
"DefaultTarget": "DefaultTarget TxT",
"DocumentStatistics": {
"PageCount": 300,
"TableCount": 60,
"ImageCount": 10,
"ObjectCount": 0,
"ParagraphCount": 2880,
"WordCount": 78680,
"CharacterCount": 485920,
"NonWhitespaceCharacterCount": 411520
},
"EditingCycles": 12,
"EditingDuration": 12345,
"Contributor": [ "Contributor1 TxT", "Contributor2 TXT"],
"Coverage": "Coverage TxT",
"Identifier": "Identifier TxT",
"Publisher": [ "Publisher TxT", "Publisher2 TXT"],
"Relation": [ "Relation TxT", "Relation2 TXT"],
"Rights": "Rights TxT",
"Source": "Source TxT",
"Type": "Type TxT",
"UserDefinedProperties": {
"NewPropName Bool": {
"type": "boolean",
"value": true
},
"NewPropName Numb": {
"type": "long",
"value": 1245
},
"NewPropName Str": {
"type": "string",
"value": "this is a string"
},
"NewPropName float": {
"type": "float",
"value": 12.45
}
}
}
}
}
The following properties are extracted:
Property | Description |
---|---|
Author | The name of the user who first saved the file. |
Generator | Identifies which application was used to create or last modify the document. |
CreationDate | The date and time when the file was first saved. |
Title | Title of the document. |
Subject | Subject of the document. Can be used to group documents with similar contents. |
Description | Comments to help identify the document. |
Keywords | |
Language | The default language of the document. |
ModifiedBy | The name of the user who last saved the file in a LibreOffice file format. |
ModificationDate | The date and time when the file was last saved in a LibreOffice file format. |
PrintedBy | The name of the user who last printed the file. |
PrintDate | The date and time when the file was last printed. |
TemplateName | The template that was used to create the file. |
TemplateURL | The URL of the template from which the document was created. The value is an empty string if the document was not created from a template or if it was detached from the template. |
TemplateDate | The date and time when the document was created or updated from the template. |
AutoloadURL | The URL to load automatically at a specified time after the document is loaded into a desktop frame. An empty URL is valid and describes a case where the document shall be reloaded from its original location. An empty URL together with an AutoloadSecs value of 0 means that no autoload is specified. |
AutoloadSecs | The number of seconds after which a specified URL is to be loaded after the document is loaded into a desktop. A value of 0 is valid and describes a redirection. A value of 0 together with an empty string as AutoloadURL means that no autoload is specified. |
DefaultTarget | The name of the default frame into which links should be loaded if no target is specified. |
DocumentStatistics | Statistics about the document, as separate properties. They will be recalculated and overwritten at document open. |
EditingCycles | The number of times that the file has been saved. |
EditingDuration | The amount of time that the file has been open for editing since the file was created. The editing time is updated when the file is saved. |
Contributor | |
Coverage | Time, place, or jurisdiction that the document is relevant to. For example, a range of dates, a place, or an institution that the document applies to. |
Identifier | Some unique identifier, like an ISBN. |
Publisher | |
Relation | |
Rights | Intellectual property rights associated with the document. For example, a copyright statement, or information about who has permission to access the document. |
Source | Information about other resources from which the document is derived. For example, the name or identifier of a hard copy that the document was scanned from, or a URL that the document was downloaded from. |
Type | Information about the category or format of the document. For example, whether the document is a text document, image, or multimedia presentation. |
UserDefinedProperties | List of user-defined properties. Date/time related types are not supported yet: their names and types will be extracted, but their values will not. |
Note
Extraction of UserDefinedProperties may retrieve other types, depending on the types of the document properties present. Unfortunately, the different parts of LibreOffice have slightly different limitations for these types:
With recent LibreOffice, these 6 types are available in the UI dialog: string, boolean, double, com.sun.star.util.Date, com.sun.star.util.DateTime, and com.sun.star.util.Duration.
There are other ways to create document properties, which can add different types. Older versions of LibreOffice probably also allowed adding different (deprecated) types, which can still be extracted from old documents.
Unfortunately, the exact limitations on the possible document property types aren’t well documented. Checking the source code (where a property is added) hints at what types can be expected in some special cases. Seven more types have been identified: typelib_TypeClass_FLOAT, typelib_TypeClass_HYPER, typelib_TypeClass_LONG, typelib_TypeClass_SHORT, Time, DateTimeWithTimezone, and DateWithTimezone.
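Because the set of types is open-ended and date/time properties carry no value yet, code that reads UserDefinedProperties should not assume that every entry has a value. The sketch below is a minimal illustration with a hypothetical function name; the sample entries are made up in the shape of the example output above.

```python
def user_defined_values(extracted):
    """Map each user-defined property name to its value, or None if absent."""
    doc_props = extracted["DocStructure"]["DocumentProperties"]
    props = doc_props.get("UserDefinedProperties", {})
    # Date/time typed entries are extracted without a "value" member.
    return {name: entry.get("value") for name, entry in props.items()}


# Made-up sample in the shape of the example output.
sample = {
    "DocStructure": {
        "DocumentProperties": {
            "UserDefinedProperties": {
                "NewPropName Bool": {"type": "boolean", "value": True},
                "NewPropName Numb": {"type": "long", "value": 1245},
                "Some Date": {"type": "com.sun.star.util.Date"},
            }
        }
    }
}
values = user_defined_values(sample)
```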
Transform
Example Transform:
{
"Transforms": {
"DocumentProperties": {
"Author":"Author TxT",
"Generator":"Generator TxT",
"CreationDate":"2024-01-21T14:45:00",
"Title":"Title TxT",
"Subject":"Subject TxT",
"Description":"Description TxT",
"Keywords": [ ],
"Language":"en-GB",
"ModifiedBy":"ModifiedBy TxT",
"ModificationDate":"2024-05-23T10:05:50.159530766",
"PrintedBy":"PrintedBy TxT",
"PrintDate":"0000-00-00T00:00:00",
"TemplateName":"TemplateName TxT",
"TemplateURL":"TemplateURL TxT",
"TemplateDate":"0000-00-00T00:00:00",
"AutoloadURL":"",
"AutoloadSecs": 0,
"DefaultTarget":"DefaultTarget TxT",
"DocumentStatistics": {
"PageCount": 300,
"TableCount": 60,
"ImageCount": 10,
"ObjectCount": 0,
"ParagraphCount": 2880,
"WordCount": 78680,
"CharacterCount": 485920,
"NonWhitespaceCharacterCount": 411520
},
"EditingCycles":12,
"EditingDuration":12345,
"Contributor":["Contributor1 TxT","Contributor2 TXT"],
"Coverage":"Coverage TxT",
"Identifier":"Identifier TxT",
"Publisher":["Publisher TxT","Publisher2 TXT"],
"Relation":["Relation TxT","Relation2 TXT"],
"Rights":"Rights TxT",
"Source":"Source TxT",
"Type":"Type TxT",
"UserDefinedProperties":{
"Add.NewPropName Str": {
"type": "string",
"value": "this is a string"
},
"Add.NewPropName Str": {
"type": "boolean",
"value": false
},
"Add.NewPropName Bool": {
"type": "boolean",
"value": true
},
"Add.NewPropName Numb": {
"type": "long",
"value": 1245
},
"Add.NewPropName float": {
"type": "float",
"value": 12.45
},
"Add.NewPropName Double": {
"type": "double",
"value": 124.578
},
"Delete": "NewPropName Double"
}
}
}
}
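To transform a document property, use a command with the same name as the property has in the extracted data.
There are some additional commands for UserDefinedProperties to add and remove properties:
Command | Description |
---|---|
Delete | <string> It will delete the user-defined property with that name. |
Add.<string> | An object with type and value members, as in the example above. It adds the user-defined property named <string>. |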
Note
Some property values are overwritten when the document is opened:
ModifiedBy and ModificationDate are overwritten by any save. (That is why they have wrong values in the screenshot.)
DocumentStatistics are recalculated and overwritten when the document is opened, but they are not recalculated on extract.
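The Add commands pair a type name with a value, so a small helper can derive the type from the value. The sketch below is an illustration only: the helper name is hypothetical, and the mapping from Python types to the type names seen in the example (boolean, long, double, string) is an assumption.

```python
def add_property_command(name, value):
    """Build an 'Add.<name>' command for UserDefinedProperties.

    Assumed mapping: bool -> boolean, int -> long, float -> double,
    str -> string. Other types are rejected.
    """
    # bool must be tested before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        type_name = "boolean"
    elif isinstance(value, int):
        type_name = "long"
    elif isinstance(value, float):
        type_name = "double"
    elif isinstance(value, str):
        type_name = "string"
    else:
        raise TypeError(f"unsupported property type: {type(value).__name__}")
    return {f"Add.{name}": {"type": type_name, "value": value}}
```

The returned dictionary can be merged into the UserDefinedProperties object of a DocumentProperties transform.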
Screenshot
Example Files
Command for transform:
curl -v -k -F "data=@docStructureChartExampleOriginal.odt" -F "transform=$(cat DocPropTransform.JSON)" https://localhost:9980/cool/transform-document-structure > DocPropResult.odt
Command for extract:
curl -k -F "data=@temp2.odt" -F "filter=docprops" https://localhost:9980/cool/extract-document-structure > DocPropExtract.JSON
Slides
Can be used only on Impress documents (presentations).
Slides are individual pages in presentations that can contain various elements, including text, images, videos, audio, shapes, and more. Master slides are template pages used for creating slides. Layouts are templates for the elements on a slide: their type, position and size.
This API can extract the slides and master slides structure, and transform slides in the document. It can create, delete and reorder slides, change their layout, and change text of text based elements.
Extract
Use filter=slides to extract the slides.
Example output (pretty printed):
{
"DocStructure": {
"SlideCount": 7,
"MasterSlideCount": 8,
"MasterSlides": [
"MasterSlide 0": {
"Name": "Topic_Separator_Purple"
},
"MasterSlide 1": {
"Name": "Content_sidebar_White"
},
"MasterSlide 2": {
"Name": "Topic Separator white"
},
"MasterSlide 3": {
"Name": "Content_sidebar_White_"
},
"MasterSlide 4": {
"Name": "Topic_Separator_Purple_"
},
"MasterSlide 5": {
"Name": "Content_White_Purple_Sidebar"
}
],
"Slides": [
"Slide 0": {
"SlideName": "Slide3-Renamed",
"MasterSlideName": "Content_White_Purple_Sidebar",
"LayoutId": 3,
"LayoutName": "AUTOLAYOUT_TITLE_2CONTENT",
"ObjectCount": 4,
"Objects": [
"Objects 0": {
"TextCount": 1,
"Texts": [
"Text 0": {
"ParaCount": 1,
"Paragraphs": [
"Friendly Open Source Project"
]
}
]
},
"Objects 1": {},
"Objects 2": {
"TextCount": 1,
"Texts": [
"Text 0": {
"ParaCount": 9,
"Paragraphs": [
"Real Open Source",
"100% open-source code",
"Built with LibreOffice technology",
"Built with Free Software technology stacks: primarily C++",
"Runs best on Linux",
"Open Development",
"Anyone can contribute & participate",
"Follow commits and tickets",
"Public community calls - forum has details"
]
}
]
},
"Objects 3": {
"TextCount": 1,
"Texts": [
"Text 0": {
"ParaCount": 5,
"Paragraphs": [
"Focus:",
"a non-renewable resource.",
"Office Productivity & Documents",
"Excited about migrating your\u0001documents",
"Grateful to our partners for solving\u0001other problems."
]
}
]
}
]
},
"Slide 1": {
"SlideName": "Slide 2",
"MasterSlideName": "Topic_Separator_Purple",
"LayoutId": 3,
"LayoutName": "AUTOLAYOUT_TITLE_2CONTENT",
"ObjectCount": 1,
"Objects": [
"Objects 0": {
"TextCount": 1,
"Texts": [
"Text 0": {
"ParaCount": 3,
"Paragraphs": [
"Collabora Online",
"",
"Powerful Online Collaboration"
]
}
]
}
]
}
]
}
}
Extracted properties from the Impress presentation:
Property | Description |
---|---|
SlideCount | Number of slides in the presentation. |
MasterSlideCount | Number of master slides in the presentation. These are real pages in the presentation, used only as templates for slides. |
MasterSlides | List of all the master slides, and some of their data. Currently only their name and ID are extracted. |
Slides | List of all the slides, and some of their data. See table below. |
Extracted properties from a slide:
Property | Description |
---|---|
SlideName | Name of the slide. Slides that don’t have a unique name are named dynamically, like “Slide 1”, “Slide 2”, etc. |
MasterSlideName | Name of the master slide that this slide is made from. |
LayoutId | The ID number of the layout actually used. |
LayoutName | Name of the layout. |
ObjectCount | Number of elements in the slide. An element can be text, image, video, audio, shape and more. |
Objects | List of all the elements and some of their data. See table below. |
Extracted properties from an object. Currently only text-based information can be extracted:
Property | Description |
---|---|
TextCount | Number of texts in this object. For example, table objects can have several texts. |
Texts | List of all the texts. See table below. |
Extracted properties from a text object. Currently only text-based information can be extracted:
Property | Description |
---|---|
ParaCount | Number of paragraphs in this text object. |
Paragraphs | Array of all its paragraphs, as simple strings. |
Transform
Example Transform:
{
"Transforms": {
"SlideCommands": [
{"JumpToSlideByName": "Slide 3"},
{"MoveSlide": 0},
{"RenameSlide": "Slide3-Renamed"},
{"DeleteSlide": 2},
{"JumpToSlide": 2},
{"DeleteSlide": ""},
{"JumpToSlide": 1},
{"DuplicateSlide": ""},
{"RenameSlide": "Slide1-Duplicated"},
{"InsertMasterSlide": 1},
{"RenameSlide": "SlideInserted-1"},
{"ChangeLayout": 18},
{"JumpToSlide": "last"},
{"InsertMasterSlideByName": "Topic Separator white"},
{"RenameSlide": "SlideInserted-Name"},
{"ChangeLayoutByName": "AUTOLAYOUT_TITLE_2CONTENT"},
{"SetText.0": "first"},
{"SetText.1": "second"},
{"SetText.2": "third"},
{"DuplicateSlide": 1},
{"MoveSlide.2": 6}
]
}
}
There is always a current slide. Most commands act on it, and some commands change which slide is current. By default the current slide is the slide at index 0.
To transform a slide you can use these commands:
Command | Value | Description |
---|---|---|
JumpToSlide | <num> or "last" | Jump to the slide at index, or to the last slide. The index is 0-based. Using "last" jumps to the last slide. |
JumpToSlideByName | <string> | Jump to the named slide. Be careful with default slide names like “Slide 1”: those names can change during slide deletion or insertion. |
InsertMasterSlide | <num> | Insert a new slide after the current slide, based on the master slide at index. Jump to the newly created slide, making it the current slide. |
InsertMasterSlideByName | <string> | Insert a new slide after the current slide, based on the named master slide. Jump to the newly created slide, making it the current slide. |
DeleteSlide | NONE or <num> | Delete the slide at index, or the current slide if no index is given. If the deleted slide was the current slide, jump to the previous slide, or to the new first slide if the first slide was deleted. Otherwise the index of the current slide is readjusted as needed so that the current slide is unchanged. As there must always be one slide left in the presentation, the last remaining slide cannot be deleted. |
MoveSlide | <num> | Move the current slide to the new position. The index of the current slide is readjusted to follow it. |
MoveSlide.<num> | <num> | Move the slide at index to a new position. If the index is that of the current slide, this behaves like the previous command. Otherwise the current slide is unchanged, but its index may be adjusted as needed. |
DuplicateSlide | NONE or <num> | Duplicate the slide at index, or the current slide if no index is given, and jump to the new slide. |
ChangeLayout, ChangeLayoutByName | <num> or <string> | Change the layout of the current slide to the layout with the given index (ChangeLayout) or to the named layout (ChangeLayoutByName). Layout names are the ones reported as LayoutName by extract, such as AUTOLAYOUT_TITLE_2CONTENT. |
RenameSlide | <string> | Rename the current slide. Use unique names: two slides cannot have the same name. Default names like “Slide 1” or “Slide 43” cannot be set. |
SetText.<num> | <text> | Set the text of the object at index to the text. Supported only on text-based objects. |
 | <num> | Mark (select) the object at index on the current slide. This allows using UNO commands that work on selected objects. |
 | <num> | Unmark (deselect) the object at index on the current slide. |
 | <string> | Call the UNO command, for example .uno:DefaultBullet. |
Note
It still has to be checked which UNO commands work here, and how to use them. Some may need parameters.
For now, only .uno:DefaultBullet has been checked and tested to work.
To obtain the full list of enabled UNO commands, check sfx2/source/control/unoctitm.cxx, under:
const std::map<std::u16string_view, KitUnoCommand>& GetKitUnoCommandList()
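Slide commands are an ordered list of single-key objects, and each step depends on which slide is current after the previous one. The sketch below builds such a list from (name, value) pairs; the helper name is hypothetical and the command names are taken from the example transform above.

```python
import json


def slide_transform(*commands):
    """Serialize (command, value) pairs into a SlideCommands transform.

    The list order is preserved, because every command acts on the slide
    that is current after the commands before it.
    """
    steps = [{name: value} for name, value in commands]
    return json.dumps({"Transforms": {"SlideCommands": steps}}, ensure_ascii=False)


payload = slide_transform(
    ("JumpToSlide", "last"),      # make the last slide current
    ("DuplicateSlide", ""),       # NONE value: duplicate the current slide
    ("RenameSlide", "Summary"),   # rename the new (now current) slide
)
```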
Screenshot
Example Files
Command for transform:
curl -v -k -F "data=@SlidesExampleOriginal.odp" -F "transform=$(cat SlidesTransform.JSON)" https://localhost:9980/cool/transform-document-structure > SlidesResult.odp
Command for extract:
curl -k -F "data=@SlidesExampleOriginal.odp" -F "filter=slides" https://localhost:9980/cool/extract-document-structure > SlidesExtractOriginal.JSON
Version updates
Important
The extract-document-structure and transform-document-structure endpoints are restricted to the allowed host addresses that can be set in the coolwsd.xml configuration file. The IP addresses have to be added as dot-escaped net.post_allow.host entries.
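As a small illustration of what “dot-escaped” means here, the sketch below turns a plain IPv4 address into the form with each dot preceded by a backslash. The helper name is hypothetical, and it assumes the entries are matched as patterns in which an unescaped dot would match any character.

```python
def post_allow_entry(ip_address):
    """Return the address with every dot escaped, e.g. for a host entry."""
    return ip_address.replace(".", r"\.")


entry = post_allow_entry("192.168.1.10")
```

The resulting text is what would go into a net.post_allow.host entry in coolwsd.xml.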