this is some text saved to the page

Document properties

<#assign dumy = doc.setTitle("this is the new title")>
<#assign dumy = doc.setDescription("this is the new description")>
<#assign dumy = doc.setFeedItemUrl("https://not-a-real-domain.com")>

Document custom properties

Custom document properties travel with a document and are preserved through all workflows, just like title, content, and several other standard properties. Custom properties are limited to 20,000 characters. There are two groups of custom document properties. The first group, legacy custom properties, is widely used in our standard automations, especially the Rich Summary process, so use these only if you understand where else they are used. They are named (very imaginatively) custom1, custom2, …, custom10. To access one of these values in a script, do:

...

You can also create, edit, and delete aspects and properties using the metadata functions in the upper right corner of the document edit page.

jSoup

This oddly named package provides easy yet powerful ways to both query and edit HTML documents. The entire package is exposed to scripts, so see the jSoup documentation for full details. Here is a simple example of its use.

...

The code above gets the jSoup structure into ‘dom’. We then find the first paragraph, add some styling to it, and add a new paragraph immediately following. Finally, we replace the contents of the paragraph with id=specialp with the text shown. The dom variable now contains the original document with content changed and added. We probably want to keep those changes for use in later workflows, so the final line takes the dom and converts it back to HTML as the output of this script.

Persona properties

All workflows operate within some Persona. The properties of that Persona are now available to scripts. These are currently of somewhat limited use, but future releases will include additional properties, supported in the same way as shown in the sample script below. Note that these properties are read-only; scripts cannot update them.

tone=${persona.tone}<br/>
style=${persona.style}<br/>
viewpoint=${persona.viewpoint}<br/>
mission=${persona.mission}<br/>

Random

Scripts have not previously had a sound random value function. We now have one. The randomInt call will provide a value between 0 and the specified limit. The randomBoolean call returns true/false values.

<#assign n1 = LIB.randomInt(10)>
<#assign b1 = LIB.randomBoolean()>
random num = ${n1}
<#if b1 >
random bool = true
<#else>
random bool = false
</#if>

Similarity

Turkers who copy/paste to create an abstract are a problem! Likewise those who produce a ‘quote’ that is not really a quote. This function allows us to detect these bad actors. The textSimilarity function uses Levenshtein distance to compare two strings. The degree of similarity is returned as an integer percentage between 0 and 100, where 100 is an exact match.

...

<#assign dom = doc.soup>
<#assign source = dom.text()>
<#assign abstract = "blah blah blah">
<#assign sim = LIB.textSimilarity(source, abstract)>
similarity=${sim}
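
To make the scoring concrete, here is a minimal Python sketch of a Levenshtein-based percentage. It is not the platform's implementation: the function names are hypothetical, and the normalisation (dividing the edit distance by the length of the longer string, then rounding) is an assumption about how a distance might be mapped onto the 0 to 100 scale described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity_percent(a: str, b: str) -> int:
    """Map edit distance onto 0..100, where 100 means identical strings.

    Assumption: normalise by the longer string's length and round to nearest.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty strings are identical
    return round(100 * (1 - levenshtein(a, b) / longest))
```

Under this normalisation, an abstract that was pasted straight from the source scores at or near 100, so a rejection filter would compare the result against a threshold chosen for the workflow.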

Spelling

The same spell checker that is available as a ‘predefined’ filter is also available to scripts. The value returned is the integer error rate from 0 to 100, where 100 would mean every single word is misspelled! Note that the spelling dictionary is (currently) hard-coded to American English … hmm, how is that not an oxymoron? … so be careful how tightly you filter for spelling errors if you are expecting non-American spellings.

<#assign s0 = "this is a string that has no spelling errors of any kind that I know of">
<#assign s1 = "this is a strig that has summ spelling errors of teh sortt yu might ecpect">
<#assign n0 = LIB.getSpellingErrorRate(s0)>
<#assign n1 = LIB.getSpellingErrorRate(s1)>
n0 = ${n0}<br/>
n1 = ${n1}<br/>
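
The idea of an integer error rate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the tokenising regex, and the tiny word set passed in as the dictionary are all hypothetical, and the real checker may split words and round differently.

```python
import re


def spelling_error_rate(text: str, dictionary: set) -> int:
    """Return the percentage (0..100) of words not found in the dictionary.

    Assumption: a word is a run of letters or apostrophes, compared in lower case.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0  # nothing to check, so no errors
    misspelled = sum(1 for word in words if word not in dictionary)
    return round(100 * misspelled / len(words))
```

With an American-only word set, a correctly spelled British word such as "colour" counts as an error, which is why a tight threshold can reject good non-American text.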

URL Util

This object provides standards-compliant methods for editing URL query parameters. The most common uses of this class will be (1) removing existing query parameters, especially Google Analytics parameters; (2) adding Google Analytics parameters; and (3) extracting the source URL from Google News links.

...

<#assign u=doc.url>
<#if u?starts_with("https://www.google.com/url")>
  <#assign uu=LIB.getUrlUtil(u)>
  <#assign url=uu.get("url")>
  <#assign dumy = doc.setFeedItemUrl(url)>
</#if>
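
The query-parameter operations described above can be sketched with Python's standard urllib.parse module. This is not the URL Util object itself: the helper names, the "utm_" prefix default, and the sample URLs in the usage note are assumptions chosen for illustration.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def get_param(url: str, name: str) -> Optional[str]:
    """Return the first percent-decoded value of a query parameter, or None."""
    for key, value in parse_qsl(urlsplit(url).query, keep_blank_values=True):
        if key == name:
            return value
    return None


def remove_params(url: str, prefix: str = "utm_") -> str:
    """Return url without any query parameter whose name starts with prefix."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(prefix)
    ]
    # Re-encode the surviving parameters and rebuild the URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, get_param applied to a redirect link of the form https://www.google.com/url?rct=j&url=https%3A%2F%2Fexample.com%2Fstory%3Fid%3D7 with the name "url" yields the decoded source address, and remove_params strips tracking parameters while leaving the others in place.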

Word Count

The same word count function that is available as a predefined filter is available to scripts as well. Usage is pretty simple. The getWords function returns an array of all the individual words in the input string. Apply ?size to get the count. This is now used in the rejection filters for the MTurk workflows in the Reference Persona.

<#assign s = "this is a string. it's semi-interesting, if you [sic] like this sort of thing!much">
<#assign n = LIB.getWords(s)?size>
${n}

Sentences

Text can be broken up into an array of sentences using standard English delimiters, as shown in the sample code below. This code snippet pulls the text from a paragraph in the source document, extracts the array of sentences from that paragraph, and returns the count. See the FTL documentation to see how to use the array of sentences to do other things as well.

<#assign dom = doc.soup>
<#assign text = dom.select("p#abstract").text()>
<#assign sentences = LIB.getSentences(text)>
${sentences?size}
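
As a rough picture of delimiter-based splitting, the Python sketch below breaks text after a period, exclamation mark, or question mark that is followed by whitespace. The function name and the exact rule are assumptions; the real getSentences function may handle abbreviations, quotations, and other edge cases that this naive version does not.

```python
import re


def split_sentences(text: str) -> list:
    """Split text after '.', '!' or '?' when followed by whitespace.

    Naive by design: an abbreviation such as "Dr. Smith" is split in two.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]
```

The length of the returned list plays the same role as ?size on the array in the script above.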

Summarizer

There is a “lightweight” language processing utility available to compute a summary of a block of text. The example below uses it to pull one-sentence summaries from all paragraphs of a document that are 3 sentences or longer. This is just an example of the use of the function; using this code as-is would probably not return a usable result in most cases. The parameters to textSummary() are the text to summarize and the length of the summary in number of sentences.

<#assign dom=doc.soup>
<#assign ps = dom.select("p")>
<#list ps as p>
  <#if p.hasText() >
    <#assign sentences = LIB.getSentences(p.text())>
    <#if (sentences?size >= 3) >
       <#assign summary = LIB.textSummary(p.text(),1)>
       <p>${summary}</p>
    </#if>
  </#if>
</#list>
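
For readers curious what a lightweight summarizer can look like, here is one common extractive approach sketched in Python: score each sentence by the average document-wide frequency of its words and keep the top scorers in their original order. This is an assumption about the general technique, not a description of what textSummary() actually does, and the function name and tokenising rules are hypothetical.

```python
import re
from collections import Counter


def text_summary(text: str, n_sentences: int) -> str:
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Word frequencies across the whole text.
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        # Average frequency, so long sentences are not favoured automatically.
        words = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

A scheme like this tends to favour sentences that repeat the paragraph's dominant vocabulary, which is one reason short paragraphs rarely produce useful summaries.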

OpenGraph

This is (as of release 4.5.11) a first version of this feature. It allows scripts to fetch the OpenGraph data for a provided URL and parse it into a name/value array. All scripting is “an expert feature” and this one is no different, but if you feel the desire, have a go! Provided here are two example scripts to get you started.

...
