...

<#assign s0 ="this is a string that has no spelling errors of any kind that I know of">
<#assign s1 = "this is a strig that has summ spelling errors of teh sortt yu might ecpect">
<#assign n0 = LIB.getSpellingErrorRate(s0)>
<#assign n1 = LIB.getSpellingErrorRate(s1)>
n0 = ${n0}<br/>
n1 = ${n1}<br/>
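
As a usage sketch only (the 0.1 threshold and the idea that higher values mean more errors are assumptions here, not documented behavior), the rate can serve as a simple quality gate on document text:

<#-- Sketch only: the 0.1 threshold and "higher = more errors" are illustrative assumptions -->
<#assign dom = doc.soup>
<#assign text = dom.select("p").text()>
<#assign rate = LIB.getSpellingErrorRate(text)>
<#if (rate > 0.1)>
  reject: spelling error rate ${rate} is above the threshold
<#else>
  accept: spelling error rate ${rate}
</#if>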

URL Util

This object provides standards-compliant methods for editing URL query parameters.  The most common uses of this class are (1) removing existing query parameters, especially Google Analytics parameters; (2) adding Google Analytics parameters; and (3) extracting the source URL from Google News links.

In all three use cases you start by getting the UrlUtil object like so:

<#assign u=doc.url>
<#assign uu=LIB.getUrlUtil(u)>

Following that, you can remove GA parameters with a single call:

<#assign dummy = uu.removeGA()>

or set one or more query parameters like so:

<#assign dummy = uu.set('utm_content','value1').set('rel','0')>

See the promotion scripts in the Reference Persona for real-world usage.

If you want to use Google News sources in feeds, you will need to strip the real source URL out of the Google News link.  This six-line script fixes Google URLs and leaves other URLs unchanged:

<#assign u=doc.url>
<#if u?starts_with("https://www.google.com/url")>
  <#assign uu=LIB.getUrlUtil(u)>
  <#assign url=uu.get("url")>
  <#assign dummy = doc.setFeedItemUrl(url)>
</#if>

Word Count

The same word count function that is available as a pre-defined filter is also available to scripts.  Usage is simple: the getWords function returns an array of all the individual words in the input string; call ?size on the result to get the count.  This is now used in the rejection filters for the MTurk workflows in the Reference Persona.

<#assign s = "this is a string. it's semi-interesting, if you [sic] like this sort of thing!much">
<#assign n = LIB.getWords(s)?size>
${n}
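
As a rough illustration of the rejection-filter style of usage (the 50-word minimum and the "p" selector are made-up values for this sketch, not what the Reference Persona filters actually use), a minimum word count check might look like this:

<#-- Sketch only: the 50-word minimum and the "p" selector are illustrative assumptions -->
<#assign dom = doc.soup>
<#assign text = dom.select("p").text()>
<#assign wordCount = LIB.getWords(text)?size>
<#if (wordCount < 50)>
  reject: only ${wordCount} words
<#else>
  accept: ${wordCount} words
</#if>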

Sentences

Text can be broken into an array of sentences using standard English delimiters, as shown in the sample code below.  This snippet pulls the text from a paragraph in the source document, extracts the array of sentences from that paragraph, and returns the count.  See the FTL documentation for how to use the array of sentences to do other things as well.

<#assign dom = doc.soup>
<#assign text = dom.select("p#abstract").text()>
<#assign sentences = LIB.getSentences(text)>
${sentences?size}
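
As one simple example, the sentences variable from the snippet above is an ordinary FTL sequence, so it can be walked with #list; this sketch just emits each sentence as its own paragraph:

<#-- Emit each extracted sentence as its own paragraph -->
<#list sentences as sentence>
  <p>${sentence}</p>
</#list>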

Summarizer

There is a “lightweight” language processing utility available to compute a summary of a block of text.  The example below uses it to pull one-sentence summaries from all paragraphs of a document that are more than three sentences long.  This is just an example of how to use the function; using this code as-is would likely not return a usable result in most cases.  The parameters to textSummary() are the text to summarize and the length of the summary in number of sentences.

<#assign dom=doc.soup>
<#assign ps = dom.select("p")>
<#list ps as p>
  <#if p.hasText() >
    <#assign sentences = LIB.getSentences(p.text())>
    <#if (sentences?size > 3) >
       <#assign summary = LIB.textSummary(p.text(),1)>
       <p>${summary}</p>
    </#if>
  </#if>
</#list>
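
A perhaps more typical call, sketched here under the assumption that the whole document body is what you want to condense, summarizes the full text into three sentences:

<#-- Sketch: condense the full body text into a 3-sentence summary -->
<#assign dom = doc.soup>
<#assign bodyText = dom.select("body").text()>
<#assign summary = LIB.textSummary(bodyText, 3)>
<p>${summary}</p>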

MTurk Assignment(s)

TBD

OpenGraph

This is (as of release 4.5.11) the first version of this feature.  It allows scripts to fetch the OpenGraph data for a provided URL and parse it into a name/value array.  All scripting is “an expert feature” and this one is no different, but if you feel the desire, have a go!  Provided here are two example scripts to get you started.

A good place to start is to simply dump the values of OG data for a provided URL:

<#assign og = LIB.getOpenGraph(doc.url)>
<#if og??>
    <#assign props = og.getProperties()>
    <ul>
    <#list props as prop>
        <li>${prop.getProperty()} = ${prop.getContent()}</li>
    </#list>
    </ul>
<#else>
    og is null
</#if>

A more common usage will be fetching (and possibly checking) the og:image value. Here’s some code to get the image URL:

<#assign og = LIB.getOpenGraph(doc.url)>
<#if og??>
    <#assign imageUrl = og.getContent("image")>
    <img src="${imageUrl}" />
</#if>
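
Since not every page declares an og:image, it is probably worth guarding against a missing or empty value before emitting the tag.  This variation assumes getContent simply yields no value when the property is absent, and skips the image in that case:

<#assign og = LIB.getOpenGraph(doc.url)>
<#if og??>
    <#-- Guard against pages without og:image (assumes getContent yields no value in that case) -->
    <#assign imageUrl = og.getContent("image")!"">
    <#if imageUrl?has_content>
        <img src="${imageUrl}" />
    </#if>
</#if>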