Root element

<process>

This is the root element of the process. It contains the source, target and the transformation steps of the process.

Example


        <?xml version="1.0" encoding="UTF-8"?>
        <process xmlns="urn:hu.sztaki.ilab.longneck:1.0">
          <source>
            <!-- source definition goes here -->
          </source>
          <target>
            <!-- target definition goes here -->
          </target>
          <blocks>
            <!-- transformations... -->
          </blocks>
          <test-cases>
            <!-- test cases... -->
          </test-cases>
        </process>
      

Attributes

This element does not have attributes.
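
As a rough illustration, the skeleton shown in the example above could be filled in as follows. This is only a sketch under assumptions: it combines the csv-source and console-target elements described later in this reference with two blocks (trim and collapse-whitespace) that appear in the block example further down, and the file path and the place-name field are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<process xmlns="urn:hu.sztaki.ilab.longneck:1.0">
  <source>
    <!-- placeholder path; reads records from a semicolon-delimited CSV file -->
    <csv-source path="data/myfile.csv" delimiter=";"/>
  </source>
  <target>
    <!-- prints every transformed record to the standard output -->
    <console-target/>
  </target>
  <blocks>
    <!-- hypothetical field name; normalizes whitespace in one field -->
    <trim apply-to="place-name"/>
    <collapse-whitespace apply-to="place-name"/>
  </blocks>
  <test-cases>
    <!-- test cases... -->
  </test-cases>
</process>
```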

Content


Sources and targets

<console-target>

This target definition writes records to the standard output. Each record is printed on a new line, and fields are printed within curly braces in quoted form.
Note: quotes in a field value are not escaped.

Example


        <target>
          <console-target/>
        </target>
      

Attributes

This element does not have attributes.

Content

This element does not have children.

<csv-source>

The CSV source is defined with the csv-source element, and it reads one or more CSV files from a path specified in the configuration. The property below shows how the source file is configured, either in a configuration file or on the command line with the "-D" switch. NAME is the name of the data source as it is referenced in the process definition, e.g. "test" in the example that follows.

csvSource.NAME.path=data/myfile.csv

The following example shows a common usage of the CSV source:

Example


        <source>
          <csv-source name="test" delimiter=";"/>
        </source>
      

The following example shows how the source file is given inline:

Example


        <source>
          <csv-source path="data/myfile.csv" delimiter=";"/>
        </source>
      

Attributes

name content description
name optional string Sets the name of the source that selects the csvSource.<NAME>.path property in the configuration, which in turn specifies the data files to read.
path optional string

Sets the path that explicitly specifies the data files to read.

delimiter optional string

Specifies the field delimiter character used in the files. Default: ';'.

has-headers optional boolean Informs the application whether the data files contain field names as the first record. Valid values are true or false. Default is true.
columns optional string A space separated list of field names in the order they occur in the input data file.
If the has-headers attribute is false, this attribute is required.

Content

This element does not have children.
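
When the data files carry no header row, the attribute table above implies that has-headers must be set to false and the field names supplied through columns. A minimal sketch, in which the source name and the three field names are made up for illustration:

```xml
<source>
  <!-- "test" selects the csvSource.test.path property; the column names are hypothetical -->
  <csv-source name="test" delimiter=";" has-headers="false" columns="id name zip-code"/>
</source>
```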

<csv-target>

The CSV target is the counterpart of the CSV source, and lets you write records into a CSV file of your choice. It is defined with the csv-target element in the process file.

Attributes

name content description
name optional string Sets the name of the target that selects the csvTarget.<NAME>.path property in the configuration, which in turn specifies the data file to write.
path optional string

Sets the path that explicitly specifies the data file to write.

delimiter optional string Specifies the field delimiter character used in the files. Default: ';'
columns optional string A whitespace separated list of field names in the order they should occur in the output data file. Fields not listed are skipped from the output. A mapping may be applied to the field names by using a recordFieldName=csvFieldName pattern for any of the fields.
If omitted, field names are copied from the first record written out.
empty-value optional string Sets the string used in the output file for empty values. Default is a zero-length string.

Content

This element does not have children.
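
The attributes above could be combined as in the following sketch. The target name, the field names and the empty-value marker are assumptions chosen for illustration; the zip-code=zip entry uses the recordFieldName=csvFieldName pattern described for the columns attribute.

```xml
<target>
  <!-- "out" is a hypothetical name; it would select the csvTarget.out.path property -->
  <!-- the record field zip-code is written under the CSV column name zip -->
  <csv-target name="out" delimiter=";" columns="id name zip-code=zip" empty-value="NULL"/>
</target>
```

Fields not listed in columns would be left out of the output file, as stated in the attribute description.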

<database-source>

Database source definition element. This source uses a JDBC connection to connect to a database, and uses SQL to query source data. The connection is selected by the connection-name attribute. To configure a connection, specify the following properties in a configuration property file:

database.connection.NAME.type=jdbc
database.connection.NAME.url=jdbc:oracle:thin:@localhost:1521:tdb
database.connection.NAME.user=somedb
database.connection.NAME.password=XXXX

The following example shows how to use a database-source:

Example


        <database-source connection-name="test">
          <query>
            select * from dual
          </query>
        </database-source>
      

Attributes

name content description
connection-name required string

Specifies the connection name, which is configured in the configuration properties.

Content

  • 1: <query>: This element contains the query that produces the input records.

<database-target>

Database target definition element. This target uses a JDBC connection to connect to a database, and uses SQL to insert produced data into a table. The connection is selected by the connection-name attribute. To configure a connection, specify the following properties in a configuration property file, where NAME must match the identifier specified in the attribute:

database.connection.NAME.type=jdbc
database.connection.NAME.url=jdbc:oracle:thin:@localhost:1521:tdb
database.connection.NAME.user=somedb
database.connection.NAME.password=XXXX

Attributes

name content description
connection-name required string

Specifies the connection name, which is configured in the configuration properties.

error-threshold optional integer

Sets a threshold for the number of erroneous lines that cannot be inserted into the target table (for any reason, e.g. a type mismatch or a constraint violation). When the limit is exceeded, the Longneck process stops and writes an error log message.

Another way to set this threshold is to pass the CLI parameter -DdatabaseTarget.errorThreshold=(SomeInteger).

CLI parameters override XML attributes.
numeric-fields-to-convert optional string

List field names separated with space characters in this attribute to cast their values to integer before passing them to the database.

Content

  • 0..1: <truncate-query>: This optional element specifies a query to truncate the database table before any records are written. You can invoke it for normal targets with the '-T' switch, and for error targets with the '-E' switch on the command line.
  • 1: <insert-query>: Specifies the insert SQL query to insert records into the database table. The values of the fields can be referred to by using field names prefixed with a colon as placeholders in the query, for example ':someFieldName'.
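
Putting the attributes and child elements together, a database target might look like the sketch below. The table name, column names and field names are hypothetical, and wrapping the element in target follows the other target examples in this reference.

```xml
<target>
  <!-- "test" must match NAME in the database.connection.NAME.* properties -->
  <database-target connection-name="test" error-threshold="100">
    <truncate-query>
      truncate table persons
    </truncate-query>
    <!-- :name and :zipCode are placeholders for the record fields of the same names -->
    <insert-query>
      insert into persons (name, zip_code) values (:name, :zipCode)
    </insert-query>
  </database-target>
</target>
```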

<null-source>

This source definition does not read anything at all.

Example


        <source>
          <null-source/>
        </source>
      

Attributes

This element does not have attributes.

Content

This element does not have children.

<null-target>

This target definition does not write anything at all.

Example


        <target>
          <null-target/>
        </target>
      

Attributes

This element does not have attributes.

Content

This element does not have children.

<simple-file-target>

This target definition writes records into the file specified in the path attribute. Each field value is written on a new line, and there is no record delimiter.

Attributes

name content description
path required string

Sets the path to the target file the records are written to.

Content

This element does not have children.
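
A minimal sketch of this target, with a placeholder output path:

```xml
<target>
  <!-- hypothetical path; each field value goes on its own line of this file -->
  <simple-file-target path="out/records.txt"/>
</target>
```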

Constraints

<alphabet>

Checks if characters in the input value come from the specified alphabet. The policy defines if the characters in the alphabet are allowed or denied.

Example


          <alphabet apply-to="a b c" classes="Letter Number" policy="Deny"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

classes required List of One of Letter, Number, Space

Character classes checked in this constraint.

policy optional One of Allow, Deny

Allow or deny policy for character containment.

Content

This element does not have children.
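
For contrast with the Deny example above, the following sketch uses the Allow policy. It assumes that Allow means the value may consist only of characters from the listed classes; the field name is illustrative.

```xml
<!-- presumably succeeds only if zip-code is made up of Number characters -->
<alphabet apply-to="zip-code" classes="Number" policy="Allow"/>
```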

<and>

Defines a logical AND relationship between the contained constraints.

Example


          <and>
            <not-null apply-to="zip-code"/>
            <exists apply-to="zip-code"/>
          </and>
        

Attributes

This element does not have attributes.

Content


<constraint>

Generic constraint in an external package.

Example


          <constraint id="address.part.public-domain-type.hu" version="1.0">
            <lookup apply-to="public-domain-type">
              <external-wordset column-name="word">
                <wordset-file-source source-file="longneck-jobs/lookup_resources/wordsets/cr_address/domain_type.txt"/>
              </external-wordset>
            </lookup>
          </constraint>
        

Attributes

name content description
id required [0-9a-zA-Z_.-]+
version optional [0-9a-zA-Z.\-]+

Content


<constraint-ref>

Reference to a constraint in an external package.

Example


        <constraint-ref id="package4:some.constraint" version="3.0">
          <pass fields="a b c"/>
          <prefix fields="d e" text="fff"/>
          <postfix fields="g h" text="iii"/>
          <map from="j" to="k"/>          
          <map from="j" to="k"/>          
          <map from="j" to="k"/>          
          <map from="j" to="k"/>          
        </constraint-ref>
        

Attributes

name content description
id required [0-9a-zA-Z_/.-]+:[0-9a-zA-Z_.-]+ Identifier of the package, followed by ':' and the name of the referred block. The package identifier is the name of the file that contains the referred block, with its path relative to either the current directory or the repository directory (in the latter case, begin the path with a '/').

Some examples of package identifiers:

A package identifier relative to the repository directory, which is the root directory of the repository files:

/location/postal-address/hu/zip-code:zip-code.lookup

Alternatively, a package can be referenced relative to the current directory; for example, when the reference is in Repository/location/postal-address/hu/:

zip-code:zip-code.lookup

or

../en/zip-code:zip-code.lookup

version optional [0-9a-zA-Z.\-]+

The version of the block, to distinguish variants of the same transformation.

Content


<entity-ref>

Reference to an entity in an external package.

Example


          <entity-ref id="address:address.part.zip-code.hu" version="1.0">
            <pass fields="a b c"/>
            <prefix fields="d e" text="fff"/>
            <postfix fields="g h" text="iii"/>
            <map from="j" to="k"/>          
            <map from="j" to="k"/>          
            <map from="j" to="k"/>          
            <map from="j" to="k"/> 
          </entity-ref>
        

Attributes

name content description
id required [0-9a-zA-Z_/.-]+:[0-9a-zA-Z_.-]+ Identifier of the package, followed by ':' and the name of the referred block. The package identifier is the name of the file that contains the referred block, with its path relative to either the current directory or the repository directory (in the latter case, begin the path with a '/').

Some examples of package identifiers:

A package identifier relative to the repository directory, which is the root directory of the repository files:

/location/postal-address/hu/zip-code:zip-code.lookup

Alternatively, a package can be referenced relative to the current directory; for example, when the reference is in Repository/location/postal-address/hu/:

zip-code:zip-code.lookup

or

../en/zip-code:zip-code.lookup

version optional [0-9a-zA-Z.\-]+

The version of the block, to distinguish variants of the same transformation.

Content


<equals>

Checks if the specified field's value equals the value of the field given in the with attribute. The values are compared as strings.

Example


          <equals apply-to="house-number" with="hrsz"/>
        

Checks if the apply-to field's value equals the specified constant.

Example


          <equals apply-to="house-number" value="113"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

value optional string

The constant value the checked fields must be equal to.

with optional (\$)?[0-9a-zA-Z_.-]+

The identifier of the field the checked fields must be equal to.

Content

This element does not have children.

<equals-imploded>

Implodes the source fields using the text specified in the glue attribute, and checks if the apply-to field value is equal to it. If glue is empty, the source field values are simply concatenated.

Example


          <equals-implode apply-to="house-number" sources="F_HAZSZAM F_AJTO" glue="."/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

sources required List of (\$)?[0-9a-zA-Z_.-]+

The list of sources.

glue required string

The glue to merge between parts.

Content

This element does not have children.

<exists>

Checks if the specified field exists.

Example


          <exists apply-to="zip-code"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<false>

Checks if the field is false. A value is false if it equals the string "false", compared case-insensitively.

Example


          <false apply-to="field1"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<has-flag>

Checks if the field has the specified flag.

Example


          <has-flag apply-to="flag1" flag="INVALID"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

flag required One of INVALID, NOT_APPLICABLE

Defines the flag type for the operation.

Content

This element does not have children.

<is-empty>

Checks if the specified field is empty. A field is considered empty if its value is null or a zero-length string.

Example


          <is-empty apply-to="house-number"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<is-notlonger>

Checks if the input value is not longer than a given length.

Example


          <is-notlonger apply-to="a b c" value="4"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

value required integer

The maximum acceptable length of the value.

Content

This element does not have children.

<is-null>

Checks if the specified field is null.

Example


          <is-null apply-to="country"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<match>

Checks if the field matches the specified regular expression. Matched subgroups can also be checked using contained constraints. The match groups are assigned to the variables $0, $1, $2 and so on, within the scope of the match constraint.

Example


          <match apply-to="unit" regexp="(?i:hrsz?(:|.)?)"/>
        
Instead of giving the regular expression explicitly, as in the example above, it can also be given indirectly: the regexp-field attribute names a field that contains the desired regular expression.

Example


          <match apply-to="unit" regexp-field="myregexp"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

regexp optional string

A regular expression.

regexp-field optional string

A field containing a regular expression.
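
The description states that matched subgroups can be checked with contained constraints. A hedged sketch of what that might look like, reusing the is-notlonger constraint documented above; the field name and the pattern are invented for illustration:

```xml
<match apply-to="house-number" regexp="^(\d+)/([A-Z])$">
  <!-- $1 refers to the first match group (the digits before the slash) -->
  <is-notlonger apply-to="$1" value="4"/>
</match>
```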

Content


<not>

Negates the result of the contained constraint check.

Example


          <not>
            <exists apply-to="unit"/>
          </not>
        

Attributes

This element does not have attributes.

Content


<not-empty>

Checks if the specified field is not empty. A field is considered empty if its value is null or a zero-length string.

Example


          <not-empty apply-to="bm-code"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<not-null>

Checks if the specified field is not null.

Example


          <not-null apply-to="bm-code"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<or>

Defines a logical OR relationship between the contained constraints.

Example


          <or>
            <is-empty apply-to="house-number"/>
            <when>
              <not-null apply-to="house-number"/>
              <then>
                <not>
                  <entity-ref id="address:address.part.house-number.hu" version="1.0"/>
                </not>
              </then>
            </when>
            <equals apply-to="house-number" with="hrsz"/>
            <match apply-to="unit" regexp="(?i:hrsz?(:|.)?)"/>
          </or>
        

Attributes

This element does not have attributes.

Content


<orcase>

OrCase in an or-switch structure. See <orswitch-strict> for details.

Attributes

This element does not have attributes.

Content


<orswitch-strict>

The Or-Switch is similar to a switch block, but it is a constraint. The or-cases are checked in sequential order. If any or-case succeeds, the or-switch succeeds. If no or-cases succeed, the or-switch fails.

Each or-case may have a "then" branch, which is evaluated only if the basic or-case check succeeds. If the outcome is true, the result of the or-switch is the result of the checks in the "then" branch of the or-case.

Note that the result of the "then" block may be false, in which case the result of the entire or-switch is false.

Example


        <orswitch-strict>
          <orcase>
            <entity-ref id="phonenr:phone-number.canonized.hu.selector" version="1.0"/>
            <then>
              <entity-ref id="phonenr:phone-number.canonized.hu" version="1.0"/>
            </then>
          </orcase>
          <orcase>
            <entity-ref id="phonenr:phone-number.canonized.sk.selector" version="1.0"/>
            <then>
              <entity-ref id="phonenr:phone-number.canonized.sk" version="1.0"/>
            </then>
          </orcase>
        </orswitch-strict>
      

Attributes

This element does not have attributes.

Content


<true>

Checks if the specified field is true. A value is true if it equals the string "true", compared case-insensitively.

Example


          <true apply-to="field1"/>
        

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element is applied on.

Content

This element does not have children.

<when>

This constraint imposes further constraints on the checked record based on a preliminary check. If the "when" checks succeed, the result of the full "when" constraint equals the result of the "then" branch. If the "when" check fails, the result of the full "when" constraint is the outcome of the "else" check.

Note that this constraint may succeed even if the "when" checks fail.
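
The description mentions an "else" check. The sketch below shows how such a branch might be written; the else element name is an assumption made by analogy with the if block, and the field names and pattern are illustrative.

```xml
<when>
  <not-null apply-to="house-number"/>
  <then>
    <!-- evaluated when house-number is not null -->
    <match apply-to="house-number" regexp="^\d+$"/>
  </then>
  <else>
    <!-- evaluated when house-number is null; assumed element name -->
    <is-null apply-to="unit"/>
  </else>
</when>
```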

Example


          <when>
            <not-null apply-to="house-number"/>
            <then>
              <not>
                <entity-ref id="address:address.part.house-number.hu" version="1.0"/>
              </not>
            </then>
          </when>
        

Attributes

This element does not have attributes.

Content


Blocks

<add-flag>

Adds a pre-defined flag to the fields listed in the apply-to attribute. The flag can be chosen from the values specified below.

Example


        <add-flag apply-to="field1 field2" flag="INVALID"/>
      

Abstract flag-related type.

Abstract field-only block type.

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element applies to.

flag required One of INVALID, NOT_APPLICABLE

Defines the flag type for the operation.

Content

This element does not have children.

<add-postfix>

Adds a mapping from the original field names to the postfixed names.

Example


        <add-postfix apply-to="house-number" text="_home"/>
      

Abstract atomic block type.

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element applies to.

text required string

The text to add to the names.

Content

This element does not have children.

<add-prefix>

Adds the specified prefix to the value of a field.

Example


        <add-prefix apply-to="house-number" text="home_"/>
      

Abstract atomic block type.

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element applies to.

text required string

The text to add to the names.

Content

This element does not have children.

<block>

The generic block is a callable subsequence of transformation steps, which can be referred to by its id and optional version. These blocks are stored in block packages, which are available from the repository.

Example


        <block id="address.canonize.init-place-name-hu.skipnull" version="1.0">
          <if>
            <is-null apply-to="place-name"/>
            <then>
              <break/>
            </then>
          </if>
          <trim apply-to="place-name"/>
          <collapse-whitespace apply-to="place-name"/>
          <set-character-case apply-to="place-name" case="Lowercase" characters="ALL"/>
          
          <set-character-case apply-to="place-name" case="Uppercase" characters="TOKEN_INITIALS_NON_ALNUM"/>
          <replace-all apply-to="place-name" regexp=".*((?i)N/A).*" replacement=""/>
          <replace-all apply-to="place-name" regexp="^Bpest" replacement="Budapest"/>
          <replace-all apply-to="place-name" regexp="^(Bp\.?)" replacement="Budapest"/>
          <replace-all apply-to="place-name" regexp="^(Pest)" replacement="Budapest"/>
          <replace-all apply-to="place-name" regexp="^(Buda)[\P{L}]" replacement="Budapest"/>
          <replace-all apply-to="place-name" regexp="^Ismeretlen Irányítószám !$" replacement=""/>
          <trim apply-to="place-name"/>
          <collapse-whitespace apply-to="place-name"/>
        </block>
      

Attributes

name content description
id required [0-9a-zA-Z_.-]+
version optional [0-9a-zA-Z.\-]+

Content


<block-ref>

Calls a block from an external package. You may use a mapping from the field names of the record to the names used inside the block, but when you do, you must list all used names, or a warning is issued each time an unmapped field name is accessed. If an error occurs inside the called block, the execution of the block is halted and processing continues from the block reference, unless you set the propagate-failure attribute.

Example


        <block-ref id="email:e-mail.multiple.process.skipnull"/>
      

This is a simple block reference without mapping.

Example


         <block-ref id="name:name.canonize" version="1.0">
           <unprefix fields="rep_tempname rep_temp_remain " text="rep_"/>
           <prefix fields="tempname temp_remain " text="rep_"/>
           <postfix fields="tempname temp_remain " text="_rep"/>
           <pass fields="tempname temp_remain"/>
           <map from="work-raw-extension" to="raw-extension"/>
         </block-ref>
      

A block reference with field name mapping and version attribute.

Example


        <block-ref id="email:e-mail.multiple.process.skipnull" propagate-failure="true"/>
      

Block reference with error propagation enabled.

Attributes

name content description
id required [0-9a-zA-Z_/.-]+:[0-9a-zA-Z_.-]+ Identifier of the package, followed by ':' and the name of the referred block. The package identifier is the name of the file that contains the referred block, with its path relative to either the current directory or the repository directory (in the latter case, begin the path with a '/').

Some examples of package identifiers:

A package identifier relative to the repository directory, which is the root directory of the repository files:

/location/postal-address/hu/zip-code:zip-code.lookup

Alternatively, a package can be referenced relative to the current directory; for example, when the reference is in Repository/location/postal-address/hu/:

zip-code:zip-code.lookup

or

../en/zip-code:zip-code.lookup

version optional [0-9a-zA-Z.\-]+

The version of the block, to distinguish variants of the same transformation.

propagate-failure optional boolean

Enables error propagation from the referred block to the outer environment. Errors originating from the referred block must be handled at the block reference level.

Content


<break>

Halts the execution of the current block, and returns to the step after the calling block reference. This does not trigger an error event.

Example


        <break/>
      

Attributes

This element does not have attributes.

Content

This element does not have children.

<caching>

Per-worker field cache that uses the Least-Recently-Used strategy to regulate its size. The fields specified in the "apply-to" attribute are used as the cache key. Upon a hit, the fields listed in the "output-fields" attribute are assigned from the cache.

Example


          <caching apply-to="userAgent" size="100" output-fields="userAgentAgentName userAgentAgentType userAgentAgentVersion">
            <copy apply-to="userAgentAgentName" from="userAgent"/>
          </caching>
        

Abstract atomic block type.

Attributes

name content description
apply-to required List of (\$)?[0-9a-zA-Z_.-]+

List of fields this element applies to.

output-fields required List of (\$)?[0-9a-zA-Z_.-]+

The fields that are read from the cache on a hit.

size optional integer

Maximum number of cached records.

Content


<case>

A single case in a switch-like structure. See <switch> for details.

Attributes

This element does not have attributes.

Content


<check>

This block performs the checks defined in it, and triggers an error event when the check fails.

Example


        <check summary="Checks, if bm-code is exactly four digits plus one letter." checkedfield="bm-code">
          <match apply-to="bm-code" regexp="^\d{4}[A-Z0]?$"/>
        </check>
      

Attributes

name content description
summary required string

A short summary of the implemented checks that goes into the error report, very useful for debugging a failed constraint check.

checkedfield optional string

The field that is checked. The field's value is included in the error report for debugging.

Content


    <clear-flags>

    Clears all flags from the fields specified in the apply-to attribute.

    Example

    
            <clear-flags apply-to="field1 field2"/>
          

    Abstract field-only block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    Content

    This element does not have children.

    <clone-record>

    Creates a new record identical to the current one and optionally adds another field to the clone. The clone's transformation starts from the position of this block.

    Example

    
            <clone-record set-field="ctrl_doEventSplit" set-value="true"/>
          

    Attributes

    name content description
    set-field required string

    The field added to the clone when the record is cloned.

    set-value required string

    The value of the newly added field.

    Content

    This element does not have children.

    <collapse-whitespace>

    Collapses sequences of whitespace characters to single space characters. This block does not remove additional whitespace from the beginning and the end of a string. Also see trim.

    Example

    
            <collapse-whitespace apply-to="county"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    Content

    This element does not have children.

    <copy>

    Copies the value of the field specified in the from attribute to the fields listed in the apply-to attribute.

    Example

    
            <copy apply-to="public-domain-type" from="$1"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    from optional (\$)?[0-9a-zA-Z_.-]+

    The name of the field to copy from.

    with-flags optional boolean

    If the value is true, the flags of the source field, if any, are copied as well.

    Content

    This element does not have children.
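
    The with-flags attribute is not shown in the example above. A short sketch with hypothetical field names:

    ```xml
    <!-- copies the value of raw-zip-code into zip-code, together with any flags set on it -->
    <copy apply-to="zip-code" from="raw-zip-code" with-flags="true"/>
    ```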

    <cut>

    Cuts the length of the specified field values to a maximum size.

    Example

    
            <cut apply-to="a b c" value="4"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    value required integer

    The maximum length of the field value.

    Content

    This element does not have children.

    <datetime-to-format>

    Converts a date and time string to the given format. This operation does not support time zones in the output.

    Example

    
            <datetime-to-format apply-to="timeInMillis" from="time" from-pattern="dd/MMM/yyyy:HH:mm:ss Z" to-pattern=""/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    from optional (\$)?[0-9a-zA-Z_.-]+

    The field, which contains the input date that is transformed.

    from-pattern optional string

    The pattern of the input date.

    to-pattern optional string

    The output pattern of the date being converted.

    Content

    This element does not have children.
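
    A sketch with a non-empty output pattern, using the attribute names listed in the table above. The field names are hypothetical, and the output pattern assumes the same pattern-letter conventions as the input pattern.

    ```xml
    <!-- reads an access-log style timestamp from "time" and writes it to "date" without a time zone -->
    <datetime-to-format apply-to="date" from="time"
                        from-pattern="dd/MMM/yyyy:HH:mm:ss Z"
                        to-pattern="yyyy-MM-dd HH:mm:ss"/>
    ```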

    <datetime-to-milliseconds>

    Converts a date and time to milliseconds from the Epoch (1970-01-01 00:00:00.000).

    Example

    
            <datetime-to-milliseconds apply-to="timeInMillis" from="time" pattern="dd/MMM/yyyy:HH:mm:ss Z"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    from optional (\$)?[0-9a-zA-Z_.-]+

    The input date.

    pattern optional string

    The pattern of the input date.

    Content

    This element does not have children.

    <extract-timestamp>

    Converts the specified timestamp into a date, and assigns date fields to the variables $year, $month, $day, $hour, $min, and $sec.

    Example

    
            <extract-timestamp apply-to="timestamp">
              <copy apply-to="year" from="$year"/>
              <copy apply-to="month" from="$month"/>
              <copy apply-to="day" from="$day"/>
              <copy apply-to="hour" from="$hour"/>
              <copy apply-to="min" from="$min"/>
              <copy apply-to="sec" from="$sec"/>
            </extract-timestamp>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    Content


    <fail>

    Halts the processing of the currently transformed record with failure. The record is not written to the output and an error event is recorded.

    Example

    
            <fail/>
          

    Attributes

    name content description
    summary optional string

    A short summary of the cause of the failure that goes into the error report; very useful for debugging.

    faildfield optional string

    The field that caused the failure, if that information is relevant; it is not required. The field's value is included in the error report for debugging.

    Content

    This element does not have children.
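
    The bare example above omits both optional attributes. The following is a minimal sketch of a conditional failure that fills them in, reusing the if and not-null elements shown elsewhere in this reference. The field name phone-number and the summary text are made up for illustration, and the attribute spelling faildfield follows the table above.

    ```xml
    <!-- Hypothetical field name and summary text, for illustration only. -->
    <if>
      <not-null apply-to="phone-number"/>
      <then>
        <trim apply-to="phone-number"/>
      </then>
      <else>
        <!-- Stop processing this record and name the offending field in the error report. -->
        <fail summary="Phone number is missing" faildfield="phone-number"/>
      </else>
    </if>
    ```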

    <filter>

    Halts the processing of the currently transformed record. This does not trigger an error event.

    Example

    
            <filter/>
          

    Attributes

    This element does not have attributes.

    Content

    This element does not have children.
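
    In practice a filter is usually placed behind a condition. The sketch below silently drops records whose status field equals "deleted"; the field name and value are hypothetical, and the equals constraint is used the same way as in the if example of this reference.

    ```xml
    <!-- Hypothetical field name and value, for illustration only. -->
    <if>
      <equals apply-to="status" value="deleted"/>
      <then>
        <!-- The record is dropped without an error event. -->
        <filter/>
      </then>
    </if>
    ```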

    <if>

    If-then-else control structure.

    Example

    
            <if>
              <equals apply-to="$has-type" value="false"/>
              <not-null apply-to="public-domain-type"/>
              <then>
                <if>
                  <entity-ref id="address:address.part.public-domain-type.hu" version="1.0"/>
                  <then>
                    <replace-all apply-to="public-domain-name" regexp="\s(\S*)$" replacement=""/>
                    <trim apply-to="public-domain-name"/>
                  </then>
                  <else>
                    <set-null apply-to="public-domain-type"/>
                  </else>
                </if>
              </then>
            </if>
          

    Attributes

    This element does not have attributes.

    Content


    <implode>

    Joins the values of the fields listed in the sources attribute, inserting the text given in the glue attribute between them.

    Example

    
            <implode apply-to="house-number" sources="F_HAZSZAM F_AJTO" glue="." skip-empty-strings="true"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    glue required string

    The separator string placed between the joined values; may be an empty string.

    sources required List of (\$)?[0-9a-zA-Z_.-]+

    The names of the fields to join.

    skip-empty-strings optional boolean

    If enabled, empty strings and null values are left out of the result.

    Content

    This element does not have children.

    <match-extract>

    Matches the input against a regular expression and extracts its groups. The groups defined in the regular expression are exported as variables starting from $0, which contains the entire match. Note that the match-extract block behaves like a check block: if the specified regular expression does not match the input, an error event is triggered.

    Example

    
            <match-extract apply-to="phone-number" regexp="^(\d)(\d+)$">
              <if>
                <equals apply-to="$1" with="country-code"/>
                <then>
                  <copy apply-to="phone-number" from="$2"/>
                </then>
              </if>
            </match-extract>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    regexp optional string

    A regular expression.

    Content


    <milliseconds-to-datetime>

    Converts a number of milliseconds to a formatted date and time string.

    Example

    
            <milliseconds-to-datetime apply-to="date" from="timeInMilliseconds" pattern="dd/MMM/yyyy:HH:mm:ss Z"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    from required (\$)?[0-9a-zA-Z_.-]+

    The name of the field which contains the number of milliseconds.

    pattern required string

    The pattern of the output date.

    Content

    This element does not have children.
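
    The two conversion blocks can be chained to change the format of a date. The following is a sketch under assumptions: the field names time and date and the variable $ms are hypothetical, and the output pattern assumes that pattern letters behave as in the input pattern of the example above.

    ```xml
    <!-- Parse the input date in field "time" into milliseconds, kept in the variable $ms. -->
    <datetime-to-milliseconds apply-to="$ms" from="time" pattern="dd/MMM/yyyy:HH:mm:ss Z"/>
    <!-- Format the milliseconds again with a different (assumed) pattern into field "date". -->
    <milliseconds-to-datetime apply-to="date" from="$ms" pattern="yyyy-MM-dd HH:mm:ss"/>
    ```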

    <remove>

    Removes a field from the record. Note: Variables cannot be removed. A variable is deleted when execution leaves the scope where it was defined.

    Example

    
            <remove apply-to="date"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    Content

    This element does not have children.

    <remove-flag>

    Removes the given flag from all fields listed in the apply-to attribute.

    Example

    
            <remove-flag apply-to="field1 field2" flag="INVALID"/>
          

    Abstract flag-related type.

    Abstract field-only block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    flag required One of INVALID, NOT_APPLICABLE

    Defines the flag type for the operation.

    Content

    This element does not have children.

    <remove-prefix>

    If the field value starts with the specified text, this block removes that prefix.

    Example

    
            <remove-prefix apply-to="house_number" text="home_"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    from required string

    The field to read from.

    prefix required string

    The prefix to remove.

    Content

    This element does not have children.

    <replace-all>

    Replaces all occurrences of the pattern given in the regexp attribute with the value of the replacement attribute.

    Example

    
            <replace-all apply-to="unit" regexp="(=\/|^)0(?=\/|$)" replacement=""/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    regexp optional string

    A regular expression.

    replacement optional string

    Replacement string.

    Content

    This element does not have children.

    <replace-first>

    Replaces the first occurrence of the pattern given in the regexp attribute with the value of the replacement attribute.

    Example

    
            <replace-first apply-to="unit" regexp="(=\/|^)0(?=\/|$)" replacement=""/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    regexp optional string

    A regular expression.

    replacement optional string

    Replacement string.

    Content

    This element does not have children.

    <set>

    Sets the field or variable to the specified constant value.

    Example

    
            <set apply-to="$unit" value="em"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    value required string

    The new value of the field.

    Content

    This element does not have children.

    <set-character-case>

    Changes the case of the selected characters to the specified case.

    Example

    
            <set-character-case apply-to="country" case="Lowercase" characters="ALL"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    characters required One of ALL, FIRST, TOKEN_INITIALS, TOKEN_INITIALS_NON_ALNUM

    The characters to change.

    case required One of Uppercase, Lowercase, Capitalized

    The character case to apply.

    Content

    This element does not have children.

    <set-null>

    Sets the value of the given fields or variables to null.

    Example

    
            <set-null apply-to="zip-code"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    Content

    This element does not have children.

    <switch>

    Switch-like control structure. Executes the cases in sequential order until the first case succeeds. Changes made to the record in failed cases are rolled back and have no effect on the data.

    Example

    
            <switch>
              
              <case>
                <block-ref id="phonenr:phone-number.extract-hun-mobile" version="1.0" propagate-failure="true"/>
              </case>
              
              <case>
                <block-ref id="phonenr:phone-number.extract-hun-rural" version="1.0" propagate-failure="true"/>
              </case>
              
              <case>
                <block-ref id="phonenr:phone-number.extract-hun-budapest" version="1.0" propagate-failure="true"/>
              </case>
              
              <case>
                <block-ref id="phonenr:phone-number.set-null" version="1.0" propagate-failure="true"/>
              </case>
            </switch>
            <block-ref id="email:e-mail.multiple.process.skipnull" version="1.0"/>
          

    Attributes

    This element does not have attributes.

    Content


    <switch-strict>

    Strict switch control structure. It provides the same functionality as the switch element, but triggers an error event if all cases fail.

    Example

    
            <switch-strict>
              
              <case>
                <block-ref id="phonenr:phone-number.extract-hun-mobile" version="1.0" propagate-failure="true"/>
              </case>
              
              <case>
                <block-ref id="phonenr:phone-number.extract-hun-rural" version="1.0" propagate-failure="true"/>
              </case>
              
              <case>
                <block-ref id="phonenr:phone-number.extract-hun-budapest" version="1.0" propagate-failure="true"/>
              </case>
              
              <case>
                <block-ref id="phonenr:phone-number.set-null" version="1.0" propagate-failure="true"/>
              </case>
            </switch-strict>
          

    Attributes

    This element does not have attributes.

    Content


    <trim>

    Removes whitespace from the beginning and the end of the values of specified fields.

    Example

    
            <trim apply-to="unit"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    Content

    This element does not have children.

    <try-all>

    Try-all-cases control structure. It behaves much like a switch structure, but it tries every case, regardless of which ones succeed. Unsuccessful block executions are rolled back and have no effect on the record.

    Example

    
            <try-all>
              <case>
                <check summary="test summary">
                  <not-null apply-to="f1 f2 $v1 $v2"/>	  
                </check>
                <copy apply-to="if1 if2 $if3 $if4" from="if5"/>	
              </case>
              <case>
                <check summary="test summary">
                  <not-null apply-to="f3 f4 $v1 $v2"/>	  
                </check>
                <copy apply-to="if1 i_f2 $if.3 $if4" from="if5"/>	
              </case>
            </try-all>
          

    Attributes

    This element does not have attributes.

    Content


    <unicode-normalize>

    Normalizes a value according to the specified Unicode normalization form.

    Example

    
            <unicode-normalize apply-to="$a b c" form="NFD"/>
          

    Abstract atomic block type.

    Attributes

    name content description
    apply-to required List of (\$)?[0-9a-zA-Z_.-]+

    List of fields this element applies to.

    form optional One of NFD, NFC, NFKD, NFKC

    The form of normalization, see http://unicode.org/reports/tr15/#Norm_Forms

    Content

    This element does not have children.

    Other elements

    <block-package>

    Attributes

    This element does not have attributes.

    Content


      <blocks>

      Contains the transformation steps (processing blocks) of the process.

      Attributes

      This element does not have attributes.

      Content


        <constraint-package>

        Attributes

        This element does not have attributes.

        Content


          <entity>

          Generic entity in an external package.

          Example

          
                    <entity id="address.part.public-domain-type.hu" version="1.0">
                      <lookup apply-to="public-domain-type">
                        <external-wordset column-name="word">
                          <wordset-file-source source-file="longneck-jobs/lookup_resources/wordsets/cr_address/domain_type.txt"/>
                        </external-wordset>
                      </lookup>
                    </entity>
                  

          Attributes

          name content description
          id required [0-9a-zA-Z_.-]+
          version optional [0-9a-zA-Z.\-]+

          Content


          <entity-package>

          Attributes

          This element does not have attributes.

          Content


            <error-target>

            Contains the error target element, where data errors and constraint failures are written. The error output records have a special set of fields, which give information about the location and nature of the error that occurred.

            The error record is augmented automatically with the following set of fields:

            • class_name: the class name of the constraint;
            • field: the name of the tested field;
            • value: the value of the field above;
            • details: additional information about constraint parameters;
            • document_url: the URL of the process or block file, which was executed;
            • document_row and document_column: the row and column in the above document;
            • check_result: the result of the check, true or false;
            • check_id: a unique identifier attached to the event; consists of a node id, a timestamp and a serial number to distinguish events that occurred within one second;
            • check_parent_id: id of another failure event, which was generated as a direct consequence of the current failure event;
            • check_tree_id: id of the failure tree to allow easy querying of each tree;
            • check_level: the level of the event in the process call tree, starting from zero at the root of the tree, incremented by 1 on each level toward the leaves.

            Attributes

            This element does not have attributes.

            Content


            <field>

            A field of a record.

            Attributes

            name content description
            name required string

            Name of the field.

            value optional string

            Value of the field. If not specified, then value is considered to be null.

            flags optional string

            Content

            This element does not have children.

            <map>

            Creates a mapping between the caller and the called block context for the specified field name. The field in the "from" attribute is accessible under the name in the "to" attribute inside the called block.

            Attributes

            name content description
            from required (\$)?[0-9a-zA-Z_.-]+
            to required (\$)?[0-9a-zA-Z_.-]+

            Content

            This element does not have children.

            <pass>

            Maps the field names specified in the pass element to the called block without change.

            Attributes

            name content description
            fields required List of (\$)?[0-9a-zA-Z_.-]+

            Content

            This element does not have children.
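
            The map and pass elements described above can be sketched together as follows. This is an illustration under assumptions: nesting them inside a block-ref is assumed rather than stated here, the block id is reused from the switch example, and all field names are hypothetical.

            ```xml
            <!-- Assumed nesting under block-ref; field names are hypothetical. -->
            <block-ref id="phonenr:phone-number.extract-hun-mobile" version="1.0">
              <!-- The caller's "contact-phone" field is accessible as "phone-number" inside the block. -->
              <map from="contact-phone" to="phone-number"/>
              <!-- These fields keep their names inside the called block. -->
              <pass fields="country-code zip-code"/>
            </block-ref>
            ```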

            <postfix>

            Adds a postfix to each specified field name in the called block. For example, if the postfix is "_a" and the mapped field is "b", then the field in the called block is "b_a".

            Attributes

            name content description
            fields required List of (\$)?[0-9a-zA-Z_.-]+
            text required (\$)?[0-9a-zA-Z_.-]+

            Content

            This element does not have children.

            <prefix>

            Adds a prefix to each specified field name in the called block. For example, if the prefix is "a_" and the mapped field is "b", then the field in the called block is "a_b".

            Attributes

            name content description
            fields required List of (\$)?[0-9a-zA-Z_.-]+
            text required (\$)?[0-9a-zA-Z_.-]+

            Content

            This element does not have children.
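
            The naming rules of prefix and postfix can be sketched as follows. The block id and field names are hypothetical, and placing these elements inside a block-ref is an assumption.

            ```xml
            <!-- Hypothetical block id and field names; nesting under block-ref is assumed. -->
            <block-ref id="address:address.process" version="1.0">
              <!-- "city" and "street" appear as "home_city" and "home_street" in the called block. -->
              <prefix fields="city street" text="home_"/>
              <!-- "zip-code" appears as "zip-code_a" in the called block. -->
              <postfix fields="zip-code" text="_a"/>
            </block-ref>
            ```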

            <process>

            This is the root element of the process. It contains the source, target and the transformation steps of the process.

            Example

            
                    <?xml version="1.0" encoding="UTF-8"?>
                    <process xmlns="urn:hu.sztaki.ilab.longneck:1.0">
                      <source>
                        <!-- source definition goes here -->
                      </source>
                      <target>
                        <!-- target definition goes here -->
                      </target>
                      <blocks>
                        <!-- transformations... -->
                      </blocks>
                      <test-cases>
                        <!-- test cases... -->
                      </test-cases>
                    </process>
                  

            Attributes

            This element does not have attributes.

            Content


            <record>

            A record. Its role is given by the role attribute.

            Attributes

            name content description
            role required string

            Specifies the role of the record: source, target, or error-target.

            Content


            <source>

            This element contains the source definition element, which specifies where to read records from.

            Example

            
                    <database-source connection-name="test">
                      <query>
                        select * from dual
                      </query>    
                    </database-source>
                  

            Attributes

            This element does not have attributes.

            Content


            <target>

            Contains a specific target definition element, where the process writes its output.

            Attributes

            This element does not have attributes.

            Content


            <test>

            Contains a test case of the process.

            Attributes

            name content description
            id required string

            Unique id of the test case.

            summary optional string

            Summary of the test case.

            timeout optional long

            Timeout limit for the process.

            Content


            <test-cases>

            Contains the test cases of the process.

            Attributes

            This element does not have attributes.

            Content
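
            A test case might be assembled from the test, record, and field elements described in this reference roughly as follows. This is a sketch under assumptions: the nesting (test-cases, then test, then record, then field) is inferred from the element descriptions, and the id, summary, field name, and values are hypothetical.

            ```xml
            <!-- Assumed nesting; id, summary, and field values are hypothetical. -->
            <test-cases>
              <test id="trim-unit" summary="Whitespace around unit is removed">
                <!-- Input record fed to the process. -->
                <record role="source">
                  <field name="unit" value=" em "/>
                </record>
                <!-- Record expected on the target. -->
                <record role="target">
                  <field name="unit" value="em"/>
                </record>
              </test>
            </test-cases>
            ```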


              <unprefix>

              Removes a prefix from each specified field name in the called block. For example, if the prefix is "a_" and the mapped field is "a_b", then the field in the called block is "b".

              Attributes

              name content description
              fields required List of (\$)?[0-9a-zA-Z_.-]+
              text required (\$)?[0-9a-zA-Z_.-]+

              Content

              This element does not have children.