Script Language
Script structure > File format

A script that defines the structure of the data to be entered must be a plain ASCII text file.

Such files are best prepared in the integrated text editor of the DataGate Studio application. However, you can edit a script in any other editor that can save a file without formatting control codes (such as bold, underline or justification). Most popular editors can do this. The DataGate Studio editor can change how text is displayed, but it always saves its files as plain ASCII.

Line lengths in a script are unlimited. However, for better readability, avoid lines that do not fit within the width of the screen.

Note

DataGate Systems (DGS) applications are derived from the WD/PD/KD programs. We try to keep DGS backward compatible, but some differences between the WD and DGS languages have been introduced. They are described in detail in the topic Differences between the languages WD and DGS.




Script structure > Instruction format, reserved words

A script consists of a sequence of instructions of three types: move to next page (clear page), comment and field definition. Instructions are interpreted one after another, in the order in which they appear in the file, and the respective pages and fields are created in the same order. If an instruction is not defined correctly, the application reports the location and type of the error. It then tries to continue interpreting the script as if the error had not occurred, so that as many errors as possible are reported in one run.

The general format of an instruction is a sequence of reserved words, some with parameters defined after an equals sign. All instructions end with an asterisk *. You can lay out instructions in any way you like: an instruction can be written on one line or spread over any number of lines, and one line can contain zero, one or more instructions. Any number of blanks, tabs and line ends can be inserted between words, numbers and special characters (e.g. comma, parentheses, colon, equals sign or asterisk). The fragment below shows a typical script that includes most of the frequently used elements.

COMMENT='Questionnaire II'*  
COMMENT=''*  
LEFT='Survey' WIDTH=9 TEXT CONST AUTO='OMN200012'*  
LEFT='Identifier' WIDTH=3 RECID VARNAME=RECID*  
COMMENT=''*  
LEFT='Respondent - sex' WIDTH=1 RANGE=[1:2,9] AUTO=9  
  VARNAME=Sex VARLAB='Sex' VALLAB=[1 'male' 2 'female']*  
LEFT='Respondent - age' WIDTH=2 RANGE=[15:75,99] AUTO=99*  
PAGE*  
{  
  If the respondent has no car,   
  the first 8 questions should be skipped:  
}  
LEFT='Owns a car?' WIDTH=1 RANGE=[1:2,9] AUTO=9  
  VARNAME=Car VARLAB='Car ownership' VALLAB=[1 'yes' 2 'no']  
  WHEN=[2][Q1:Q8]*  

In the above example, color and format coding is used to identify certain syntactical elements of the text. This has no effect on the interpretation of the script; it only improves readability. The DataGate Studio editor allows you to define a scheme like this, as do some other editors.

The above scheme is defined as follows:
bold - reserved words  
green color - strings  
red color - numbers  
italic - notes.  

The type of an instruction and its attributes are determined by reserved words. Reserved words are not case-sensitive: they can be written in lower case, upper case or mixed case.

The order of reserved words in an instruction is arbitrary. A reserved word can appear in an instruction only once (except for filter definitions). A repeated reserved word is usually caused by the omission of the asterisk that ends the previous instruction.




Script structure > Strings

A string is a sequence of characters delimited by apostrophes. A string cannot include an apostrophe and must be written on a single line. A string can have unlimited length; however, a very long string will not be displayed completely in the data entry window.

Inside a string, blanks are treated just like any other character, and lower-case and upper-case letters are distinguished. If a string runs over several lines, an error is reported.




Script structure > Notes in the script

Fragments of text in the script file between curly brackets are ignored during script interpretation by the application. Thus you can insert arbitrary notes in the script text, for instance to explain why a given question is entered in one way and not another. You can also "turn off" parts of the script - already entered, but for the moment unnecessary (useful for testing the script).

Notes can appear in any place of the script, except inside strings.

Inside a note, the { character may appear, but not the } character. Hence notes cannot be nested.
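
The rules for instructions, strings and notes given so far can be sketched in a few lines of Python. This is only an illustration, not the parser used by DataGate: the function name split_instructions is hypothetical, and it assumes that an asterisk inside a string is ordinary text and does not end the instruction.

```python
def split_instructions(script):
    """Split script text into instructions (hypothetical helper, not DGS code)."""
    instructions = []
    current = []
    in_string = False
    in_note = False
    for ch in script:
        if in_note:
            # Everything up to the first closing bracket is ignored,
            # including any further opening brackets.
            if ch == "}":
                in_note = False
        elif in_string:
            if ch == "\n":
                raise ValueError("string runs over the end of a line")
            current.append(ch)
            if ch == "'":
                in_string = False
        elif ch == "{":
            in_note = True
        elif ch == "'":
            in_string = True
            current.append(ch)
        elif ch == "*":
            # An asterisk outside a string ends the instruction
            # (assumption: inside a string it is plain text).
            instructions.append("".join(current).strip())
            current = []
        elif ch.isspace():
            # Blanks, tabs and line ends between words collapse to one blank.
            if current and current[-1] != " ":
                current.append(" ")
        else:
            current.append(ch)
    if in_string or in_note or "".join(current).strip():
        raise ValueError("unterminated string, note or instruction")
    return instructions


script = """LABEL=Question1
COMMENT='Enter only marked codes'*  { a note with { inside }
LEFT='Code1'   WIDTH=2* PAGE*"""
for instruction in split_instructions(script):
    print(instruction)
```

The first two lines of the sample text form a single instruction, because the instruction ends only at the first asterisk found outside a string.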




Script structure > Clear page instruction

The simplest instruction is the clear page or move to a new page instruction. Its format is as follows:

PAGE*  

and cannot contain any other reserved words. It forces the data entry editor to end the current page (screen) and start a new one. This can be used to keep the page numbers displayed by DataGate Station consistent with the page numbers of the paper questionnaire. It can also be used to reduce the number of fields displayed on the screen at one time, which speeds up program execution and lets the data entry administrator position fields in a way that helps keypunchers during data entry.

Numbering of pages is automatic and starts with 1. The current page number is displayed on the status bar in DataGate Station.

Note
At least one data field must be present on each page. Hence you should neither begin the script with this instruction (the first page is started automatically) nor end the script with it.




Script structure > Comment instruction

The second type of instruction is the comment instruction, which allows you to display on the data entry screen any given text. For instance, it may inform the keypuncher about the current question or any special coding conditions.

This instruction must include the reserved word COMMENT, which is followed by an equals sign and a string with the comment text. For instance:

COMMENT='The next question cannot have any missing data'*  

In the simplest case, the comment text is given as a string (as in the above example). You can also reference another comment instruction by its label and import its text; the same comment text will then be displayed again. Let's assume that, before the currently edited instruction, we defined a comment as follows (please note that this is the definition of a single instruction):

LABEL=Question1  
COMMENT='Enter only marked responses'*  

Then we can display this comment a second time by writing:

COMMENT=^Question1*  

A special kind of comment is an empty comment, designated by two apostrophes:

COMMENT=''*  

This type of comment can be used to create empty lines for better formatting in the data entry window.




Script structure > Labels

A label is a name (symbol) that is assigned to a given instruction. Any instruction may be labeled, except the clear page instruction. Labels are assigned so that the instruction can be referenced by other instructions in a later part of the script; the referencing instruction may import certain definitions from the labeled one. In many situations this saves time in script preparation. Labels are also necessary for filter and group definitions.

To assign a label to an instruction, add to the instruction the reserved word LABEL followed by an equals sign and the label's name (any sequence of letters, digits and underscore characters that begins with a letter). The program does not distinguish between lower-case and upper-case characters (the names label1 and LABEL1 are treated as the same label).

To reference this instruction in later ones, you reference its label. This is done by placing the ^ character (the caret, which is usually above the digit 6 on the main part of the keyboard) before the label's name.

Because a label uniquely identifies an instruction, a given label identifier may be defined in the script only once. It is also necessary that the labeled instruction be before any other instruction referencing it.

The use of references is explained in the following example. Consider this sequence of instructions:

LABEL=Question1  
COMMENT='Enter only marked codes'*  
LEFT='Code1' WIDTH=2*  
LEFT='Code2'*  
LEFT='Code3'*  
COMMENT=^Question1*  

In the second line a comment is defined. It informs keypunchers about a special mode of entering data for the following question. To use the same comment text in later instructions, you can reference it instead of defining it repeatedly. To do this, you must first add a label to the comment instruction. This is done in the first line of the example; the first and second lines together form the definition of the first instruction, which ends with the first asterisk.

The last instruction uses the same comment text. You do not have to define it a second time; you just reference the first comment instruction.




Script structure > Data field definition instruction

The third and most complicated type of instruction is the data field definition instruction. It is not marked by any special reserved word: the program assumes that every instruction that contains neither PAGE nor COMMENT describes a data field.

All attributes of a data field are tied to their respective reserved words. Not all combinations of attributes are legal. If an illegal combination appears in the script text, an error is reported.

All fields are defined completely, even if some attribute definitions are omitted. In such a case the program assumes some default value for this attribute under certain rules (these rules are described in the following topics).




Script structure > Data field definition instruction > Field type

The field type is determined by the presence of the reserved word TEXT. If it is included in an instruction, a text field is defined. Otherwise the program assumes that it should be a numeric field.




Script structure > Data field definition instruction > Left and right field descriptions

The left and right field descriptions are strings that will be shown in the data entry window on the left-hand side and on the right-hand side of the field respectively. These texts are specified in a similar way as comment definitions. The reserved words are: LEFT and RIGHT; after both an equals sign must appear, then a string or a reference. For instance:

LEFT='Question 5'  
RIGHT='Code income in thousands of dollars'*  
LEFT=^Question1*  

If you omit one of the attributes, no text will be displayed by the data field on the respective side.




Script structure > Data field definition instruction > Field width

The field width is defined by the reserved word WIDTH, followed by an equals sign and then a number or a reference to a label. The minimum width is 1 character; the maximum is 8 characters for numeric fields and 255 for text fields.

Determining the field's width is a bit more complicated if the attribute is omitted in the definition. In this case, the program assigns the width of the last field of the same kind. Thus, width is "inherited" along two independent lines: separately for numeric fields and separately for text fields. If the attribute is omitted for the first defined field of a kind, a default value of 1 is assumed.
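
The width inheritance rule can be illustrated with a short Python sketch. It is a model of the rule as described above, not code from DataGate; the function name resolve_widths and the (is_text, width) representation of a field are assumptions made for this example.

```python
def resolve_widths(fields):
    """Resolve omitted widths; fields is a list of (is_text, width or None)."""
    # Default width 1 applies to the first field of each kind.
    last_width = {False: 1, True: 1}   # keyed by is_text
    resolved = []
    for is_text, width in fields:
        if width is None:
            # Inherit from the last field of the same kind.
            width = last_width[is_text]
        limit = 255 if is_text else 8
        if not 1 <= width <= limit:
            raise ValueError("width out of range for this field type")
        last_width[is_text] = width
        resolved.append(width)
    return resolved


fields = [
    (False, None),   # first numeric field, no WIDTH: default 1
    (False, 2),      # numeric, WIDTH=2
    (True, None),    # first text field, no WIDTH: default 1
    (True, 24),      # text, WIDTH=24
    (False, None),   # numeric: inherits 2
    (True, None),    # text: inherits 24
]
print(resolve_widths(fields))
```

Note that the numeric field in the fifth position inherits its width from the second field, not from the text field directly before it.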




Script structure > Data field definition instruction > Automatic field value

The general format of this definition is as follows: the reserved word AUTO, an equals sign, then a number, a string or a reference to a previous instruction. For instance:

AUTO=99  
AUTO='99'  
AUTO='not applicable'  
AUTO=^Q1  

When declaring an automatic value with a number (e.g. AUTO=99), you only need to ensure that the number (together with an optional minus sign) fits in the field's width. Leading zeros and a plus sign are insignificant. For instance 333, -00022, +000345 and 0765 are all three-character numbers. For numbers that have fewer characters than the field's width, the data entry program automatically adds leading zeros to fill the whole field. Hence, during data entry, the automatic value 2 (AUTO=2) is written as 2 in a field that has a width of 1, as 02 in a two-character field, etc.

The automatic value for a numeric field can also be defined by a string. A string has a given number of characters, which in this case must exactly match the field's width. For instance, for a two-character field we can use the definition AUTO='99' instead of AUTO=99. Of course, when defining an automatic value in this way, the string must represent a legal integer value. For instance the declaration AUTO='$0' is illegal for numeric fields.

Declaring numeric automatic values by strings is of practical use in only one case: with the fill (backslash) character. The backslash can be placed only as the first character of a two-character string. The field is then filled with the second character. For instance the declaration AUTO='\9' is legal for any numeric field (that is, for a field of any width). It fills the whole field with nines: if the width is 1, then 9 is written; if the width is 2, then 99 is written, etc.

When using the fill character, you can write any character in the second position. This allows you, for instance, to define an automatic value consisting of dots only.

In the case of text fields the constraints are somewhat stricter. Because the text is saved exactly as it is written in the instruction, its length must be exactly equal to the field's width. This means that you have to expand the text to the given length, usually by inserting leading or trailing blanks. For instance, if a text field has a width of 24, and we want to insert automatically the text "not applicable", the declaration should look as follows:

AUTO='not applicable          '  

(the text "not applicable" plus 10 blanks). The declaration

AUTO='not applicable'  

is incorrect - the program will report an error when checking the script.

If the automatic value is defined by a reference to a previous instruction (AUTO=^labelname), the following rule applies: the copied value has a text representation, no matter how it was originally defined in the field labeled labelname. The definitions of the third and fourth fields in the example below are incorrect, since in both cases the copied value has a different width than the field (the third instruction copies the value 001, the fourth instruction 99999).

WIDTH=3 AUTO=1 LABEL=first*  
WIDTH=5 AUTO='\9' LABEL=second*  
WIDTH=1 AUTO=^first*  
WIDTH=2 AUTO=^second*  

When the AUTO definition is omitted, an inheritance rule applies, similar to the one described for the field's width. This time, however, not only the value but also its definition is inherited. The following example illustrates this:

{ 1} WIDTH=2 AUTO=9*  
{ 2} WIDTH=4 LABEL=AA*  
{ 3} WIDTH=3*  
{ 4} WIDTH=2 AUTO='\9'*  
{ 5} WIDTH=5*  
{ 6} WIDTH=3 AUTO='888'*  
{ 7} *  
{ 8} WIDTH=5*  
{ 9} WIDTH=4 AUTO=^AA*  
{10} WIDTH=2*  

Line 1 - definition by a number - automatic value '09'.
Line 2 - inherited definition by a number - automatic value '0009'.
Line 3 - inherited definition by a number - automatic value '009'.
Line 4 - a "fill" (backslash) definition - automatic value '99'.
Line 5 - inherited "fill" definition - automatic value '99999'.
Line 6 - definition by a string - automatic value '888'.
Line 7 - inherited definition by a string - automatic value '888'.
Line 8 - inherited definition by a string - error: incorrect length of the automatic value.
Line 9 - definition by reference - automatic value '0009'.
Line 10 - inherited definition by a string - error: incorrect length of the automatic value.

Based on this, definitions by a number and by a backslash should be considered the more flexible ones. They allow you to skip the definition of an automatic value for each field if, in a larger part of the script, these values are the same integer or the fields are filled with the same character. This mechanism also works when the fields in such a fragment have varying widths.

As in the case of the field's width, automatic values are inherited from previous fields along two lines: separately for numeric fields and separately for text fields. The default automatic values are AUTO=0 for numeric fields and AUTO='\ ' (backslash, blank) for text fields.

Note
Inheritance works even if automatic values were not explicitly defined. This may be important when fields have varying widths.
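
The rules for numeric automatic values can be modelled in Python as follows. This is a sketch of the behavior described in this topic, not the program's actual code. The names auto_text and resolve_autos and the tuple representation of a definition are assumptions for illustration, only numeric fields are covered, and negative numbers are left out because the text does not say where zeros are placed next to a minus sign.

```python
def auto_text(definition, width):
    """Return the text written to a numeric field, or raise ValueError."""
    kind, value = definition
    if kind == "number":             # e.g. AUTO=9 (non-negative only here)
        text = str(value)
        if len(text) > width:
            raise ValueError("number does not fit in the field")
        return text.zfill(width)     # pad with leading zeros
    if kind == "fill":               # e.g. AUTO='\9'
        return value * width         # repeat the fill character
    if kind == "string":             # e.g. AUTO='888'
        if len(value) != width:
            raise ValueError("incorrect length of the automatic value")
        return value
    raise ValueError("unknown definition kind")


def resolve_autos(fields):
    """fields: list of (width, definition or None, label or None) tuples."""
    current = ("number", 0)          # default for numeric fields: AUTO=0
    by_label = {}
    results = []
    for width, definition, label in fields:
        if definition is None:
            definition = current     # the definition itself is inherited
        elif definition[0] == "ref":
            # A referenced value is copied in its text representation.
            definition = ("string", by_label[definition[1].lower()])
        current = definition
        try:
            text = auto_text(definition, width)
        except ValueError:
            text = None              # the program would report an error here
        if label is not None and text is not None:
            by_label[label.lower()] = text
        results.append(text)
    return results


example = [
    (2, ("number", 9), None),      # { 1} WIDTH=2 AUTO=9*
    (4, None, "AA"),               # { 2} WIDTH=4 LABEL=AA*
    (3, None, None),               # { 3} WIDTH=3*
    (2, ("fill", "9"), None),      # { 4} WIDTH=2 AUTO='\9'*
    (5, None, None),               # { 5} WIDTH=5*
    (3, ("string", "888"), None),  # { 6} WIDTH=3 AUTO='888'*
    (3, None, None),               # { 7} * (width 3 is inherited)
    (5, None, None),               # { 8} WIDTH=5*
    (4, ("ref", "AA"), None),      # { 9} WIDTH=4 AUTO=^AA*
    (2, None, None),               # {10} WIDTH=2*
]
print(resolve_autos(example))
```

In the result, None marks the fields for which an error would be reported (lines 8 and 10 of the example).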




Script structure > Data field definition instruction > Range of admissible field values

You can limit the range of admissible field values for numeric fields only. The definition is as follows: the reserved word RANGE, an equals sign and a range description between brackets or a reference to an earlier instruction. For instance, a range definition might look like this:

RANGE=[0:22,30:40,91,93,97:99]*  
RANGE=^Question1*  

In brackets you can place single values or value ranges separated by commas. A range is defined by the minimum value, a colon, and a maximum value. The order of values and ranges in the specification is insignificant.

The empty range (written as [ ]) has a special meaning. It tells the program to turn off range control for the fields concerned; any integer value can be entered in such fields.

If you define the range by a reference (RANGE=^Question1), the field will have the same range of admissible values as the referenced field.

If no range is specified for the field, it will have the same range as the last numeric field before this one (the inheritance rule). The first numeric field in the script has a default empty range (no control).
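
As an illustration of the range format, the following Python sketch parses a range description and checks whether a value is admissible. The function names parse_range and is_admissible are hypothetical, and the sketch does not perform the additional DGS checks (overlapping ranges, values that do not fit the field width).

```python
def parse_range(spec):
    """Parse a description such as '[0:22,30:40,91]' into (low, high) pairs."""
    spec = spec.strip()
    if not (spec.startswith("[") and spec.endswith("]")):
        raise ValueError("range must be delimited by brackets")
    inner = spec[1:-1].strip()
    if not inner:
        return []                    # empty range: no control
    intervals = []
    for part in inner.split(","):
        if ":" in part:
            low_text, high_text = part.split(":")
            low, high = int(low_text), int(high_text)
        else:
            low = high = int(part)   # a single value
        if low > high:
            raise ValueError("range limits in the wrong order")
        intervals.append((low, high))
    return intervals


def is_admissible(value, intervals):
    """An empty list of intervals admits any integer value."""
    return not intervals or any(low <= value <= high for low, high in intervals)


allowed = parse_range("[0:22, 30:40, 91, 93, 97:99]")
print(is_admissible(35, allowed))
print(is_admissible(50, allowed))
```

With this range, 35 is accepted because it lies inside 30:40, while 50 is rejected.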




Script structure > Data field definition instruction > Data record identifier

One field in the script can be designated as the data record identifier. This field will be filled out with consecutive integers for consecutive records (starting with 1). To define a field as the data record identifier, add the reserved word RECID to the instruction that defines this field.

If an automatic value definition is added to a field with the RECID attribute, the numbering starts with that value.

Note
The data record identifier field cannot be a constant field.




Script structure > Data field definition instruction > Constant field

If in the field definition the reserved word CONST appears, the field will always be filled out automatically with its automatic value - it will be treated as a constant field.

Note
The data record identifier cannot be a constant field.




Script structure > Data field definition instruction > Field groups

To define a group of fields that can be filled out with their automatic values with a single key press (the plus key on the numeric keypad), add the following attribute to the first field of the group:

GROUP=<labelname>  

and in the last field:

LABEL=<labelname>  

All fields between those two fields (inclusive) will belong to the group.

It follows from this definition that the fields of a group must be consecutive: no field outside the group can separate them. However, comment and clear page instructions may be included.

The label name used after the reserved word GROUP must not have been used earlier in the script (it is not a reference). It cannot be assigned to a clear page or comment instruction, because a group must start and end with data field definition instructions.

Also, groups cannot overlap: it is illegal to start a new group before another group has ended.

Note
The data record identifier cannot be part of any group.




Script structure > Data field definition instruction > Filters

Every numeric field can define a filter. Depending on the value entered in such a field, some later fields may be skipped, that is, filled out with their automatic values.

A filter instruction must contain the following definition: the reserved word WHEN, an equals sign, a value range description and a field range description. For instance:

WHEN=[0,4:6][q16, q17, q20:q22]  

The value range defines the filter's active values. Its format is identical to the definition of the range of admissible values, i.e. values and value ranges separated by commas and delimited by brackets. However, the empty range is illegal here.

The field range defines the fields that depend on the preceding active values. These fields are filled out automatically if one of the active values is entered in the filter field. The format of this definition is similar to the value range definition, but values are replaced by label names. It starts with an opening bracket and consists of single labels separated by commas. Field ranges are also allowed, that is elements of the form:

labelname1:labelname2  

The definition ends with a closing bracket.

A field may have many filters, each with its own range of active values and dependent fields. This means that the definition WHEN=[value range][field range] may appear many times in a single data field instruction. Thus it is legal to write:

WHEN=[1:4,9][q5, q6] WHEN=[5][q6:q8]*  

Please note that the field labeled q6 will be automatically filled out in both cases.

There are no special rules that govern the specification of field ranges in one filter or many filters. A field dependent on a filter (or filters) is filled out automatically if any of those filters has an active value entered. The following example illustrates this:

{1} WHEN=[0][q11]*  
{2} WHEN=[0][q11, q12]*  
{3} LABEL=q11 AUTO=9*  
{4} LABEL=q12 WIDTH=2 AUTO=99*  

The field in the third line will be filled out automatically if field {1} or field {2} has a value of 0. Field {4} will be filled out automatically only when field {2} has the value 0.

The above describes the strong version of a filter. There is also a weak version, which is defined in the same way except that the reserved word SIGNAL is used instead of WHEN. Let us look at an example similar to the previous one:

{1} SIGNAL=[0][q11]*  
{2} SIGNAL=[0][q11, q12]*  
{3} LABEL=q11 AUTO=9*  
{4} LABEL=q12 WIDTH=2 AUTO=99*  

The field in the third line will be filled out automatically only if field {1} and field {2} both have a value of 0 at the same time. Field {4} will be filled out automatically when field {2} has the value 0, so in this case the weak filter works just like the strong version. In general, a field dependent on weak filters is filled out automatically only if all those filters have an active value entered. If a field is under a single weak filter (e.g. field {4} in the example), this filter works the same way as its strong counterpart.

A field can simultaneously depend on many strong and weak filters. In such a case strong filters have priority: a field will be filled out if any of the strong filters have an active value entered. If no strong filter has an active value entered, then the field will be automatically filled out if all weak filters have an active value entered.
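
The combined rule for strong and weak filters can be written as a small decision function. The Python sketch below only restates the rule from this topic; the function name is_filled_automatically and its list-of-booleans arguments are assumptions for illustration.

```python
def is_filled_automatically(strong_active, weak_active):
    """Decide whether a dependent field is skipped.

    strong_active: one boolean per WHEN filter the field depends on,
                   True if an active value was entered in the filter field.
    weak_active:   the same for SIGNAL filters.
    """
    if any(strong_active):
        return True                  # one active strong filter is enough
    # Weak filters count only if there is at least one and all are active.
    return bool(weak_active) and all(weak_active)


# Field q11 from the WHEN example: depends on two strong filters.
print(is_filled_automatically([True, False], []))
# Field q11 from the SIGNAL example: depends on two weak filters.
print(is_filled_automatically([], [True, False]))
print(is_filled_automatically([], [True, True]))
```

The explicit check for an empty list matters: without it, a field with no weak filters at all would count as having "all" of them active.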




Script structure > Data field definition instruction > Simultaneous definition of several fields

To save time on script preparation, you can define a sequence of fields with just one instruction (under the condition that those fields will have the same attributes). This can be used to enter multiple-choice questions, for which several fields have the same width, the same range and a similar description. It should be noted that this has nothing to do with field groups defined by the reserved word GROUP.

To use this mechanism, define the first field and add to it the reserved word REPEAT, then an equals sign and the number of repeats (from 1 to 999). The application will generate the given number of data fields. All these fields will have the same type, width, range, automatic value and right description, and all will be placed on the same page of the questionnaire.

To differentiate between these fields, you can modify their left descriptions. If you enter a dollar sign $ in the left description of the field, this sign will be replaced with consecutive integers for consecutive fields. The numbering starts with 1, but you can change this with the ITEM directive: after ITEM write an equals sign and the number from which the numbering should start. For example, the following definitions:

LEFT='Question 1.$' WIDTH=1 AUTO=0 RANGE=[0:2] REPEAT=6*  
PAGE*  
LEFT='Question 1.$' REPEAT=4 ITEM=7*  

will generate 10 fields with left descriptions from "Question 1.1" to "Question 1.10". The first 6 fields will be on one page, the remaining 4 on the next page.
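
The substitution of the dollar sign can be mimicked in Python as shown below. The function name expand_left is hypothetical, and the sketch assumes that every dollar sign in the description is replaced; the text above does not say what happens when there is more than one.

```python
def expand_left(left, repeat, item=1):
    """Generate the left descriptions produced by REPEAT (and ITEM)."""
    if not 1 <= repeat <= 999:
        raise ValueError("REPEAT must be between 1 and 999")
    # Assumption: every '$' in the description is replaced by the number.
    return [left.replace("$", str(number))
            for number in range(item, item + repeat)]


first_page = expand_left("Question 1.$", 6)            # REPEAT=6
second_page = expand_left("Question 1.$", 4, item=7)   # REPEAT=4 ITEM=7
print(first_page + second_page)
```

The second call continues the numbering at 7, so the two instructions together cover "Question 1.1" to "Question 1.10".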

In a simultaneous definition the reserved words WHEN and RECID cannot be used. If a CONST attribute is used in the definition, all the fields will be constant.

If you add a label to an instruction that has a REPEAT attribute, the label is assigned to the last field defined by this instruction. This allows you to define a group containing all fields of the simultaneous definition. Thus, the instruction

LEFT='Question 1.$' REPEAT=6 LABEL=quest1 GROUP=quest1*  

generates 6 fields, which form a group.




Script structure > Data field definition instruction > Sound signaling

Some fields of the data record can be distinguished by emitting a short sound when the cursor moves to one of the fields. This sound is different from the sound marking an entry error and is used to warn the keypuncher. This is helpful for people who enter data without looking at the keyboard or the screen. This can be used to signal, for instance, new pages in the questionnaire or some atypical manner of entering data. The signal is generated only for fields that have not been filled out yet (when checking and correcting data the sound is not emitted). You can turn this option on or off via the Options | Sound menu. To set a sound signal for a field, add to it the reserved word BEEP.




Script structure > Data field definition instruction > Fields designated for update

Fields that are to be entered separately in update mode should be marked with the reserved word UPDATE.




Script structure > Export of definitions to SPSS

Based on a DataGate Systems script, you can automatically create a syntax file for loading the data into SPSS (the DATA LIST instruction). In such a script you will usually want to define variable names, variable labels and value labels.




Script structure > Export of definitions to SPSS > Variable name

The attribute VARNAME is used to name the variable created for this field. The format of this attribute is as follows: VARNAME, an equals sign, then an identifier. For example:

VARNAME=Q1_1  

The identifier must be no longer than 8 characters and must begin with a letter.

If the attribute VARNAME is not defined, the program tries to create a name from the field's left description in the following way: the left description is truncated at the first blank (exclusive) and limited to 8 characters.

Because the variable's name uniquely identifies a field, it must not be repeated anywhere in the script. If a name is repeated, a warning is reported.
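
The way a default variable name is derived from the left description can be sketched in Python. The function name default_varname is hypothetical. The sketch only checks that the result begins with a letter; whether the program applies further checks to the remaining characters is not stated here.

```python
def default_varname(left):
    """Derive a variable name from a left description, or return None."""
    # Cut at the first blank (exclusive), then limit to 8 characters.
    candidate = left.split(" ", 1)[0][:8]
    if not candidate or not candidate[0].isalpha():
        return None                  # no usable name can be created
    return candidate


for description in ["Respondent - sex", "Owns a car?", "Code1", "5th item"]:
    print(repr(description), "->", default_varname(description))
```

Descriptions that start with the same word, such as "Respondent - sex" and "Respondent - age", produce the same default name, which is one reason to define VARNAME explicitly.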




Script structure > Export of definitions to SPSS > Variable label

The attribute VARLAB is used to define the variable's label. The format of this attribute is as follows: VARLAB, an equals sign, then a string or a reference. For example:

VARLAB='Brand 1' LABEL=Brand1*  
VARLAB='Brand 2' LABEL=Brand2*  
. . .  
VARLAB=^Brand1*  
VARLAB=^Brand2*  

Note

If the VARLAB attribute is not defined, the field's left description will be used as the variable's label.




Script structure > Export of definitions to SPSS > Value labels

The attribute VALLAB is used to define labels for the variable's values. The format of this attribute is as follows: VALLAB, an equals sign, then the description of labels. This description consists of an opening bracket, a list of values and strings separated by blanks, and a closing bracket. In the place of the description of labels a reference to a previous field may be used. For example:

VALLAB=[1 'very good' 5 'very bad'] LABEL=eval1*  
VALLAB=^eval1*  

Note

The list of labels may be empty. In that case the variable will have no value labels defined.




Script structure > Differences between the languages WD and DGS

File format

Maximum line length in WD is 255 characters. In DGS there is no upper limit.

Script interpretation

If the script is incorrect, WD stops analysis at the first error. DGS tries to find (and report) as many errors as possible.

WD is based on the Polish version of the reserved words. There are two versions of DGS: with Polish reserved words (discontinued, kept only for backward compatibility) and with English reserved words.

Being a DOS program, WD automatically divides the script into pages 21 lines in length. A DGS page can hold any number of fields.

In WD, comments can be 76 characters long, left field descriptions 20 characters and right field descriptions 49 characters. In DGS there are no limits on the lengths of these definitions.

The maximum width of a text field is 48 characters in WD and 255 characters in DGS.

In WD, labels cannot include underscores. This is allowed in DGS.

DGS has better control of ranges of admissible values: ranges cannot overlap, range limits must be defined in the correct order, and values given in range definitions must fit in the field width.

DGS does not allow the automatic value to be outside of the range of admissible values.

DGS allows for special automatic values (non-numeric strings) in numeric fields.

DGS checks the active filter values: they cannot be outside the admissible range and cannot totally cover it.

DGS reports additional warnings: unused labels, no left field description, variable name cannot be created based on the left description, variable name repeated.


