YAC Software - Texts - YAC Data Language - Handling missing data in filter expressions

Trybiks' Dive

Handling missing data in filter expressions

Both YAC Interview Builder and YAC DataGate Builder support testing of a questionnaire / script: the coder can run the script in testing mode and see how questions are displayed, how filters work, etc. That's nothing exceptional in itself, but both programs also support testing parts of a script: from the start to the cursor, from the cursor to the end, or only the selected block. This functionality is very helpful when creating / editing long scripts, where testing a single question would otherwise require the coder to run the whole script up to that question every time.

The functionality seems pretty simple - just pass only a part of the questionnaire to the testing program. However, things get more complicated when questions in the selected part of the script depend on questions outside of it. For instance, a question should be displayed only when a given response was selected in some previous question, and that previous question lies outside of the selected part.

Let's take a look at a simplified example (the language used here - YAC Data Language - comes from YAC Interview Kit and YAC Data Kit):

  def question
    id = a;
    text = "First question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    pre = r1 in a;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;

The pre instruction in the second question defines a filter - display this question only when the first response was selected in the first question.

Now, if we try to test the second question only, it will not appear at all - question a was never asked, so the condition in the pre instruction is not met...

Things get even more complicated if the question being tested depends on responses to two questions: one that was not displayed and one that was. Take a look at the following script:

  def question
    id = a;
    text = "First question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = c;
    text = "Third question";
    pre = r1 in a and r2 in b;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;

If we're testing only questions b and c here, how should the pre instruction in question c be handled?

A similar problem appears with YAC Interview Kit's mask instruction:

  def question
    id = a;
    text = "First question";
    type = multi;
    attr = required;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
    def response id = r3; text = "third response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    mask = show a;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
    def response id = r3; text = "third response"; end;
  end;

Above, question a is defined as a multi-response question. In question b, the mask instruction tells the interviewing program to display only those responses that were checked in question a. Now, if we're testing the second question only, no responses will be displayed (since none of them were selected in question a).

Although the examples above are pretty simple, they show the basic problem: how should responses to questions that were not shown be handled in filter instructions?

What do we know about questions and responses that were not asked? Basically, nothing - and that's just standard missing data. But we do know all responses to the displayed questions, and these responses should be treated in filter expressions without any special processing. Well, 3-valued logic to the rescue!

Consider the three operators used in typical Boolean expressions: negation (not), conjunction (and), and disjunction (or). Operations on Boolean values are obvious here, but how should operations on missing data be handled?

negation:

not	true	=	false
not	false	=	true
not	missing	=	missing

In the third line, since we don't know the original value, we can't be expected to know the new value...

conjunction:

true	and	true	=	true
false	and	true	=	false
missing	and	true	=	missing
true	and	false	=	false
false	and	false	=	false
missing	and	false	=	false
true	and	missing	=	missing
false	and	missing	=	false
missing	and	missing	=	missing

missing and true returns missing because the result depends on the actual value of the first operand: if it were true, the result would be true; if it were false, the result would be false. Thus, we don't know the resulting value.

On the other hand, missing and false returns false since regardless of the first value, the result of the expression will be false anyway.

disjunction:

true	or	true	=	true
false	or	true	=	true
missing	or	true	=	true
true	or	false	=	true
false	or	false	=	false
missing	or	false	=	missing
true	or	missing	=	true
false	or	missing	=	missing
missing	or	missing	=	missing

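The same argument as for conjunction applies here: missing or true returns true since, regardless of the first value, the result will be true anyway, while missing or false returns missing because the result depends on the unknown first value.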
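The three tables above can be written down as a short Python sketch. It is only an illustration, not code from the YAC products: representing missing as Python's None and the function names not3, and3, and or3 are assumptions made for this example.

```python
from typing import Optional

# Assumption for this sketch: a 3-valued truth value is True, False,
# or None, where None stands for "missing".
Tri = Optional[bool]


def not3(x: Tri) -> Tri:
    """Negation: an unknown value stays unknown."""
    return None if x is None else not x


def and3(x: Tri, y: Tri) -> Tri:
    """Conjunction: a single false operand decides the result."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def or3(x: Tri, y: Tri) -> Tri:
    """Disjunction: a single true operand decides the result."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False
```

Checking for the deciding value first (false for and, true for or) is what lets a known operand override a missing one, exactly as in the rows missing and false and missing or true.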

Now, how do we use this in testing parts of a questionnaire? Assume that an expression is not met only when it returns false; both true and missing count as met. Now recall the second example:

  def question
    id = a;
    text = "First question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = c;
    text = "Third question";
    pre = r1 in a and r2 in b;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;

We're testing the last two questions: b and c. If the tester gives the first response to b, then the expression:
r1 in a and r2 in b
will be translated to
missing and false
which, according to the rules described earlier, gives false. Thus question c is not displayed.

On the other hand, if the tester gives the second response to b, then the expression:
r1 in a and r2 in b
will be translated to
missing and true
which gives missing. Thus question c is displayed: when testing part of a script, a missing result counts as met, and only false hides the question.
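The two cases above can be reproduced with a small Python sketch. Everything in it is hypothetical and only mirrors the reasoning of this text: the answers dictionary, the helper names selected, is_displayed, and show_c, and the use of None for missing are assumptions, not part of YAC Interview Kit.

```python
from typing import Dict, Optional, Set

# Assumption: None stands for "missing" (the question was not asked).
Tri = Optional[bool]


def and3(x: Tri, y: Tri) -> Tri:
    """3-valued conjunction, as in the table above."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def selected(response: str, question: str,
             answers: Dict[str, Set[str]]) -> Tri:
    """Evaluate `response in question`; missing if the question is
    outside of the tested part and so has no recorded answers."""
    if question not in answers:
        return None
    return response in answers[question]


def is_displayed(pre: Tri) -> bool:
    """Assumed rule for partial tests: only a definite false hides
    the question; true and missing both let it through."""
    return pre is not False


def show_c(answers: Dict[str, Set[str]]) -> bool:
    """Filter of question c: pre = r1 in a and r2 in b."""
    pre = and3(selected("r1", "a", answers),
               selected("r2", "b", answers))
    return is_displayed(pre)


# Testing only b and c: question a was never asked.
print(show_c({"b": {"r1"}}))  # missing and false -> False (hidden)
print(show_c({"b": {"r2"}}))  # missing and true  -> True (displayed)
```

When question a is part of the test run as well, the same function falls back to ordinary two-valued behaviour, because no operand is missing any more.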

So, basically, what we've done here is to "ignore" data from questions that are outside of the block selected for testing. Thanks to this, we can test parts of the questionnaire as if they were independent questionnaires, with all conditions based on "outside" questions removed from the filter expressions.
