Handling missing data in filter expressions
YAC Interview Builder and YAC DataGate Builder both support testing a questionnaire (script): the coder can run the script in testing mode and see how questions are displayed, how filters work, and so on. That's nothing exceptional in itself, but both programs also support testing parts of a script: from the start to the cursor, from the cursor to the end, or just the selected block. This is very helpful when creating or editing long scripts, where testing a single question would otherwise require the coder to run the whole script up to that question every time.

The functionality seems pretty simple: just pass only part of the questionnaire to the testing program. However, things get a bit more complicated when questions in the selected part of the script depend on questions outside it. For instance, a question might be displayed only when a given response was selected in some earlier question that lies outside the selected part.

Let's take a look at a simplified example (the language used here, YAC Data Language, comes from YAC Interview Kit and YAC Data Kit):
  def question
    id = a;
    text = "First question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    pre = r1 in a;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
The pre instruction in the second question defines a filter - display this question only when the first response was selected in the first question.

Now, if we try to test only the second question, it will not appear at all, because the condition in the pre instruction is not met...
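To make the failure concrete, here is a small Python sketch (Python is used only for illustration). It assumes a hypothetical answer model, a dict mapping each question id to the set of selected response ids, and a hypothetical helper `naive_in` that evaluates `r1 in a` with plain two-valued logic. In this model an unasked question simply counts as having nothing selected.

```python
# Hypothetical answer model: question id -> set of selected response ids.
# A question outside the tested block simply has no entry.


def naive_in(response_id: str, question_id: str, answers: dict) -> bool:
    """Two-valued `response_id in question_id`: an unasked question counts as empty."""
    return response_id in answers.get(question_id, set())


# Testing question b on its own: question a was never asked.
answers = {}
show_b = naive_in("r1", "a", answers)  # pre = r1 in a;
print(show_b)  # False, so question b is never displayed
```

Under two-valued logic, "not asked" cannot be told apart from "asked, but r1 not selected", so the filter always fails.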

Things get even more complicated if the question being tested depends on responses to two questions: one that was not displayed and one that was. Take a look at the following script:
  def question
    id = a;
    text = "First question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = c;
    text = "Third question";
    pre = r1 in a and r2 in b;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
If we test only questions b and c, how should the pre instruction in question c be handled?

A similar problem appears with YAC Interview Kit's mask instruction:
  def question
    id = a;
    text = "First question";
    type = multi;
    attr = required;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
    def response id = r3; text = "third response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    mask = show a;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
    def response id = r3; text = "third response"; end;
  end;
Above, question a is defined as a multi-response question. In question b, the mask instruction tells the interviewing program to display only those responses that were checked in question a. Now, if we test only the second question, no responses will be displayed, since none of them were selected in question a.

Although the examples above are pretty simple, they show the basic problem: how should responses to questions that were not shown be handled in filter instructions?

What do we know about questions and responses that were not asked? Basically, nothing, and that's just standard missing data. But we do know all responses to the displayed questions, and those responses should be treated in filter expressions without any special processing. Well, three-valued logic to the rescue!

Consider the three operators used in typical Boolean expressions: negation (not), conjunction (and), and disjunction (or). Operations on Boolean values are obvious, but how should operations on missing data be handled?

negation:
not true = false
not false = true
not missing = missing

In the third line, since we don't know the original value, we can't be expected to know the new value...
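As a minimal Python sketch of the negation table, we can assume `None` stands in for missing and use a hypothetical helper `tri_not` (neither is part of YAC Data Language):

```python
from typing import Optional

# Three-valued truth value: True, False, or None standing in for "missing".
Tri = Optional[bool]


def tri_not(x: Tri) -> Tri:
    """Negation; missing stays missing because the original value is unknown."""
    if x is None:
        return None
    return not x


for value in (True, False, None):
    print(f"not {value} = {tri_not(value)}")
```

The loop prints the three rows of the table above, with `None` printed where the table says missing.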

conjunction:
true and true = true
false and true = false
missing and true = missing
true and false = false
false and false = false
missing and false = false
true and missing = missing
false and missing = false
missing and missing = missing

missing and true returns missing because the result depends on the actual value of the first parameter: if it is true, the result is true, and if it is false, the result is false. Thus, we don't know the resulting value.

On the other hand, missing and false returns false since regardless of the first value, the result of the expression will be false anyway.
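The conjunction rules can be sketched the same way. As before, this assumes `None` represents missing, and `tri_and` is a hypothetical name. A false operand settles the result on its own; otherwise a missing operand makes the result missing.

```python
from typing import Optional

Tri = Optional[bool]  # None represents missing data


def tri_and(x: Tri, y: Tri) -> Tri:
    """Conjunction: false wins regardless of the other operand; otherwise missing propagates."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


values = (True, False, None)
# Same row order as the table above: second operand in the outer loop.
for y in values:
    for x in values:
        print(f"{x} and {y} = {tri_and(x, y)}")
```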

disjunction:
true or true = true
false or true = true
missing or true = true
true or false = true
false or false = false
missing or false = missing
true or missing = true
false or missing = missing
missing or missing = missing

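As with conjunction, the same argument applies here. missing or true returns true because the result is true regardless of the missing value, while missing or false returns missing because the result then depends entirely on the missing value.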
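Disjunction mirrors conjunction, with true taking over the role that false had for and. A sketch under the same assumptions (`None` for missing, hypothetical name `tri_or`):

```python
from typing import Optional

Tri = Optional[bool]  # None represents missing data


def tri_or(x: Tri, y: Tri) -> Tri:
    """Disjunction: true wins regardless of the other operand; otherwise missing propagates."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


values = (True, False, None)
for y in values:
    for x in values:
        print(f"{x} or {y} = {tri_or(x, y)}")
```

Taken together, the three tables are those of Kleene's strong three-valued logic, which is also how SQL treats NULL in AND, OR and NOT.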

Now, how do we use this when testing parts of a questionnaire? Assume that a filter hides a question only when its expression returns false; when the expression returns true or missing, the question is displayed. Now recall the second example:
  def question
    id = a;
    text = "First question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = b;
    text = "Second question";
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
  
  def question
    id = c;
    text = "Third question";
    pre = r1 in a and r2 in b;
    def response id = r1; text = "first response"; end;
    def response id = r2; text = "second response"; end;
  end;
We're testing the last two questions, b and c. If the tester gives the first response to b, then the expression:
r1 in a and r2 in b
will be translated to
missing and false
which, according to the rules described earlier, gives false. Thus question c is not displayed.

On the other hand, if the tester gives the second response to b, then the expression:
r1 in a and r2 in b
will be translated to
missing and true
which gives missing. Thus question c is displayed (remember that only a false result hides a question).
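The walk-through above can be reproduced with a short Python sketch. It assumes the hypothetical dict-of-sets answer model, where questions outside the tested block have no entry. A hypothetical `tri_in` returns missing (`None`) for such questions, and `is_displayed` encodes the reading that only a false filter hides a question, which is how question c ends up displayed in the second case.

```python
from typing import Optional

Tri = Optional[bool]  # None represents missing data


def tri_and(x: Tri, y: Tri) -> Tri:
    """Three-valued conjunction: false wins, otherwise missing propagates."""
    if x is False or y is False:
        return False
    if x is None or y is None:
        return None
    return True


def tri_in(response_id: str, question_id: str, answers: dict) -> Tri:
    """`response_id in question_id`, or missing if the question was not asked."""
    if question_id not in answers:
        return None
    return response_id in answers[question_id]


def is_displayed(pre: Tri) -> bool:
    """Only a definitely false filter hides the question."""
    return pre is not False


def show_c(answers: dict) -> bool:
    # pre = r1 in a and r2 in b;
    return is_displayed(tri_and(tri_in("r1", "a", answers), tri_in("r2", "b", answers)))


print(show_c({"b": {"r1"}}))  # missing and false -> false: question c hidden
print(show_c({"b": {"r2"}}))  # missing and true -> missing: question c displayed
```

When the whole script is run and question a is answered, `tri_in` never returns missing, so the filter behaves exactly as in ordinary two-valued logic.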

So, basically, what we've done here is "ignore" data from questions that are outside the block selected for testing. Thanks to this, we can test parts of the questionnaire as if they were independent questionnaires, with all references to "outside" questions removed from their expressions.
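The text only works through the pre instruction. Applying the same rule to the mask example from earlier is an assumption: a response would be dropped only when it is known to be unchecked in the source question. Here is a Python sketch with hypothetical names (`tri_in`, `mask_show`) and the same dict-of-sets answer model:

```python
from typing import Optional

Tri = Optional[bool]  # None represents missing data


def tri_in(response_id: str, question_id: str, answers: dict) -> Tri:
    """`response_id in question_id`, or missing if the question was not asked."""
    if question_id not in answers:
        return None
    return response_id in answers[question_id]


def mask_show(responses: list, source_question: str, answers: dict) -> list:
    """Hypothetical `mask = show <question>`: drop only responses known to be unchecked."""
    return [r for r in responses if tri_in(r, source_question, answers) is not False]


responses = ["r1", "r2", "r3"]
print(mask_show(responses, "a", {}))                   # a outside the block: ['r1', 'r2', 'r3']
print(mask_show(responses, "a", {"a": {"r1", "r3"}}))  # a answered: ['r1', 'r3']
```

With question a outside the tested block, all three responses of question b stay visible instead of none.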
