Names of variables in continuous surveys
In this text I'd like to talk a little about managing data from continuous surveys, and specifically about assigning names (identifiers) to variables.

Consider a standard prompted awareness question; assume also that it's the first question of the questionnaire:

Q1. Which of the following brands have you ever heard of:
  1. Alpha
  2. Beta
  3. Epsilon
  4. Gamma
Using SPSS, for instance, you could code the data as 4 dichotomous variables: Q1_1 to Q1_4.

There's nothing wrong with that: it's fast, simple, easy to map to the questionnaire, etc.

As long as this is a single wave survey. :-) As soon as new waves / editions of the survey appear (whether planned or not), this simple schema breaks down in the following situations:
  1. We add / delete / change the order of questions.
  2. We add / delete / change the order of responses.
Although 1) happens less often, 2) is quite common: brands that have very low awareness may be removed, new brands and/or sub-brands may be added (brands that the client is introducing on the market, brands that we forgot to include in the first wave, etc.).

Example 1
Let's add a spontaneous awareness question to our questionnaire for wave 2. Obviously, it should go before the question above:

Q1. Mention all brands of ... that you know:
  1. Alpha
  2. Beta
  3. Epsilon
  4. Gamma
Now, our prompted awareness question becomes Q2:

Q2. Which of the following brands have you ever heard of:
  1. Alpha
  2. Beta
  3. Epsilon
  4. Gamma
So what happens if we try to merge data from wave 1 and wave 2?

SPSS has this nice functionality allowing for merging data by adding cases. Variables with the same names are automatically paired in the resulting data file. So, as long as we have named the variables accordingly, everything will go nicely and smoothly.

So, what problems do we get here?

In wave 2, Q1 is no longer prompted awareness, but spontaneous awareness. And what's more, Q1 from wave 1 and wave 2 look exactly the same to SPSS (that is, both are dichotomous and have the same suffixes). So these would be automatically merged - not what we want at all.
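The pitfall can be reproduced with a minimal Python sketch. The `add_cases` helper is a hypothetical stand-in for stacking cases by variable name (it is not how SPSS implements it), and the respondent records are invented for illustration.

```python
def add_cases(*waves):
    """Stack respondent records from several waves, pairing variables by name only."""
    # Collect every variable name seen in any wave, preserving first-seen order.
    names = []
    for wave in waves:
        for record in wave:
            for name in record:
                if name not in names:
                    names.append(name)
    # Variables absent from a record are filled with None (a missing value).
    return [{name: record.get(name) for name in names}
            for wave in waves for record in wave]

# Wave 1: Q1 is PROMPTED awareness (1 = mentioned, 0 = not mentioned).
wave1 = [{"WAVE": 1, "Q1_1": 1, "Q1_2": 1, "Q1_3": 0, "Q1_4": 1}]

# Wave 2: Q1 is now SPONTANEOUS awareness; prompted awareness moved to Q2.
wave2 = [{"WAVE": 2,
          "Q1_1": 1, "Q1_2": 0, "Q1_3": 0, "Q1_4": 0,
          "Q2_1": 1, "Q2_2": 1, "Q2_3": 1, "Q2_4": 0}]

merged = add_cases(wave1, wave2)

# Q1_1 now mixes prompted (wave 1) and spontaneous (wave 2) answers in one
# column, while wave 2's prompted data sits in Q2_1, which is missing for wave 1.
for row in merged:
    print(row["WAVE"], row["Q1_1"], row["Q2_1"])
```

Nothing in the merged file signals the problem: the column names match, so the mix-up only shows up later as odd-looking awareness figures.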

Well, why rename the question Q1 at all, you may ask. Why not just add the spontaneous question as Q0, for instance?

It is a solution, but a short-term one. You will usually be changing something in the questionnaire in each wave. And it's better if each questionnaire is organized according to company standards. If we start naming spontaneous awareness Q0 in one survey, Q1 in another, and so on, we'll soon have one hell of a mess.

Then, a new question comes along that should be inserted between Q0 and Q1 (spontaneous awareness of ads); or before Q0 (top of mind awareness) - what identifiers should we use in these cases?

So, IME and IMO, it's better to renumber questions in each new version of the questionnaire, but keep variable names consistent between waves.

So what should be done in the example above? Let's start naming variables not based on the question's number or on the order of questions in the questionnaire, but on the question's meaning: SPO_1 to SPO_4 for spontaneous awareness, and PRO_1 to PRO_4 for prompted awareness.

Since the meaning of the question doesn't usually change between waves, the above solution works really well when merging multi-wave data.

It also works nicely in data processing scripts - no need to change the script for working on prompted awareness because variables with prompted awareness data will always have the same identifiers.
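As a rough sketch of this idea in Python: a per-wave map translates questionnaire numbering into meaning-based prefixes before the waves are stacked. The `PREFIX_BY_WAVE` table, both helper functions, and the sample records are assumptions made for illustration only.

```python
# Hypothetical per-wave maps from questionnaire numbering to meaning-based prefixes.
PREFIX_BY_WAVE = {
    1: {"Q1": "PRO"},
    2: {"Q1": "SPO", "Q2": "PRO"},
}

def rename_by_meaning(record, wave):
    """Return a copy of the record with question-number prefixes replaced."""
    prefixes = PREFIX_BY_WAVE[wave]
    renamed = {}
    for name, value in record.items():
        question, sep, suffix = name.partition("_")
        if sep and question in prefixes:
            renamed[prefixes[question] + "_" + suffix] = value
        else:
            renamed[name] = value  # e.g. WAVE stays untouched
    return renamed

def prompted_awareness(records, suffix):
    """Share of respondents with prompted awareness of one response, any wave."""
    key = "PRO_" + suffix
    answers = [r[key] for r in records if r.get(key) is not None]
    return sum(answers) / len(answers)

# Invented sample data: two respondents in wave 1, one in wave 2.
wave1 = [{"WAVE": 1, "Q1_1": 1, "Q1_2": 0},
         {"WAVE": 1, "Q1_1": 0, "Q1_2": 0}]
wave2 = [{"WAVE": 2, "Q1_1": 0, "Q1_2": 0, "Q2_1": 1, "Q2_2": 1}]

merged = ([rename_by_meaning(r, 1) for r in wave1] +
          [rename_by_meaning(r, 2) for r in wave2])

# The processing step never mentions Q1 or Q2, so it survives renumbering.
print(prompted_awareness(merged, "1"))
```

Only the small map changes from wave to wave; the analysis code keeps referring to `PRO_...` regardless of where the question sits in the questionnaire.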

In the example above, we kept naming variables with numeric suffixes for the different responses. But the same arguments may be used (and are even more valid) for responses.

Example 2
Let's add a response to our prompted awareness question - Delta - in wave 2. For our interviewers, we will keep alphabetical order of responses:

Q1. Which of the following brands have you ever heard of:
  1. Alpha
  2. Beta
  3. Delta
  4. Epsilon
  5. Gamma
Now the meaning of PRO_3 and PRO_4 has changed between waves (PRO_3 was Epsilon and is now Delta; PRO_4 was Gamma and is now Epsilon), causing pretty much the same problems as changes in the names of variables due to adding new questions.

The solution is similar: don't use response codes or the order of responses when naming variables; use the responses' meanings. Then PRO_1 to PRO_5 become:
  • PRO_ALPHA
  • PRO_BETA
  • PRO_DELTA
  • PRO_EPSILON
  • PRO_GAMMA
Now, regardless of how many responses you add or remove, or how you reorder them, merging, processing, and analyzing data from continuous surveys becomes much simpler (or, at least, much more manageable).
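A small Python sketch of response-level naming, using the two response lists from the examples above. The `BRANDS_BY_WAVE` codebook and the `prompted_to_named` helper are hypothetical names, and the respondents' answers are invented.

```python
# Hypothetical codebooks: response code -> brand, per wave.
BRANDS_BY_WAVE = {
    1: {1: "ALPHA", 2: "BETA", 3: "EPSILON", 4: "GAMMA"},
    2: {1: "ALPHA", 2: "BETA", 3: "DELTA", 4: "EPSILON", 5: "GAMMA"},
}

def prompted_to_named(codes_mentioned, wave):
    """Turn the response codes a respondent chose into PRO_<BRAND> dichotomies."""
    return {"PRO_" + brand: int(code in codes_mentioned)
            for code, brand in BRANDS_BY_WAVE[wave].items()}

# The same response code (3) means a different brand in each wave.
w1 = prompted_to_named({1, 3}, wave=1)  # Alpha and Epsilon
w2 = prompted_to_named({1, 3}, wave=2)  # Alpha and Delta

# Stack the waves by name; a brand not asked in a wave is missing (None),
# which is not the same as "not aware" (0).
all_names = sorted(set(w1) | set(w2))
merged = [{name: rec.get(name) for name in all_names} for rec in (w1, w2)]
for row in merged:
    print(row)
```

Keeping "not asked" distinct from "not aware" matters for tracking: Delta's awareness in wave 1 is unknown, not zero.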

For instance, even when you go back to some previous wave and you look at that old questionnaire, the naming standards allow you to quickly find the variable you're looking for. Regardless of the old numbering of questions and responses.

Also, the chance of errors goes down, because data is merged only if it really is the same response to the same question.

Though, to be fair, this naming convention is a bit more cumbersome initially, so it is quite possibly not worth it for single-wave, ad-hoc surveys. But how many times has a client returned after a while to run the same survey once again, with "minor" changes only?

Note also that this naming convention allows you to look at the variable names only and be pretty sure what questions they come from and what the respective responses mean (as long as prefixes and suffixes of variable names are standardized - a must).
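One way to enforce such a standard, sketched in Python under the assumption that all prefixes and suffixes are registered in one place. The `PREFIXES` and `SUFFIXES` tables and the `describe` function are hypothetical.

```python
# Hypothetical company-wide registry of standardized prefixes and suffixes.
PREFIXES = {"SPO": "spontaneous awareness", "PRO": "prompted awareness"}
SUFFIXES = {"ALPHA": "Alpha", "BETA": "Beta", "DELTA": "Delta",
            "EPSILON": "Epsilon", "GAMMA": "Gamma"}

def describe(variable_name):
    """Decode PREFIX_SUFFIX into a readable label; reject non-standard names."""
    prefix, sep, suffix = variable_name.partition("_")
    if not sep or prefix not in PREFIXES or suffix not in SUFFIXES:
        raise ValueError("non-standard variable name: " + variable_name)
    return PREFIXES[prefix] + ": " + SUFFIXES[suffix]

print(describe("PRO_DELTA"))
```

Running a check like this over a data file before merging would flag leftover names such as `Q1_3` instead of letting them slip into the merged data.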

All in all, I think this approach is usually worth implementing - it saves heaps of time in the long run.
