Read Multiple Variables Create New Variable if Not Blank Spss

1. Introduction

This module will explore missing data in SPSS, focusing on numeric missing data. We will describe how to indicate missing data in your raw data files, how missing data are handled in SPSS procedures, and how to handle missing data in SPSS data transformations. There are two types of missing values in SPSS: 1) system-missing values, and 2) user-defined missing values. We will demonstrate reading data containing each kind of missing value. Both data sets are identical except for the coding of the missing values. For both data sets, suppose we did a reaction time study with six subjects, and each subject's reaction time was measured three times.

2. System-missing values

System-missing values are values that are automatically recognized as missing by SPSS. You might notice that some of the reaction times are left blank in the data below. That is the accepted way of indicating system-missing data in the data set. For example, for subject 2, the second trial is blank. The only way to read raw data with fields left blank is with fixed field input. The values left blank are automatically treated as system-missing values.

Note: It is possible to hold the missing place with a single dot in the field, but if you do, you will get a warning message each time SPSS encounters one of these values. The resulting variable is coded with system-missing values.

    DATA LIST FIXED/
      id 1 trial1 3-5 (1) trial2 6-8 (1)
            trial3 11-13 (1).

    BEGIN DATA .
    1 1.5 1.4 1.6
    2 1.5     1.9
    3     2.0 1.6
    4         2.2
    5 2.1 2.3 2.2
    6 1.8 2.0 1.9
    END DATA .

    LIST .

One reason for missing data might be that the equipment failed for that trial. The result of the list follows; notice that SPSS marks system-missing values with a dot in the listing. There is a dot everywhere in the listing that there was a blank in the data.

    ID TRIAL1 TRIAL2 TRIAL3

     1   1.5    1.0    1.6
     2   1.5     .     1.9
     3    .     2.0    1.6
     4    .      .     2.2
     5   2.1    2.0    2.2
     6   1.8    2.0    1.9

    Number of cases read:  6    Number of cases listed:  6

3. User-defined missing values

User-defined missing values are numeric values that need to be defined as missing for SPSS. You might notice that some of the reaction times are -9 in the data below. You may use any value you choose to represent a missing value, but be careful that you don't choose a value for missing that already exists for the variable in the data set. For that reason many people choose negative numbers or large numbers to represent missing values. For example, for subject 2, the second trial is -9. You may read raw data with user-missing values either as fixed field input or as free field input. We will read it as free field input in this example. When defined as such on a missing values command, these values of -9 are treated as user-missing values.

    DATA LIST FREE/
          id trial1 trial2 trial3 .
    MISSING VALUES trial1 TO trial3 (-9).
    COMPUTE  trialr1=trial1.
    COMPUTE  trialr2=trial2.
    COMPUTE  trialr3=trial3.
    VARIABLE LABELS trial1 "Trial 1 User Miss"
                    trialr1 "Trial 1 Sys Miss".

    BEGIN DATA .
    1 1.5 1.4 1.6
    2 1.5  -9 1.9
    3  -9 2.0 1.6
    4  -9  -9 2.2
    5 2.1 2.3 2.2
    6 1.8 2.0 1.9
    END DATA .

    LIST .

The compute command is used to create the new variables trialr1 through trialr3, which will contain system-missing values where there were user-defined missing values in the original variables. User-defined missing values on the original variable become system-missing values on the new variables. The result of the list follows; notice that SPSS marks user-missing values with a -9 in the listing. There is a -9 everywhere in the listing that there was a -9 in the data, so the value of the user-defined missing is preserved for the original variables.

          ID   TRIAL1   TRIAL2   TRIAL3  TRIALR1  TRIALR2  TRIALR3

        1.00     1.50     1.40     1.60     1.50     1.40     1.60
        2.00     1.50    -9.00     1.90     1.50      .       1.90
        3.00    -9.00     2.00     1.60      .       2.00     1.60
        4.00    -9.00    -9.00     2.20      .        .       2.20
        5.00     2.10     2.30     2.20     2.10     2.30     2.20
        6.00     1.80     2.00     1.90     1.80     2.00     1.90

    Number of cases read:  6    Number of cases listed:  6
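The conversion from user-defined to system-missing values can also be mimicked outside SPSS. The following is a minimal Python sketch, not SPSS syntax: the names USER_MISSING and to_system_missing are illustrative assumptions, and None stands in for the system-missing value.

```python
# Sentinel declared as missing, as in MISSING VALUES trial1 TO trial3 (-9).
USER_MISSING = -9

# (id, trial1, trial2, trial3) for the six subjects in the example.
rows = [
    (1, 1.5, 1.4, 1.6),
    (2, 1.5, -9, 1.9),
    (3, -9, 2.0, 1.6),
    (4, -9, -9, 2.2),
    (5, 2.1, 2.3, 2.2),
    (6, 1.8, 2.0, 1.9),
]


def to_system_missing(value, sentinel=USER_MISSING):
    """Return None (our stand-in for system-missing) when value is the sentinel."""
    return None if value == sentinel else value


# trialr[i] holds the recoded trials for case i, mirroring trialr1..trialr3.
trialr = [tuple(to_system_missing(v) for v in row[1:]) for row in rows]

for row, recoded in zip(rows, trialr):
    print(row[0], row[1:], recoded)
```

The original tuples keep their -9 codes, while the recoded tuples carry None in the same positions, which is the pattern the listing above shows.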

Let's examine how SPSS handles missing data in assay commands.

4. How SPSS handles missing data in analysis commands

As a general rule, SPSS analysis commands that perform computations handle missing data by omitting the missing values. (We say analysis commands to indicate that we are not addressing commands like sort.) The way that missing values are eliminated is not always the same among SPSS commands, so let's look at some examples. First, use the descriptives command on our data file and see how this command handles the missing values.

    DESC
      /VAR= trial1 trial2 trial3
            trialr1 trialr2 trialr3.

As you see in the output below, descriptives computed the means using four observations for trial1 and trial2 and six observations for trial3. In short, descriptives used all of the valid data and performed the computations on all of the available data. This was also true for the next three variables (trialr1 through trialr3), which contain system-missing values.
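To make the "available data" idea concrete, here is a small Python sketch (an illustration only, not how SPSS is implemented). The function name describe is an assumption, and None stands in for either kind of missing value.

```python
# Reaction times from the example; None stands in for a missing value.
trials = {
    "trial1": [1.5, 1.5, None, None, 2.1, 1.8],
    "trial2": [1.4, None, 2.0, None, 2.3, 2.0],
    "trial3": [1.6, 1.9, 1.6, 2.2, 2.2, 1.9],
}


def describe(values):
    """Return (valid N, mean), both computed from the non-missing values only."""
    valid = [v for v in values if v is not None]
    mean = sum(valid) / len(valid) if valid else None
    return len(valid), mean


summary = {name: describe(values) for name, values in trials.items()}
for name, (n, mean) in summary.items():
    print(f"{name}: N={n} mean={mean:.3f}")
```

Each variable gets its own N here, so a case that is missing on trial1 still contributes to the statistics for trial3.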

As you see below, frequencies also performed its computations using just the available data. Note that the percentages are computed based on just the total number of non-missing cases. But the missing values do appear in the tables and they are marked as missing. This is true for both types of missing values.

    FREQ
       /VAR= trial1 trialr1 .

It is possible that you might want the valid percentages to be computed on the total number of values, and even report the percent missing in the table itself. You can request this using the missing=include subcommand on the freq command. This is shown below for trial1 and trialr1.

    FREQ
       /VAR= trial1 trialr1
       /MISSING= INCLUDE.

As you see, the valid percentages are now computed out of the total number of observations, and the percent missing is shown right in the table as well for the variable trial1, which contained user-missing values. For trialr1, the system-missing values are not used to compute percents even with missing=include specified.

The crosstabs command only includes valid (non-missing) data in its tables. Cases containing a missing value for even one of the variables are not included in the table. Note that the percentages are computed based on just the non-missing cases. This is true for both types of missing values.

    CROSS
       /TAB= trial1 BY trial2 / trialr1 BY trialr2.

It is possible that you might want the missing values included in the tables. This is especially true when you are using crosstabs to verify your transformations. You can request this using the missing=include subcommand on the crosstabs command. This is shown below for trial1 and trialr1. Here again, you will only be successful for user-missing values.

    CROSS
       /TAB= trial1 BY trial2 / trialr1 BY trialr2
       /MISSING= INCLUDE.

The user-missing values are included in the table for the variable trial1. For trialr1, the system-missing values are not included in the table even with missing=include specified. There is no subcommand that will enable the inclusion of system-missing values in the crosstabs table.

Since a system-missing value cannot appear in a crosstabs table, the closest you will come is to change the system-missing value to a user-missing value. This can be accomplished with a recode command, as is shown below. The keyword sysmis can be used on the recode command, and it stands for the system-missing value.

    RECODE trialr1 trialr2
                (SYSMIS=-1) (ELSE=COPY)
                INTO trialb1 trialb2 .
    MISSING VALUES trialb1 trialb2 (-1).

    CROSS
       /TAB= trialb1 BY trialb2
       /MISSING= INCLUDE.

Let's look at how corr handles missing data. We would expect that it would do the computations based on the available data, and omit the missing values for each pair of variables, because two variables are necessary to compute each correlation. Here is an example program.

    CORR
      /VAR= trial1 trial2 trial3 .

The output of this command is shown below. Note how the missing values were excluded. For each pair of variables, corr used the number of pairs that had valid data. For the pair formed by trial1 and trial2, there were three pairs with valid data. For the pairing of trial1 and trial3 there were four valid pairs, and likewise there were four valid pairs for trial2 and trial3. Since this used all of the valid pairs of data, this is often called pairwise deletion of missing data.

It is possible to specify that the correlations run only on observations that had complete data for all of the variables listed on the var subcommand. You might want the correlations of the reaction times only for the observations that had non-missing data on all of the trials. This is called listwise deletion of missing data, meaning that when any of the variables are missing, the entire observation is omitted from the analysis. You can request listwise deletion within corr with the missing=listwise subcommand, as shown in the example below.

    CORR
      /VAR= trialr1 trialr2 trialr3
      /MISSING=LISTWISE.

As you see in the results below, the N for all the simple statistics is the same, 3, which corresponds to the number of cases with complete non-missing data for trial1, trial2 and trial3. Since the N is the same for all of the correlations (i.e., 3), the N is not displayed along with the correlations in SPSS 7.5 and higher.
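The difference between pairwise and listwise deletion comes down to which cases are counted. The Python sketch below (an illustration under the assumption that None marks a missing value; pairwise_n and listwise_n are hypothetical helper names) counts the usable cases under each rule for the example data.

```python
from itertools import combinations

# Reaction times from the example; None stands in for a missing value.
trials = {
    "trial1": [1.5, 1.5, None, None, 2.1, 1.8],
    "trial2": [1.4, None, 2.0, None, 2.3, 2.0],
    "trial3": [1.6, 1.9, 1.6, 2.2, 2.2, 1.9],
}


def pairwise_n(data):
    """For each pair of variables, count the cases that are valid on both."""
    return {
        (a, b): sum(x is not None and y is not None
                    for x, y in zip(data[a], data[b]))
        for a, b in combinations(data, 2)
    }


def listwise_n(data):
    """Count the cases that are valid on every variable."""
    return sum(all(v is not None for v in case)
               for case in zip(*data.values()))


print(pairwise_n(trials))
print(listwise_n(trials))
```

The pairwise counts differ from pair to pair, while the listwise count is a single number shared by every correlation, which matches the Ns described in the text.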

5. Summary of how missing values are handled in SPSS analysis commands

It is important to understand how SPSS commands used to analyze data treat missing data. To know how any one command handles missing data, you should consult the SPSS manual. Here is a brief overview of how some common SPSS procedures handle missing data.

6. Missing values in assignment expressions

An assignment expression may appear on a compute or an if command. It is important to understand how missing values are handled in assignment statements. Consider the example shown below.

    COMPUTE  avg = (trial1 + trial2 + trial3) / 3 .
    COMPUTE  avgr = (trialr1 + trialr2 + trialr3) / 3 .

    LIST
       /VAR = trial1 TO trial3 avg.
    LIST
       /VAR = trialr1 TO trialr3 avgr.

The listing below illustrates how missing values are handled in assignment statements. The variable avg is based on the variables trial1, trial2 and trial3, and the variable avgr is based on the variables trialr1, trialr2 and trialr3. If any of the component variables were missing, the value for avg or avgr was set to missing. This means that both were missing for observations 2, 3 and 4.

      TRIAL1   TRIAL2   TRIAL3      AVG

        1.50     1.40     1.60     1.50
        1.50    -9.00     1.90      .
       -9.00     2.00     1.60      .
       -9.00    -9.00     2.20      .
        2.10     2.30     2.20     2.20
        1.80     2.00     1.90     1.90

    Number of cases read:  6    Number of cases listed:  6

     TRIALR1  TRIALR2  TRIALR3     AVGR

        1.50     1.40     1.60     1.50
        1.50      .       1.90      .
         .       2.00     1.60      .
         .        .       2.20      .
        2.10     2.30     2.20     2.20
        1.80     2.00     1.90     1.90

    Number of cases read:  6    Number of cases listed:  6

Both system-missing and user-defined missing values yield the same results.

As a general rule, computations involving missing values yield missing values, as shown below.

Whenever you add, subtract, multiply, divide, etc., values that involve missing data, the result is usually system-missing. An exception is a value that is defined regardless of one of the values; for example, zero divided by missing is zero.
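The propagation rule can be sketched in Python as follows. This is only an illustration of the behaviour described above, with None standing in for a missing value; plus, divide and average_of_three are hypothetical helper names, and the only exception modelled is the one the text mentions (zero divided by missing).

```python
def plus(a, b):
    """Add two values; the result is missing (None) if either operand is."""
    if a is None or b is None:
        return None
    return a + b


def divide(a, b):
    """Divide a by b, propagating missing values.

    Mirrors the exception noted in the text: zero divided by missing is zero.
    Division by an actual zero is not modelled in this sketch.
    """
    if a == 0 and b is None:
        return 0
    if a is None or b is None:
        return None
    return a / b


def average_of_three(t1, t2, t3):
    """Equivalent in spirit to (trial1 + trial2 + trial3) / 3."""
    result = divide(plus(plus(t1, t2), t3), 3)
    return None if result is None else round(result, 2)


cases = [
    (1.5, 1.4, 1.6), (1.5, None, 1.9), (None, 2.0, 1.6),
    (None, None, 2.2), (2.1, 2.3, 2.2), (1.8, 2.0, 1.9),
]
avg = [average_of_three(*case) for case in cases]
print(avg)
```

A single missing operand anywhere in the expression is enough to make the whole result missing, which is why three of the six averages come out as None.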

In our reaction time experiment, the average reaction time avg is missing for three out of six cases. We could try just averaging the data for the non-missing trials by using the mean function as shown in the example below.

    COMPUTE  avg = MEAN(trial1, trial2, trial3) .
    COMPUTE  avgr = MEAN(trialr1, trialr2, trialr3) .

    LIST
       /VAR = trial1 TO trial3 avg.

    LIST
       /VAR = trialr1 TO trialr3 avgr.

The results below show that avg now contains the average of the non-missing trials, even if there is only one.

      TRIAL1   TRIAL2   TRIAL3      AVG

        1.50     1.40     1.60     1.50
        1.50    -9.00     1.90     1.70
       -9.00     2.00     1.60     1.80
       -9.00    -9.00     2.20     2.20
        2.10     2.30     2.20     2.20
        1.80     2.00     1.90     1.90

    Number of cases read:  6    Number of cases listed:  6

     TRIALR1  TRIALR2  TRIALR3     AVGR

        1.50     1.40     1.60     1.50
        1.50      .       1.90     1.70
         .       2.00     1.60     1.80
         .        .       2.20     2.20
        2.10     2.30     2.20     2.20
        1.80     2.00     1.90     1.90

    Number of cases read:  6    Number of cases listed:  6

Had there been a large number of trials, say 50 trials, then it would be annoying to have to type
avg = mean(trial1, trial2, trial3 …. trial50)
Here is a shortcut you could use in this kind of situation
avg = mean(trial1 to trial50)
provided that the trial variables are contiguous in the file.

Also, if we wanted to get the sum of the times instead of the average, then we could just use the sum function instead of the mean function. The syntax of the sum function is just like the mean function, but it returns the sum of the non-missing values.
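The idea behind the mean and sum functions, using only the values that are present, can be sketched in Python. The helper names mean_valid and sum_valid are assumptions for illustration, None stands in for a missing value, and results are rounded to two decimals to match the listings.

```python
def valid_values(values):
    """Keep only the non-missing entries."""
    return [v for v in values if v is not None]


def mean_valid(values):
    """Mean of the non-missing values, or None when there are none."""
    valid = valid_values(values)
    return round(sum(valid) / len(valid), 2) if valid else None


def sum_valid(values):
    """Sum of the non-missing values, or None when there are none."""
    valid = valid_values(values)
    return round(sum(valid), 2) if valid else None


cases = [
    (1.5, 1.4, 1.6), (1.5, None, 1.9), (None, 2.0, 1.6),
    (None, None, 2.2), (2.1, 2.3, 2.2), (1.8, 2.0, 1.9),
]
means = [mean_valid(case) for case in cases]
sums = [sum_valid(case) for case in cases]
print(means)
print(sums)
```

Because each function takes a whole sequence, a case with 50 trials needs no more code than a case with three, which is the same convenience the TO shortcut gives in SPSS.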

Finally, you can use the nvalid function to determine the number of non-missing values in a list of variables, as illustrated below.

    COMPUTE  n = NVALID(trial1, trial2, trial3).

    LIST
       /VAR = trial1 TO trial3 avg n.

As you see below, observations 1, 5 and 6 had three valid values, observations 2 and 3 had two valid values, and observation 4 had only one valid value. These results are the same regardless of the type of missing value.

      TRIAL1   TRIAL2   TRIAL3      AVG        N

        1.50     1.40     1.60     1.50     3.00
        1.50    -9.00     1.90     1.70     2.00
       -9.00     2.00     1.60     1.80     2.00
       -9.00    -9.00     2.20     2.20     1.00
        2.10     2.30     2.20     2.20     3.00
        1.80     2.00     1.90     1.90     3.00

    Number of cases read:  6    Number of cases listed:  6

You might feel uncomfortable with the variable avg for observation 4 since it is not really an average at all. We can use the mean.n form of the function to control the number of valid values required to compute a mean.

    COMPUTE avg = MEAN.2(trial1 TO trial3).

    LIST
       /VAR = trial1 TO trial3 avg n.

The mean.2 function requires at least two valid values for a mean to be calculated. In the output below, you see that avg now contains the average reaction time for the non-missing values, except for observation 4, where the value is set to missing because it had only one valid observation.

      TRIAL1   TRIAL2   TRIAL3      AVG        N

        1.50     1.40     1.60     1.50     3.00
        1.50    -9.00     1.90     1.70     2.00
       -9.00     2.00     1.60     1.80     2.00
       -9.00    -9.00     2.20      .       1.00
        2.10     2.30     2.20     2.20     3.00
        1.80     2.00     1.90     1.90     3.00
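A minimum-count rule like mean.2 is easy to express as an extra parameter. In the Python sketch below, nvalid and mean_n are hypothetical helpers written for illustration (None stands in for a missing value); min_valid plays the role of the number after the dot.

```python
def nvalid(values):
    """Count the non-missing values, in the spirit of NVALID."""
    return sum(v is not None for v in values)


def mean_n(values, min_valid=1):
    """Mean of the non-missing values, or None if fewer than min_valid exist."""
    valid = [v for v in values if v is not None]
    if len(valid) < max(min_valid, 1):
        return None
    return round(sum(valid) / len(valid), 2)


cases = [
    (1.5, 1.4, 1.6), (1.5, None, 1.9), (None, 2.0, 1.6),
    (None, None, 2.2), (2.1, 2.3, 2.2), (1.8, 2.0, 1.9),
]
n = [nvalid(case) for case in cases]
avg = [mean_n(case, min_valid=2) for case in cases]
print(n)
print(avg)
```

Raising min_valid to 3 would reduce this to the strict behaviour of the plain arithmetic expression, while leaving it at 1 gives the ordinary mean of whatever is available.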

7. Missing values in recoding commands

Using IF

Suppose you wanted to create a dummy variable from trial1 with a cutpoint of 2. We can use the if command to create the variable hit1. The same is true for creating hirt1 from trialr1.

    IF trial1  > 2    hit1 = 1.
    IF trial1 <= 2    hit1 = 0.

    IF trialr1  > 2    hirt1 = 1.
    IF trialr1 <= 2    hirt1 = 0.

    VARIABLE LABELS hit1  "Tran T1 User Miss"
                    hirt1 "Tran T1 Sys Miss".

    FREQ
       /VAR= hit1 hirt1.

The frequencies output shows the result of these transformations as they affect the missing values. Both system-missing and user-defined missing values result in correct classification.

Now, suppose you wanted to create a dummy variable from trial1 in combination with trial2 with a cutpoint of 2 for each. We can use the if command to create the variable hit12. The same is true for creating hirt12 from trialr1 and trialr2.

    IF trial1  > 2  and trial2>2       hit12 = 1.
    IF not(trial1  > 2  and trial2>2)  hit12 = 0.

    VARIABLE LABELS hit12  "Tran T1 AND T2".

    FREQ
       /VAR= hit12 .

    LIST
       /VAR = trial1 trial2 hit12.

The frequencies and listing show the result of these transformations as they affect the missing values. Both system-missing and user-defined missing values result in the same output, so only the output for user-defined missing values will be shown.

      TRIAL1   TRIAL2    HIT12

        1.50     1.40      .00
        1.50    -9.00      .00
       -9.00     2.00      .00
       -9.00    -9.00      .
        2.10     2.30     1.00
        1.80     2.00      .00

There is only one missing value in the created variable hit12, but we know that there are at least two missing values for trial1 alone. If SPSS can resolve the logic based on a single variable, then it will. Since not(trial1 > 2 and trial2>2) is true if either of the conditions is false, this can be resolved. This is the result that most people would prefer.
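The behaviour described here is three-valued logic: a condition can be true, false, or missing. The Python sketch below illustrates it with None as the missing outcome; gt, and3, not3 and hit12 are hypothetical helper names, and the sketch only claims to reproduce the listing above, not SPSS internals.

```python
def gt(value, cutoff):
    """Comparison that is missing (None) when the value is missing."""
    return None if value is None else value > cutoff


def and3(a, b):
    """Three-valued AND: a false operand decides the result on its own."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def not3(a):
    """Three-valued NOT: missing stays missing."""
    return None if a is None else not a


def hit12(t1, t2):
    """Apply the two IF commands; the target stays missing if neither fires."""
    both_high = and3(gt(t1, 2), gt(t2, 2))
    if both_high is True:
        return 1
    if not3(both_high) is True:
        return 0
    return None


cases = [
    (1.5, 1.4), (1.5, None), (None, 2.0),
    (None, None), (2.1, 2.3), (1.8, 2.0),
]
result = [hit12(t1, t2) for t1, t2 in cases]
print(result)
```

Only the case where both trials are missing stays undecided, because in every other case at least one comparison is known to be false and that alone settles the AND.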

If you prefer to have the result missing whenever either of the component variables is missing, that can be accomplished by adding the following if command, as is shown by the results of the frequencies and list commands.

    IF MISSING(trial1) OR MISSING(trial2)
               hit12 = $SYSMIS.

    FREQ
       /VAR= hit12 .

    LIST
       /VAR = trial1 trial2 hit12.

Note that the missing function evaluates to true if the variable in the argument contains any kind of missing value. If you are exclusively concerned with system-missing values you may want to use the sysmis function. Consider the following results. Any missing value for one of the component variables results in a missing value for hit12.

      TRIAL1   TRIAL2    HIT12

        1.50     1.40      .00
        1.50    -9.00      .
       -9.00     2.00      .
       -9.00    -9.00      .
        2.10     2.30     1.00
        1.80     2.00      .00
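The stricter rule, where any missing component makes the result missing, amounts to checking for missing values before evaluating the condition. A minimal Python sketch follows; strict_hit12 is a hypothetical name and None stands in for a missing value of either kind.

```python
def strict_hit12(t1, t2, cutoff=2):
    """Return None if either trial is missing, otherwise the 0/1 dummy."""
    if t1 is None or t2 is None:
        return None
    return 1 if (t1 > cutoff and t2 > cutoff) else 0


cases = [
    (1.5, 1.4), (1.5, None), (None, 2.0),
    (None, None), (2.1, 2.3), (1.8, 2.0),
]
strict = [strict_hit12(t1, t2) for t1, t2 in cases]
print(strict)
```

Compared with the three-valued version, observations 2 and 3 now come out missing instead of 0, which is the difference between the two listings.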

Using RECODE

The recode command can be used to accomplish the dummy coding task discussed at the outset of the section. Once again, suppose you wanted to create a dummy variable from trial1 with a cutpoint of 2. We can use the recode command to create the variable hi2t1. The same is true for creating hi2rt1 from trialr1. However, this command functions differently with respect to system-missing and user-defined missing values.

    RECODE trial1  (LO THRU 2=0)
                   (2 THRU HI=1) INTO  hi2t1.

    RECODE trialr1 (LO THRU 2=0)
                   (2 THRU HI=1) INTO  hi2rt1.

    VARIABLE LABELS hi2t1  "Tran T1 User Miss"
                    hi2rt1 "Tran T1 Sys Miss".

    FREQ
       /VAR= hi2t1 hi2rt1.

The frequencies output shows the result of these transformations as they affect the missing values. The answer is correct with respect to system-missing values and incorrect with respect to user-missing values. The user-defined missing values are classified according to their value, as if they were not missing.

Now we can examine recode with the else keyword. This affects both system-missing and user-defined missing values the same, but unfortunately neither is right. The else keyword will include both types of missing values and misclassify them.

    RECODE trial1  (LO THRU 2=0)
                   (ELSE=1) INTO  hi3t1.

    RECODE trialr1 (LO THRU 2=0)
                   (ELSE=1) INTO  hi3rt1.

    VARIABLE LABELS hi3t1  "Tran T1 User Miss"
                    hi3rt1 "Tran T1 Sys Miss".

    FREQ
       /VAR= hi3t1 hi3rt1.

The frequencies result follows.

If we add (missing=sysmis) to the recode, the problem is alleviated for system-missing values, but not for user-defined missing values.

    RECODE trial1  (LO THRU 2=0) (MISSING=SYSMIS)
                    (ELSE=1) INTO  hi4t1.

    RECODE trialr1 (LO THRU 2=0) (MISSING=SYSMIS)
                    (ELSE=1) INTO  hi4rt1.

    VARIABLE LABELS hi4t1  "Tran T1 User Miss"
                    hi4rt1 "Tran T1 Sys Miss".

    FREQ
       /VAR= hi4t1 hi4rt1.

The frequencies result follows.

Changing the order of (missing=sysmis) and (lo thru 2=0) alleviates the problem for user-defined missing values too.

    RECODE trial1  (MISSING=SYSMIS) (LO THRU 2=0)
                    (ELSE=1) INTO  hi5t1.

    RECODE trialr1 (MISSING=SYSMIS) (LO THRU 2=0)
                    (ELSE=1) INTO  hi5rt1.

    VARIABLE LABELS hi5t1  "Tran T1 User Miss"
                    hi5rt1 "Tran T1 Sys Miss".

    FREQ
       /VAR= hi5t1 hi5rt1.

The frequencies result follows.
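Why the order of the recode specifications matters can be seen in a small model where the first matching rule wins. The Python sketch below is a simplified illustration built only from the behaviour described in this section, not a description of SPSS internals. It assumes that a low range such as LO THRU 2 also catches a user-missing code like -9, that MISSING matches both kinds of missing value, that ELSE matches anything not yet matched, and that unmatched values end up system-missing (None). The rule tuples and the recode function are hypothetical.

```python
import math

USER_MISSING = {-9}          # values declared user-missing for trial1
LO, HI = -math.inf, math.inf


def is_missing(value):
    """True for system-missing (None) or a declared user-missing code."""
    return value is None or value in USER_MISSING


def recode(value, rules):
    """Apply the first rule that matches; unmatched values become None."""
    for rule in rules:
        kind = rule[0]
        if kind == "range":
            _, low, high, out = rule
            # Assumption: a range matches on the stored number, so -9 falls
            # inside LO THRU 2 even though it is declared user-missing.
            if value is not None and low <= value <= high:
                return out
        elif kind == "missing":
            if is_missing(value):
                return rule[1]
        elif kind == "else":
            return rule[1]
    return None


else_only = [("range", LO, 2, 0), ("else", 1)]
missing_last = [("range", LO, 2, 0), ("missing", None), ("else", 1)]
missing_first = [("missing", None), ("range", LO, 2, 0), ("else", 1)]

for name, rules in [("else_only", else_only),
                    ("missing_last", missing_last),
                    ("missing_first", missing_first)]:
    print(name, [recode(v, rules) for v in (1.5, 2.1, -9, None)])
```

In this model only the ordering with the missing rule first sends both -9 and None to None, which parallels the final recode in the text.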

8. Issues to look out for

When creating or recoding variables, it is always good practice to test the resulting variables, especially for missing values.
The type of missing value can make a difference in recode results, and in most cases system-missing values are more likely to yield the correct results.
Be aware of how and and or work with the if command: the expression is evaluated, and if a logical decision can be made, it will be made. This happens even though one or more of the component variables may be missing.

9. For more information

See Subsetting data in SPSS for information about subsetting data with variables that are missing.
For more information about missing values, see the SPSS Command Syntax Reference Guide .

rodriguezhantimpok.blogspot.com

Source: https://stats.oarc.ucla.edu/spss/modules/missing-data/
