Importing data from SPSS files

Importing from an SPSS "system" file

To import data from an SPSS "system" file, the binary format in which SPSS data are now usually saved and often distributed, one first needs to make the file that contains the data known to R, as in the following example:

library(memisc)
ZA5702 <- spss.system.file("Data/ZA5702_v2-0-0.sav")
ZA5702

SPSS system file 'Data/ZA5702_v2-0-0.sav' 
    with 979 variables and 3911 observations

Once the "system" file is declared using the function spss.system.file(), metadata become available, such as the number of cases and variables (as just seen) and the names and labels of the variables (as seen below):

description(ZA5702)

study    'Studiennummer'                       
version  'GESIS Archiv Version'                
year     'Erhebungsjahr'                       
field    'Erhebungszeitraum'                   
glescomp 'GLES-Komponente'                     
survey   'Erhebung/Welle'                      
lfdn     'Laufende Nummer (Kumulation)'        
vlfdn    'Laufende Nummer (Vorwahl)'           
nlfdn    'Laufende Nummer (Nachwahl)'          
datum    'Datum der Befragung (Monat/Tag/Jahr)'

(Only an extract of the full output is shown here, since the data set contains as many as 979 variables.)

An "importer" object, such as ZA5702 in this example, also allows us to obtain a full codebook with

codebook(ZA5702)

but we refrain from showing such a codebook here, because the output would be far too long. As an inspection of the data in the file shows, most variable names have a standardised, yet non-mnemonic structure. Variables referring to questions asked in the pre-election wave of the GLES 2013 study have names starting with "v", those referring to questions asked in the post-election wave have names starting with "n", while those referring to questions asked in both waves have names starting with "nv". For a specific analysis, such variable names are not very useful, so we want to rename them. We could do this after loading the data, but it is more convenient to do the data import and the renaming in one step, as in the example below:

gles2013work <- subset(ZA5702,
                       select=c(
                         wave                  = survey,
                         intent.turnout        = v10,
                         turnout               = n10,
                         voteint.candidate     = v11aa,
                         voteint.list          = v11ba,
                         postal.vote.candidate = v12aa,
                         postal.vote.list      = v12ba,
                         vote.candidate        = n11aa,
                         vote.list             = n11ba,                 
                         bula                  = bl
                       ))

The variable names to the left of the equals sign are the names as they will appear in the data set after import, while the variable names to the right of the equals sign are the names as they exist in the data file.
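The two ways of renaming can be compared on a small file. The following is a minimal sketch, assuming that the NES 1948 example file bundled with memisc (anes/NES1948.ZIP) is installed; the new names "vote" and "gender" are chosen here for illustration only. Note that rename() expects the mapping in the opposite direction, with the existing name on the left and the new name as a character string on the right.

```r
library(memisc)

# Assumption: the NES 1948 example file that ships with memisc is available.
nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")
nes1948 <- spss.portable.file(nes1948.por)

# Variant 1: import and rename in one step (new name = name in the file).
one.step <- subset(nes1948,
                   select = c(vote   = v480018,
                              gender = v480045))

# Variant 2: import first, then rename (name in the file = "new name").
two.step <- subset(nes1948, select = c(v480018, v480045))
two.step <- rename(two.step,
                   v480018 = "vote",
                   v480045 = "gender")

# Both variants should end up with the same variable names.
names(one.step)
names(two.step)
```

The one-step variant avoids carrying the original names around at all, which is why it is used throughout this page.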

As a demonstration of what information can be extracted from the data file, we create a codebook for one of the items in the data set:

codebook(gles2013work$turnout)

================================================================================

   gles2013work$turnout 'Wahlbeteiligung'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: interval
   Missing values: -Inf - -1

   Values and labels                      N Valid Total
                                                       
   -99 M 'keine Angabe'                   3         0.1
   -97 M 'trifft nicht zu'               20         0.5
   -94 M 'nicht in Auswahlgesamtheit'  2003        51.2
     1   'ja, habe gewaehlt'           1596  84.7  40.8
     2   'nein, habe nicht gewaehlt'    289  15.3   7.4
                                                       
        Min: 1.000                                     
        Max: 2.000                                     
       Mean: 1.153                                     
   Std.Dev.: 0.360
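The codes flagged with "M" in the codebook are user-defined missing values. As a rough illustration of what this declaration means in practice, the following sketch builds a small item by hand with as.item(), using the same codes and labels as the turnout variable. The six responses are made up for this example and are not taken from the GLES data.

```r
library(memisc)

# Hypothetical toy responses that reuse the codes of the 'turnout' item.
turnout.toy <- as.item(c(1, 2, 1, -94, -99, 1),
                       labels = c("keine Angabe"               = -99,
                                  "trifft nicht zu"            = -97,
                                  "nicht in Auswahlgesamtheit" = -94,
                                  "ja, habe gewaehlt"          =   1,
                                  "nein, habe nicht gewaehlt"  =   2),
                       missing.values = c(-99, -97, -94),
                       annotation = c(description = "Wahlbeteiligung"))

# Flags the observations whose codes are declared as missing.
is.missing(turnout.toy)

# When the item is converted to a factor, declared missing values become NA.
as.factor(turnout.toy)

# The codebook lists the missing codes separately from the valid ones.
codebook(turnout.toy)
```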

Importing from an SPSS "portable" file

Data from SPSS "portable" files are imported in essentially the same way as data from SPSS "system" files: The first step again is to make the data set known to R:

ZA3861 <- spss.portable.file("Data/ZA3861.por",iconv=FALSE)
ZA3861

SPSS portable file 'Data/ZA3861.por' 
    with 331 variables and 3263 observations

Since this file contains German umlauts (in contrast to the previous example), we need to convert the character encoding of the value labels etc. from "Latin-1" (the original encoding of the data) into the native encoding of the system (unless the computer natively uses "Latin-1" and not, as most Mac and most Linux systems do, a variant of UTF-8).

ZA3861 <- Iconv(ZA3861,from="latin1")
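To see why such a conversion is needed, the following sketch uses base R's iconv() on a single string. It only illustrates the re-encoding of one label and is not meant to describe how memisc's Iconv() works internally. The label "Stärke" is written as raw Latin-1 bytes so that the example does not depend on the encoding of the script file.

```r
# "Stärke" as raw Latin-1 bytes; the umlaut is the single byte 0xE4.
latin1.label <- rawToChar(as.raw(c(0x53, 0x74, 0xe4, 0x72, 0x6b, 0x65)))

# Check whether the byte sequence is valid UTF-8 before the conversion.
validUTF8(latin1.label)

# Re-encode the string from Latin-1 to UTF-8.
utf8.label <- iconv(latin1.label, from = "latin1", to = "UTF-8")

# Check again after the conversion and count the characters.
validUTF8(utf8.label)
nchar(utf8.label, type = "chars")
```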

Importer objects created from "portable" files can be examined in the same way as importer objects created from "system" files. For example, we can get a description of the variables in the data set (the variable labels) or a codebook.

description(ZA3861)

vvpnid   'Fallnummer'                                 
vsplitwo 'West-Ost-Kennung'                           
vvornach 'Vor-/Nachwahl'                              
vland    'Bundesland'                                 
v10      'Wirtschaftl. Lage allgemein'                
v20      'Wirtschaftl. Lage retrospektiv'             
v30      'Wirtschaftl. Lage prospektiv'               
v31      'Wichtigkeit Erst/Zweitstimme BTW (nicht 94)'
v40      'Demokratiezufriedenheit'                    
v50      'Staerke Politikinteresse'

To actually import the data and make them accessible for analysis, we can, as above, use as.data.set() or subset(), as in this example:

work2002 <- subset(ZA3861,
    select=c(
          respid            = VVPNID,
          split.wo          = VSPLITWO,
          split.vor.nach    = VVORNACH,
          Bundesland        = VLAND,
          Erststimme        = V69,
          Zweitstimme       = V70,
          Geschlecht        = VSEX,
          GebMonat          = VMONAT,
          GebJahr           = VJAHR,
          Konfession        = VRELIG,
          Kirchgang         = VKIRCHG,
          Erwerbst          = VBERUFTG,
          FrErwerbst        = VFRBERTG,
          Beruf             = VBERUF,
          Famstand          = VFAMSTDN,
          Partner           = VPARTNER,
          BildungP          = VPBILDGA,
          BerufstP          = VPBERUFT,
          FrBerufstP        = VPFBERTG,
          BerufP            = VPBERUF,
          ReprGewicht       = VGVWNW
        )
    )
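When all variables in a file are wanted, as.data.set() can be used instead of subset(). The following is a minimal sketch, assuming that the NES 1948 "portable" file bundled with memisc (anes/NES1948.ZIP) is installed. That file is small, so loading it completely is unproblematic; with a file as large as the ones above, a subset() call is usually the better choice.

```r
library(memisc)

# Assumption: the NES 1948 example file that ships with memisc is available.
nes1948.por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")
nes1948 <- spss.portable.file(nes1948.por)

# Load every variable in the file into a data set object.
nes1948.full <- as.data.set(nes1948)

# The data set has the same variable names as the importer object.
names(nes1948.full)[1:5]

# Variable labels, value labels and missing-value declarations
# are kept for each variable.
codebook(nes1948.full$v480018)
```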

Import from a fixed-width file accompanied by SPSS syntax

Data from more recent study components of the American National Election Study come in fixed-width format, accompanied by SPSS syntax files that define columns, variable labels, value labels, and missing values. memisc also provides an importer function for such data. Naturally, this requires a little more information: in addition to the actual data file, we also need a file with SPSS syntax specifying the data columns. Optionally, syntax files that define variable labels, value labels, and missing values can also be specified.

anes2008TS <- spss.fixed.file("Data/anes2008/anes2008TS_dat.txt",
                              columns.file="Data/anes2008/anes2008TS_col.sps",
                              varlab.file="Data/anes2008/anes2008TS_lab.sps",
                              codes.file="Data/anes2008/anes2008TS_cod.sps",
                              missval.file="Data/anes2008/anes2008TS_md.sps")
anes2008TS

SPSS fixed column file 'issues/anes2008/anes2008TS_dat.txt' 
    with 1954 variables and 2322 observations
    with variable labels from file 'issues/anes2008/anes2008TS_lab.sps' 
    with value labels from file 'issues/anes2008/anes2008TS_cod.sps' 
    with missing value definitions from file 'issues/anes2008/anes2008TS_md.sps'

Further information about the data can now be obtained from the returned importer object in the same way as from importer objects that describe SPSS "system" or SPSS "portable" files. That is, we can use names(), description(), and codebook(). To load the data into R's memory we can use (as above) the functions as.data.set() and subset().

Importing data from a Stata file

Data from Stata files (up to Stata Version 12) can be imported in the same way as data from SPSS files. The main differences are the function used to declare the file, Stata.file(), and the fact that user-defined missing values do not exist in Stata. The following example illustrates this:

library(memisc)
ZA5702.dta <- Stata.file("Data/ZA5702_v2-0-0.dta")
ZA5702.dta

Stata file 'Data/ZA5702_v2-0-0.dta' 
    with 874 variables and 3911 observations

gles2013work.dta <- subset(ZA5702.dta,
                       select=c(
                         wave                  = survey,
                         intent.turnout        = v10,
                         turnout               = n10,
                         voteint.candidate     = v11aa,
                         voteint.list          = v11ba,
                         postal.vote.candidate = v12aa,
                         postal.vote.list      = v12ba,
                         vote.candidate        = n11aa,
                         vote.list             = n11ba,                 
                         bula                  = bl
                       ))
codebook(gles2013work.dta$turnout)

================================================================================

   gles2013work.dta$turnout 'Wahlbeteiligung'

--------------------------------------------------------------------------------

   Storage mode: integer
   Measurement: nominal
   Missing values: 100 - 127

   Values and labels                               N Percent
                                                            
   -99   'keine Angabe'                            3     0.1
   -98   'weiss nicht'                             0     0.0
   -97   'trifft nicht zu'                        20     0.5
   -96   'Split'                                   0     0.0
   -95   'nicht teilgenommen'                      0     0.0
   -94   'nicht in Auswahlgesamtheit'           2003    51.2
   -93   'Interview abgebrochen'                   0     0.0
   -92   'Fehler in Daten'                         0     0.0
   -86   'nicht wahlberechtigt'                    0     0.0
   -85   'nicht waehlen'                           0     0.0
   -84   'keine Erst-/Zweitstimme abgegeben'       0     0.0
   -83   'ungueltig waehlen'                       0     0.0
   -82   'keine andere Partei waehlen'             0     0.0
   -81   'noch nicht entschieden'                  0     0.0
   -72   'nicht einzuschaetzen'                    0     0.0
   -71   'nicht bekannt'                           0     0.0
     1   'ja, habe gewaehlt'                    1596    40.8
     2   'nein, habe nicht gewaehlt'             289     7.4
