TUTORIAL SPSS
Steps for using the SPSS program; hopefully this is useful to readers.
Wednesday, 19 November 2014
Tuesday, 12 August 2014
SPSS for Windows
A brief tutorial
This tutorial is a brief look at what SPSS for Windows is capable of doing. Examples will come from Statistical Methods for Psychology by David C. Howell. It is not our intention to teach you about statistics in this tutorial. For that you should rely on your classes in statistics and/or a good textbook. If you're a novice this tutorial should give you a feel for the programme and how to navigate through the many options. Beyond that, the SPSS Help Files should be used as a resource. Further, SPSS sells a number of very good manuals.
The Basics
SPSS for Windows has the same general look and feel as most other programmes for Windows. Virtually any statistic that you wish to compute can be obtained by pointing and clicking on the menus and various interactive dialog boxes. You may have noted that the examples in the Howell textbook are performed/analyzed via code. That is, SPSS, like many other packages, can be accessed by programming short scripts, instead of pointing and clicking. We will not cover any programming in this tutorial. Presumably, SPSS is already installed on your computer. If you don't have a shortcut on your desktop go to the [Start => Programs] menu and start the package by clicking on the SPSS icon.
Before proceeding I should say a few words about a very simple convention that will be used in this tutorial. In this point and click environment one often has to navigate through many layers of menu items before encountering the required option. In the above paragraph the prescribed task was to locate the SPSS icon in the [Start] menu structure. To get to that icon, one must first click on [Start] then move the pointer to the [Programs] options, before locating the SPSS icon. This sequence of events can be conveyed by typing [Start => Programs] . That is, one must move from the outer layer of the menu structure to some inner layer in sequence....
Now, back to the tutorial.
Once you've clicked on the SPSS icon a new window will appear on the screen. The appearance is that of a standard programme for Windows with a spreadsheet-like interface.
As you can see, there are a number of menu options relating to statistics, on the menu bar. There are also shortcut icons on the toolbar. These serve as quick access to often used options. Holding your mouse over one of these icons for a second or two will result in a short function description for that icon. The current display is that of an empty data sheet. Clearly, data can either be entered manually, or it can be read from an existing data file.
Browsing the [File] menu reveals nothing too surprising - many of the options are familiar, although the details are specific to SPSS. For example, the [New] option is used to specify the type of window to open. The various options under the [New] heading are,
- [Data] Default window with a blank data sheet ready for analyses
- [Syntax] One can write scripts like those present in the Howell text, instead of using the menus. See the SPSS manuals for help on this topic.
- [Output] Whenever a procedure is run, the output is directed to a separate window. One can also have multiple [Output] windows open to organize the various analyses that might be conducted. Later, these results can be saved and/or printed.
- [Script] This window provides the opportunity to write full-blown programmes in a BASIC-like language. These programmes have access to the functions that make up SPSS. With such access it is possible to write user-defined procedures - those not part of SPSS - by taking advantage of the SPSS functions. Again, this is beyond the scope of this tutorial.
Also present in the [File] menu are two separate avenues for reading data from existing files. The first is the [Open] option. Like other application packages (e.g., WordPerfect, Excel, ...), SPSS has its own format for saving data. The accepted extension for any file saved in this proprietary format is "sav", so one can have a datafile saved as "data1.sav". This is a binary format, so it is not readable with a text editor. The benefits are that all formatting changes are maintained and the file can be read faster. The [Open] option is specifically meant for files saved in the SPSS format. The second option, [Read ASCII Data], as the name suggests, reads files that are saved in ASCII format. As can be seen, there are two choices - [Freefield] and [Fixed Columns]. Clicking on one of these options will produce a dialog box. One must specify a number of parameters before a file can be read successfully.
Reading ASCII files requires that the user know something about the format of the data file. Otherwise, one is likely to get stuck in the process of reading, or the result may be a costly error. The more restrictive format is [Fixed Columns]. One must know how many variables there are, whether a variable is in numeric or string format, and the first and last column of each variable. For example, think of the following as an excerpt from an ASCII datafile.
male    37 102
male    22 115
male    27  99
....    .. ...
female  48 107
female  21 103
female  28 122
......  .. ...
An examination of the datafile provides several key pieces of information,
- There are 3 variables
- Variable 1 is a string; Variables 2 and 3 are numeric
- Variable 1: first column=1, last column=6
- Variable 2: first column=9, last column=10
- Variable 3: first column=12, last column=14
- Notice that none of the columns overlap. The longest value of Variable 1 is "female", which spans from the first column to the sixth - or, the final letter e. As you can see, one has to manually locate the first and last column of each variable.
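The column positions listed above can be tried out away from SPSS. The following is a minimal Python sketch, not part of SPSS itself: the variable names (gender, var2, var3) are made up for illustration, and the sample lines assume the fixed-width layout implied by the column numbers.

```python
# Column positions are 1-based and inclusive, as in the list above.
# The variable names are hypothetical; the tutorial does not name them.
COLUMNS = {
    "gender": (1, 6),   # Variable 1, string
    "var2": (9, 10),    # Variable 2, numeric
    "var3": (12, 14),   # Variable 3, numeric
}

# A few lines in the assumed fixed-width layout.
SAMPLE = [
    "male    37 102",
    "male    22 115",
    "male    27  99",
    "female  48 107",
]


def parse_fixed(line):
    """Cut one line into fields using the first/last column of each variable."""
    record = {}
    for name, (first, last) in COLUMNS.items():
        field = line[first - 1:last].strip()
        # Only the first variable is a string; the others are converted to int.
        record[name] = field if name == "gender" else int(field)
    return record


records = [parse_fixed(line) for line in SAMPLE]
for record in records:
    print(record)
```

If a first or last column is off by one, the numeric conversion either fails or silently reads the wrong digits, which is the kind of costly error mentioned earlier.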
Experiment with creating datafiles and reading them with this method. As for the SPSS format, there are a large number of sample datafiles included in your package. Just click on [Open] and find the SPSS home directory. Make sure the filetype in the dialog box associated with [Open] is set to "*.sav" - the default...
Before we move on to actual data, click on [Statistics]. The menu that appears reveals many classes of statistics available for use. Each class is further subdivided into other options, as denoted by the little arrow at the right side of the menu selector. Explore what is offered by moving your mouse over the various procedures listed.
These data are taken from Table 2.1 in Howell's text. The first column represents "Reaction Time in 100ths of a second" and the second column indicates "Frequency".
If you're entering data for the first time, like the above example, the variable names will be automatically generated (e.g., var00001, var00002,....). They are not very informative. To change these names, click on the variable name button. For example, double click on the "var00001" button. Once you have done that, a dialog box will appear. The simplest option is to change the name to something meaningful. For instance, replace "var00001" in the textbox with "RT" (see figure below).
In addition to changing the variable name one can make changes specific to [Type], [Labels], [Missing Values], and [Column Format].
- [Type] One can specify whether the data are in numeric or string format, in addition to a few more formats. The default is numeric format.
- [Labels] Using the labels option can enhance the readability of the output. A variable name is limited to a length of 8 characters; however, by using a variable label the length can be as much as 256 characters. This provides the ability to have very descriptive labels that will appear in the output.
Often, there is a need to code categorical variables in numeric format. For example, male and female can be coded as 1 and 2, respectively. To reduce confusion, it is recommended that one use value labels. For the example of gender coding, Value: 1 would have a corresponding Value Label: male. Similarly, Value: 2 would be coded with Value Label: female. (Click on the [Labels] button to verify the above.)
- [Missing Values] See the accompanying help. This option provides a means to code for various types of missing values.
- [Column Format] The column format dialog provides control over several features of each column (e.g., width of column).
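The idea behind value labels can be illustrated outside SPSS. This is a small Python sketch under stated assumptions: the coded data and the variable label text are invented for the example, and only the 1 = male, 2 = female coding comes from the text above.

```python
# Value labels: numeric codes stored in the data, readable text shown in output.
VALUE_LABELS = {1: "male", 2: "female"}

# Hypothetical descriptive variable label (longer than an 8-character name).
VARIABLE_LABEL = "Gender of participant"

# Hypothetical coded column, as it would be typed into the data sheet.
coded = [1, 2, 2, 1, 2]

# Replace each code by its label when producing readable output.
labelled = [VALUE_LABELS[code] for code in coded]
print(VARIABLE_LABEL + ":", labelled)
```

The data stay numeric; only the presentation changes, which is why the labels can be edited later without touching the data.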
Once data have been entered or modified, it is advisable to save. In fact, save as often as possible [File => SaveAs].
SPSS offers a large number of possible formats, including its own. A list of the available formats can be viewed and selected by clicking on Save as type: in the SaveAs dialog box. If your intention is to work only in SPSS, then there may be some benefit to saving in the SPSS (*.sav) format. I assume that this format allows for faster reading and writing of the data file. However, if your data will be analyzed and looked at by other packages (e.g., a spreadsheet), it would be advisable to save in a more universal format (e.g., Excel (*.xls), 1-2-3 Rel 3.0 (*.wk3)).
Once the type of file has been selected, enter a filename, minus the extension (e.g., sav, xls). You should also save the file in a meaningful directory on your hard drive or floppy. That is, for any given project a separate directory should be created. You don't want your data to get mixed up.
The process of reading already saved data can be painless if the saved format is in the SPSS or a spreadsheet format. All one has to do is,
- click on [File => New => Data]
- click on [File => Open] : a dialog box will appear
- navigate to desired directory using the Look in: menu at the top of the dialog box
- select file type in the Files of type menu
- click on the filename that is needed.
To make life easier, the [File => Read ASCII Data => Freefield] option will be used.
The resulting dialog box requires that a File , a Name and a Data Type be specified for each variable, or column of data. The desired file is accessed by clicking on the [Browse] button, and then navigating to the desired location. Since the extension for the sought after file is dat there is no need to change the Files of type: selection. However, if the extension is something else (e.g., *.txt) then it would be necessary to select All files(*.*) from the Files of type: menu. Since there are 4 variables in this data set, 4 names with the corresponding type information must be specified. To Add the first variable, observations, to the list,
- type "obs" in the Name box
- the Data Type is set to Numeric by default. If "obs" was a string variable, then one would have to click on String
- click on the Add button to include this variable to the list.
- repeat the above procedure with new names and data types for each of the remaining variables. It is important that all variables be added to the list. Otherwise, the data will be scrambled.
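The warning about adding every variable can be made concrete with a short Python sketch. It assumes that freefield reading takes whitespace-separated values in order and hands them out to the listed variable names in turn; the numbers and the names v2, v3 and v4 are invented, and only "obs" comes from the text.

```python
def read_freefield(text, names):
    """Assign whitespace-separated values to the variable names, in order."""
    values = [int(token) for token in text.split()]
    width = len(names)
    records = []
    # Step through the values one full case at a time; drop an incomplete tail.
    for start in range(0, len(values) - width + 1, width):
        records.append(dict(zip(names, values[start:start + width])))
    return records


# Hypothetical file contents: 4 values per case.
text = "1 10 20 30\n2 11 21 31\n"

complete = read_freefield(text, ["obs", "v2", "v3", "v4"])
scrambled = read_freefield(text, ["obs", "v2", "v3"])  # one variable forgotten

print(complete[1])
print(scrambled[1])
```

With only three names, the fourth value of the first case is read as "obs" of the second case, so every later case is shifted.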
The resulting data file appears in the data editor like the following.
The next section will cover some descriptive statistics.
Descriptive Statistics
We can replicate the frequency analyses that are described in chapter 2 of the text by using the file that was just read into the data editor - tab2-1.dat. These analyses were conducted on the reaction time data. Recall that we have labelled these data as RT. To begin, click on [Statistics => Summarize => Frequencies]...
Each time a statistical procedure is run, like [Frequencies...] and [Descriptives...], the results are posted to an Output window. If several procedures are run during one session the results will be appended to the same window. However, greater organization can be reached by opening new Output windows before running each procedure - [File => New => Output]. Further, the contents of each of these windows can be saved for later review or, in the case of charts, saved to be later included in formatted documents. [Explore by left mouse clicking on any of the output objects (e.g., a frequency table, a chart, ...) followed by a right button click. The left button click will highlight the desired object, while the right button click will pop up a new menu. The next step is to click on the copy option. This action will store the object on the clipboard so that it can be pasted into a Word for Windows document, for example.]
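For readers curious about what a frequency table contains, here is a minimal Python sketch. The reaction times are made up for illustration and are not the Table 2.1 data; the layout (value, frequency, percent, cumulative percent) is an assumption about a typical frequency table rather than a copy of the SPSS output.

```python
from collections import Counter

# Hypothetical reaction times in 100ths of a second (NOT the Table 2.1 data).
rt = [36, 37, 38, 38, 39, 39, 39, 40, 40, 41]

counts = Counter(rt)
n = len(rt)

table = []
cumulative = 0
for value in sorted(counts):
    cumulative += counts[value]
    percent = 100 * counts[value] / n
    cumulative_percent = 100 * cumulative / n
    table.append((value, counts[value], percent, cumulative_percent))

print("Value  Freq  Percent  Cum%")
for value, freq, percent, cum in table:
    print(f"{value:5d} {freq:5d} {percent:8.1f} {cum:5.1f}")
```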
Chi-Square & T-Test
The computation of the Chi-Square statistic can be accomplished by clicking on [Statistics => Summarize => Crosstabs...]. This particular procedure will be your first introduction to coding of data in the data editor. To this point data have been entered in a column format. That is, one variable per column. However, that method is not sufficient in a number of situations, including the calculation of Chi-Square, independent t-tests, and any factorial ANOVA design with between-subjects factors. I'm sure there are many other cases, but they will not be covered in this tutorial. Essentially, the data have to be entered in a specific format that makes the analysis possible. The format typically reflects the design of the study, as will be demonstrated in the examples.
In your text, the following data appear in section 6.????. Please read the text for a description of the study. Essentially, the table - below - includes the observed data and the expected data in parentheses.
Fault | Guilty | Not Guilty | Total |
Low | 153(127.559) | 24(49.441) | 177 |
High | 105(130.441) | 76(50.559) | 181 |
Total | 258 | 100 | 358 |
For the Chi-Square statistic, the table of data can be coded by indexing the column and row of the observations. For example, the count for being Guilty with Low fault is 153. This specific cell can be indexed as coming from row=1 and column=1. Similarly, Not Guilty with High fault is coded as row=2 and column=2. For each observation, four in this instance, there is a unique code for its location in the table. These can be entered as follows,
Row | Column | Count |
1 | 1 | 153 |
1 | 2 | 24 |
2 | 1 | 105 |
2 | 2 | 76 |
- So, 2 rows * 2 columns equals 4 observations. That should be clear.
- For each of the rows there are 2 corresponding columns, and each combination has its own entry in the Count column. The Count column represents the number of times each unique combination of Row and Column occurs.
- The first step is to inform SPSS that the COUNT variable represents the frequency for each unique coding of ROW and COLUMN, by invoking the WEIGHT command. To do this, click on [Data => Weight Cases]. In the resultant dialog box, enable the Weight cases by option, then move the COUNT variable into the Frequency Variable box. If this step is forgotten, the count for each cell of the table will be 1.
- Now that the COUNT variable has been processed as a weighted variable, select [Statistics => Summarize => Crosstabs...] to launch the controlling dialog box.
- At the bottom of the dialog box are three buttons, with the most important being the [Statistics...] button. You must click on the [Statistics...] button and then select the Chi-square option, otherwise the statistic will not be calculated. Exploring this dialog box makes it clear that SPSS can be asked to calculate a number of other statistics in conjunction with Chi-square. For example, one can select the various measures of association (e.g., contingency coefficient, phi and Cramér's V, ...), among others.
- Move the ROW variable into the Row(s): box, and the COLUMN variable into the Column(s): box, then click [OK] to perform the analysis. A subset of the output looks like the following,
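The same statistic can be cross-checked by hand. The sketch below is plain Python rather than anything SPSS runs internally: it takes the Row, Column and Count coding from the table above, treats Count as the weight of each cell, derives the expected frequencies from the row and column totals, and sums (observed - expected)^2 / expected.

```python
# Row, Column, Count coding exactly as entered in the data editor.
cells = [(1, 1, 153), (1, 2, 24), (2, 1, 105), (2, 2, 76)]

# Weighting by Count: each cell stands for that many cases.
n = sum(count for _, _, count in cells)

row_totals = {}
col_totals = {}
for row, col, count in cells:
    row_totals[row] = row_totals.get(row, 0) + count
    col_totals[col] = col_totals.get(col, 0) + count

expected = {}
chi_square = 0.0
for row, col, count in cells:
    # Expected frequency under independence of rows and columns.
    e = row_totals[row] * col_totals[col] / n
    expected[(row, col)] = e
    chi_square += (count - e) ** 2 / e

df = (len(row_totals) - 1) * (len(col_totals) - 1)

for key in sorted(expected):
    print(key, round(expected[key], 3))
print("chi-square =", round(chi_square, 2), "df =", df)
```

The expected frequencies agree with the values shown in parentheses in the table (127.559, 49.441, 130.441, 50.559). If the weighting step were skipped, every cell would count as a single case and n would be 4 instead of 358.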
T-tests
By now, you should know that there are two forms of the t-test, one for dependent variables and one for independent variables, or observations. To inform SPSS, or any stats package for that matter, of the type of design it is necessary to have two different ways of laying out the data. For the dependent design, the two variables in question must be entered in two columns. For independent t-tests, the observations for the two groups must be uniquely coded with a Group variable. Like the calculation of the Chi-square statistic, these calculations will reinforce the practice of thinking about, and laying out, the data in the correct format.
Dependent T-Test
To calculate this statistic, one must select [Statistics => Compare Means => Paired-Samples T Test...] after entering the data. For this analysis, we'll use the data from Table 7.3, in Howell.
- Enter the data into a new datafile. Your data should look a bit like the following. That is, the two variables should occupy separate columns...
Mnths_6 | Mnths_24 |
124 | 114 |
94 | 88 |
115 | 102 |
110 | 2 |
116 | 2 |
139 | 2 |
116 | 2 |
110 | 2 |
129 | 2 |
120 | 2 |
105 | 2 |
88 | 2 |
120 | 2 |
120 | 2 |
116 | 2 |
105 | 2 |
... | ... |
... | ... |
123 | 132 |
- To calculate the t statistic click on [Statistics => Compare Means => Paired-Samples T Test...], then select the two variables of interest. To select the two variables, hold the [Shift] key down while using the mouse for selection. You will note that the selection box requires that variables be selected two at a time. Once the two variables have been selected, move them to the Paired Variables: list. This procedure can be repeated for each pair of variables to be analyzed. In this case, select MNTHS_6 and MNTHS_24 together, then move them to the Paired Variables: list. Finally, click the [OK] button.
The critical result for the current analysis will appear in
the output window as follows,
As you can see, an exact t-value is provided along with an exact p-value, and this p-value is greater than the criterion value of 0.025 for a two-tailed assessment. Closer examination indicates that several other statistics are presented in the output window.
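To see what the paired-samples procedure computes, the following Python sketch works through the arithmetic on difference scores. It is an illustration only: it uses just four pairs from the listing above (the first three rows and the last row), so its result will not match the textbook or the SPSS output, and it reports the t-value and degrees of freedom without a p-value.

```python
import math
import statistics

# Four (Mnths_6, Mnths_24) pairs taken from the rows shown above.
# This is a deliberately tiny subset, used only to show the arithmetic.
pairs = [(124, 114), (94, 88), (115, 102), (123, 132)]

# A dependent t-test is a one-sample t-test on the difference scores.
diffs = [six - twenty_four for six, twenty_four in pairs]

mean_diff = statistics.mean(diffs)
standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
t = mean_diff / standard_error
df = len(diffs) - 1

print("differences:", diffs)
print("t =", round(t, 3), "df =", df)
```

Because the test works on differences within each row, the two columns must stay aligned: row i of Mnths_6 and row i of Mnths_24 must belong to the same case.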
Independent T-tests
When calculating an independent t-test, the only difference involves the way the data are formatted in the datasheet. The datasheet must include both the raw data and the group coding for each observation. For this example, the data from Table 7.5 will be used. As an added bonus, the numbers of observations in the two groups are unequal for this example.
Take a look at the following table to get a feel for how to code the data.
Group | Exp_Con |
1 | 96 |
1 | 127 |
1 | 127 |
1 | 119 |
1 | 109 |
1 | 143 |
1 | ... |
1 | ... |
1 | 106 |
1 | 109 |
2 | 114 |
2 | 88 |
2 | 104 |
2 | 104 |
2 | 91 |
2 | 96 |
2 | ... |
2 | ... |
2 | 114 |
2 | 132 |
From the above you can see that we used the "Group" variable to code for the two groups. The value of 1 was used to code for "LBW-Experimental", while a value of 2 was used to code for "LBW-Control". If you're confused, please study the table above.
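The group coding can be sketched in Python to show how a single column of scores is split by its code. The rows below are only the ones visible in the table (the elided rows are left out), so the numbers will not match Table 7.5 or the SPSS output; the pooled-variance formula is the standard equal-variances t-test, used here as an illustration.

```python
import math
import statistics

# (Group, Exp_Con) rows visible in the table above; elided rows are omitted.
rows = [
    (1, 96), (1, 127), (1, 127), (1, 119), (1, 109), (1, 143), (1, 106), (1, 109),
    (2, 114), (2, 88), (2, 104), (2, 104), (2, 91), (2, 96), (2, 114), (2, 132),
]

# Split the single score column by the group code.
groups = {}
for code, score in rows:
    groups.setdefault(code, []).append(score)

experimental, control = groups[1], groups[2]
n1, n2 = len(experimental), len(control)
mean1, mean2 = statistics.mean(experimental), statistics.mean(control)

# Pooled variance weights each group's variance by its degrees of freedom.
pooled_variance = (
    (n1 - 1) * statistics.variance(experimental)
    + (n2 - 1) * statistics.variance(control)
) / (n1 + n2 - 2)
t = (mean1 - mean2) / math.sqrt(pooled_variance * (1 / n1 + 1 / n2))

print("means:", mean1, mean2)
print("t =", round(t, 2), "df =", n1 + n2 - 2)
```

Nothing in this layout requires the groups to be the same size, which is why unequal numbers of observations cause no extra work.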
To generate the t-statistic,
- Click on [Statistics => Compare Means => Independent-Samples T Test] to launch the appropriate dialog box.
- Select "exp_con" - the dependent variable list - and move it to the Test Variable(s): box.
- Select "group" - the grouping variable list - and move it to the Grouping Variable: box.
- The final step requires that the groups be defined. That is, one must specify that Group1 - the experimental group in this case - is coded as 1, and Group2 - the control group in this case - is coded as 2. To do this, click on the [Define Groups...] button. Click on the [Continue] button to return to the controlling dialog box.
- Run the analysis by clicking on the [OK] button.
The output for the current analysis extracted from the
output window looks like the following.
In the next section we will briefly demonstrate the calculation of correlations and regression, as discussed in Chapter 9 of Howell. In truth, you should be able to work through many statistics with your current knowledge base and the help files, including correlations and regressions. Most statistics can be calculated with a few clicks of the mouse.
Correlations and Regression
This will be a brief tutorial, since there is very little that is required to calculate correlations and linear regressions. To calculate a simple correlation matrix, one must use [Statistics => Correlate => Bivariate...], and [Statistics => Regression => Linear] for the calculation of a linear regression.
For this section, the analyses presented in the computer section of the Correlation and Regression chapter will be replicated. To begin, enter the data as follows,
IQ | GPA |
102 | 2.75 |
108 | 4.00 |
109 | 2.25 |
118 | 3.00 |
79 | 1.67 |
88 | 2.25 |
... | ... |
... | ... |
85 | 2.50 |
Simple Correlation
- Click on [Statistics => Correlate => Bivariate...], then select and move "IQ" and "GPA" to the Variables: list. [Explore the options presented on this controlling dialog box.]
- Click on [OK] to generate the requested statistics.
As you can see, r=0.702, and p=.000 (SPSS displays three decimal places, so this means p < .0005). The results suggest that the correlation is significant.
Note: In the above example we only created a correlation matrix based on two variables. The process of generating a matrix based on more than two variables is not different. That is, if the dataset consisted of 10 variables, they could have all been placed in the Variables: list. The resulting matrix would include all the possible pairwise correlations.
Correlation and Regression
Linear regression: it is possible to output the regression coefficients that minimize error when predicting one variable from the other. To do so, one must select the [Statistics => Regression => Linear...] option. Further, there is a need to know which variable will be used as the dependent variable and which will be used as the independent variable(s). In our current example, GPA will be the dependent variable, and IQ will act as the independent variable. Specifically,
- Initiate the procedure by clicking on [Statistics => Regression => Linear...]
- Select and move GPA into the Dependent: variable box
- Select and move IQ into the Independent(s): variable box
- Click on [OK] to generate the statistics. Note: A variety of options can be accessed via the buttons on the bottom half of this controlling dialog box (e.g., Statistics, Plots, ...). Many more statistics can be generated by exploring the additional options via the Statistics button.
The correlation is still 0.702, and the p value is still 0.000. The additional statistics are "Constant", or a from the text, and "Slope", or B from the text. If you recall, the dependent variable is GPA, in this case. As such, one can predict GPA with the following,
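In words, the prediction takes the form predicted GPA = Constant + Slope * IQ. The Python sketch below shows how the two coefficients and r arise from least squares. It uses only the seven IQ and GPA rows visible in the table earlier in this section, so its numbers will differ from the r=0.702 obtained from the full data set; the function name predict_gpa is made up for the example.

```python
import math

# The seven (IQ, GPA) rows visible in the table above; elided rows are omitted.
iq = [102, 108, 109, 118, 79, 88, 85]
gpa = [2.75, 4.00, 2.25, 3.00, 1.67, 2.25, 2.50]

n = len(iq)
mean_iq = sum(iq) / n
mean_gpa = sum(gpa) / n

# Sums of squares and cross-products around the means.
sxy = sum((x - mean_iq) * (y - mean_gpa) for x, y in zip(iq, gpa))
sxx = sum((x - mean_iq) ** 2 for x in iq)
syy = sum((y - mean_gpa) ** 2 for y in gpa)

slope = sxy / sxx                       # "Slope", or B
constant = mean_gpa - slope * mean_iq   # "Constant", or a
r = sxy / math.sqrt(sxx * syy)          # Pearson correlation


def predict_gpa(iq_score):
    """Predicted GPA for a given IQ, using the fitted line."""
    return constant + slope * iq_score


print("constant =", round(constant, 3), "slope =", round(slope, 4), "r =", round(r, 3))
print("predicted GPA at IQ 100:", round(predict_gpa(100), 2))
```

A quick sanity check on any fitted line of this kind is that it passes through the point (mean IQ, mean GPA).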
The next section will discuss the calculation of the ANOVA.
One-Way ANOVA
As in the independent t-test datasheet, the data must be coded with a group variable. The data that will be used for the first part of this section is from Table 11.2, of Howell. There are 5 groups of 10 observations each - resulting in a total of 50 observations. The group variable will be coded from 1 to 5, for each group. Take a look at the following to get an idea of the coding.
Groups | Scores |
1 | 9 |
1 | 8 |
1 | 6 |
... | ... |
1 | 7 |
2 | 7 |
2 | 9 |
2 | 6 |
... | ... |
... | ... |
... | ... |
5 | 10 |
5 | 19 |
... | ... |
5 | 11 |
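To make the long-format coding concrete, here is a Python sketch of the computation behind a one-way ANOVA. The scores are invented (three groups of three) so that the arithmetic is easy to follow; they are not the Table 11.2 data, and the sketch stops at the F ratio without looking up a p-value.

```python
# Hypothetical (Groups, Scores) rows in the same long format as the table above.
rows = [
    (1, 1), (1, 2), (1, 3),
    (2, 2), (2, 3), (2, 4),
    (3, 6), (3, 7), (3, 8),
]

# Collect the scores that share a group code.
groups = {}
for code, score in rows:
    groups.setdefault(code, []).append(score)

all_scores = [score for _, score in rows]
grand_mean = sum(all_scores) / len(all_scores)

ss_between = 0.0
ss_within = 0.0
for scores in groups.values():
    group_mean = sum(scores) / len(scores)
    ss_between += len(scores) * (group_mean - grand_mean) ** 2
    ss_within += sum((score - group_mean) ** 2 for score in scores)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_ratio = (ss_between / df_between) / (ss_within / df_within)

print("SS between =", ss_between, "SS within =", ss_within)
print("F(%d, %d) = %.2f" % (df_between, df_within, f_ratio))
```

The group code is used only to decide which scores belong together; the code values themselves (1, 2, 3, ...) never enter the arithmetic.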
To complete the analysis,
- Select [Statistics => Compare Means => One-Way ANOVA...] to launch the controlling dialog box.
- Select and move "Scores" into the Dependent list:
- Select and move "Groups" into the Factor: list
- Click on [OK]
The preceding is a complete specification of the design for this one-way ANOVA. The simple presentation of the results, as taken from the output window, will look like the following,
Selecting [Options...] you will find,
If Descriptive is enabled, then the descriptive statistics for each condition will be generated. Making Homogeneity-of-variance active forces a Levene's test on the data. The statistics from both of these analyses will be reproduced in the output window.
Selecting [Post Hoc] will launch the following dialog box,
One can activate one or multiple post hoc tests to be performed. The results will then be placed in the output window. For example, performing an R-E-G-W F statistic on the current data would produce the following,
Finally, one can use the [Contrasts...] option to specify linear and/or orthogonal sets of contrasts. One can also perform trend analysis via this option. For example, we may wish to contrast the third condition with the fifth,
For each contrast, the coefficients must be entered individually, and in order. One can also enter multiple contrasts by using the [Next] button present in the dialog box. The result for the example contrast would look like the following,
Further, one can use the Polynomial option to test whether a specific trend in the data exists.
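The contrast of the third condition with the fifth can be written as one coefficient per group, in group order. The Python sketch below is an illustration under stated assumptions: the group means are invented, the second contrast is added only to demonstrate the orthogonality check, and that check (a zero sum of products of coefficients) assumes equal group sizes.

```python
# One coefficient per group, entered in group order (groups 1 to 5).
third_vs_fifth = [0, 0, 1, 0, -1]

# Hypothetical second contrast, included only to show the orthogonality check.
first_vs_second = [1, -1, 0, 0, 0]

# Hypothetical group means (NOT the Table 11.2 results).
group_means = [5.0, 6.0, 9.0, 10.0, 12.0]

# The coefficients of a contrast must sum to zero.
coefficients_sum = sum(third_vs_fifth)

# Value of the contrast: the weighted sum of the group means.
contrast_value = sum(c * m for c, m in zip(third_vs_fifth, group_means))

# With equal group sizes, two contrasts are orthogonal when the
# products of their coefficients sum to zero.
cross_product = sum(a * b for a, b in zip(third_vs_fifth, first_vs_second))

print("sum of coefficients:", coefficients_sum)
print("contrast value:", contrast_value)
print("orthogonal:", cross_product == 0)
```

The order matters: swapping two coefficients silently compares different groups, which is why they must be entered in group order.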
Factorial designs will be covered in the next section.
Factorial ANOVA
To conduct a Factorial ANOVA one only need extend the logic of the oneway design. Table 13.2 presents the data for a 2 by 5 factorial ANOVA. The first factor, AGE, has two levels, and the second factor, CONDITION, has five levels. So, once again each observation can be uniquely coded.
AGE | CONDITION |
Old = 1 | Counting = 1 |
Young = 2 | Rhyming = 2 |
| Adjective = 3 |
| Imagery = 4 |
| Intentional = 5 |
AGE | CONDITIO | Scores |
1 | 1 | 9 |
1 | 1 | 8 |
1 | 1 | 6 |
1 | ... | ... |
1 | 1 | 7 |
1 | 2 | 7 |
1 | 2 | 9 |
1 | 2 | 6 |
1 | ... | ... |
1 | ... | ... |
1 | ... | ... |
1 | 5 | 10 |
1 | 5 | 19 |
1 | ... | ... |
1 | 5 | 11 |
2 | 1 | 8 |
2 | 1 | 6 |
2 | 1 | 4 |
2 | ... | ... |
2 | 1 | 7 |
2 | 2 | 10 |
2 | 2 | 7 |
2 | 2 | 8 |
2 | ... | ... |
2 | ... | ... |
2 | ... | ... |
2 | 5 | 21 |
2 | 5 | 19 |
2 | ... | ... |
2 | 5 | 21 |
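With two coding columns, each observation belongs to exactly one AGE by CONDITION cell. The Python sketch below groups the rows visible in the table above by both codes and reports a count and mean per cell, attaching the value labels from the coding key. The elided rows are left out, so the counts and means are for illustration and will not match Table 13.2.

```python
# Value labels from the coding key above.
AGE_LABELS = {1: "Old", 2: "Young"}
CONDITION_LABELS = {1: "Counting", 2: "Rhyming", 3: "Adjective",
                    4: "Imagery", 5: "Intentional"}

# (AGE, CONDITIO, Scores) rows visible in the table; elided rows are omitted.
rows = [
    (1, 1, 9), (1, 1, 8), (1, 1, 6), (1, 1, 7),
    (1, 2, 7), (1, 2, 9), (1, 2, 6),
    (1, 5, 10), (1, 5, 19), (1, 5, 11),
    (2, 1, 8), (2, 1, 6), (2, 1, 4), (2, 1, 7),
    (2, 2, 10), (2, 2, 7), (2, 2, 8),
    (2, 5, 21), (2, 5, 19), (2, 5, 21),
]

# Group the scores by the pair of codes that identifies a cell.
scores_by_cell = {}
for age, condition, score in rows:
    key = (AGE_LABELS[age], CONDITION_LABELS[condition])
    scores_by_cell.setdefault(key, []).append(score)

# Count and mean for each cell of the design.
cells = {key: (len(scores), sum(scores) / len(scores))
         for key, scores in scores_by_cell.items()}

for (age, condition), (count, mean) in cells.items():
    print(f"{age:6s} {condition:12s} n={count} mean={mean:.2f}")
```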
To compute the relevant statistics - a simple approach,
- Select [Statistics => General Linear Model => Simple Factorial...]
- Select and move "Scores" into the Dependent: box
- Select and move "Age" into the Factor(s): box.
- Click on [Define Range...] to specify the range of coding for the Age factor. Recall that 1 is used for Old and 2 is used for Young. So, the Minimum: value is 1, and the Maximum: value is 2. Click on [Continue].
- Select and move "Conditio" into the Factor(s): box
- Click on [Define Range...] to specify the range of the Condition factor. In this case the Minimum: value is 1 and the Maximum: value is 5.
By clicking on the [Options...] button one has the opportunity to select the Method used. According to the online help,
"Method: Allows you to choose an alternate method for decomposing sums of squares. Method selection controls how the effects are assessed."
- Under [Options...] activate Hierarchical, or Experimental, then activate Means and counts
- Click [Continue]
- Click on [OK] to generate the output.
As you can see, the use of the Means and counts option produces a nice summary table, with all the Variable Labels and Value Labels that were incorporated into the datasheet. Again, the use of those options makes the output a great deal more readable.
The output is a complete source table with the factors identified with Variable Labels.
- By selecting [Model...], one can specify a Custom model. The default is for a Fully Factorial model, however, with the Custom option one can explicitly determine the effects to look at.
- The Contrasts option allows one to "test the differences among the levels of a factor" (see the manual for greater detail).
- Various graphs can be specified with the [Plots...] option. For example, one can plot "Conditio" on the Horizontal Axis:, and "Age" on Separate Lines:, to generate a simple "conditio*age" plot (see the dialog box for [Plots...]).
- The standard post-hoc tests for each factor can be calculated by selecting the desired options under [Post Hoc...]. All one has to do is select the factors to analyze and the appropriate post-hoc(s).
- The [Options...] dialog box provides a number of diagnostic and descriptive features. One can generate descriptive statistics, estimates of effect size, and tests for homogeneity of variance - among others. An example source table using some of these options would look like the following,
Repeated measures designs will be discussed in the next section.