Project Instructions

Background Information

In a Stroop task, participants are presented with a list of words, with each word displayed in a color of ink. The participant’s task is to say out loud the color of the ink in which the word is printed. The task has two conditions: a congruent words condition, and an incongruent words condition. In the congruent words condition, the words being displayed are color words whose names match the colors in which they are printed: for example RED, BLUE. In the incongruent words condition, the words displayed are color words whose names do not match the colors in which they are printed: for example PURPLE, ORANGE. In each case, we measure the time it takes to name the ink colors in equally-sized lists. Each participant will go through and record a time from each condition.

Questions For Investigation

As a general note, be sure to keep a record of any resources that you use or refer to in the creation of your project. You will need to report your sources as part of the project submission.

1. What is our independent variable? What is our dependent variable?

The independent variable is the condition (congruent or incongruent), and the dependent variable is the time taken to name the ink colors in equally-sized lists.

2. What is an appropriate set of hypotheses for this task? What kind of statistical test do you expect to perform? Justify your choices.

The null hypothesis is that the mean times for the congruent and incongruent conditions are the same, that is, people identify the ink color equally fast in both conditions. The alternative hypothesis is that the mean time for the incongruent condition is larger than the mean time for the congruent condition.

I expect to perform a one-tailed test, since I think the time for the incongruent case will be higher than the time for the congruent case. Details of the means are reviewed in the next section. Because each participant is measured in both conditions, the test can be performed on the per-participant differences between the two times.

Because the sample is small and the population mean and variance are not known, a t-test is preferable to a z-test. The variance is estimated from the sample, so the test statistic follows a t-distribution rather than a normal distribution.

The test checks how different the means (denoted by \(\mu\)) are. In this specific case, \(\mu\) stands for the mean time spent naming the correct ink colors.

Hence, the hypotheses will be:

  • \(H_0 : \mu_{c} = \mu_{i} \rightarrow \mu_D =0\)
  • \(H_a : \mu_{i} > \mu_{c} \rightarrow \mu_D = \mu_{i} - \mu_{c} > 0\)

where the subscripts c, i and D stand, respectively, for congruent, incongruent and difference, and \(\mu\) is the mean time spent in each condition.
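As a minimal sketch of how these hypotheses map onto a test in R, the block below runs a one-sided paired t-test and the equivalent one-sample test on the differences. The six times per condition are made-up values for illustration only (they are not the project dataset), and the variable names are hypothetical.

```r
# Hypothetical completion times in seconds for six participants
# (made-up values for illustration, NOT the project dataset).
congruent   <- c(12.1, 16.8,  9.6, 14.2, 11.0, 15.3)
incongruent <- c(19.3, 23.1, 17.4, 20.9, 18.2, 25.6)

# Paired, one-sided test of H_a: mu_i - mu_c > 0
paired <- t.test(incongruent, congruent,
                 paired = TRUE, alternative = "greater")

# Equivalent one-sample test on the per-participant differences
d <- incongruent - congruent
one_sample <- t.test(d, mu = 0, alternative = "greater")

# Both calls give the same t statistic and degrees of freedom (n - 1)
print(unname(paired$statistic))
print(unname(one_sample$statistic))
print(unname(paired$parameter))
```

The two calls agree because a paired t-test is, by construction, a one-sample t-test of \(\mu_D = 0\) applied to the differences.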

Now it’s your chance to try out the Stroop task for yourself. Go to this link, which has a Java-based applet for performing the Stroop task. Record the times that you received on the task (you do not need to submit your times to the site.) Now, download this dataset which contains results from a number of participants in the task. Each row of the dataset contains the performance for one participant, with the first number their results on the congruent task and the second number their performance on the incongruent task.

3. Report some descriptive statistics regarding this dataset. Include at least one measure of central tendency and at least one measure of variability.

library(reshape2)
df <- read.csv("~/Git/data-analyst-nanodegree/P1-test-perceptual-phenomenon/stroopdata.csv", 
    sep = ",", header = TRUE)

# Total population
mu_total <- mean(melt(df)$value)
s_total <- sd(melt(df)$value)
sprintf("Total mean is %.3f s, standard deviation is %.3f s", 
    mu_total, s_total)

# Differential case
df.diff <- df$Incongruent - df$Congruent

mu_diff <- mean(df.diff)
s_diff <- sqrt(sum((df.diff - mu_diff)^2)/(length(df.diff) - 
    1))
sprintf("Mean of differences is %.3f s, standard deviation is %.3f s", 
    mu_diff, s_diff)

# Congruent
mu_c <- mean(df$Congruent)
s_c <- sd(df$Congruent)

sprintf("Congruent mean is %.3f s, standard deviation is %.3f s", 
    mu_c, s_c)

# Incongruent
mu_i <- mean(df$Incongruent)
s_i <- sd(df$Incongruent)

sprintf("Incongruent mean is %.3f s, standard deviation is %.3f s", 
    mu_i, s_i)

The mean for the congruent condition is 14.051 s with a standard deviation of 3.559 s.

The mean for the incongruent condition is 22.016 s with a standard deviation of 4.797 s.

The mean of the difference is 7.965 s with a standard deviation of 4.865 s.

4. Provide one or two visualizations that show the distribution of the sample data. Write one or two sentences noting what you observe about the plot or plots.

    library(ggplot2)
    library(plyr)

    mu <- ddply(melt(df), "variable", summarise, grp.mean=mean(value)); # silence
    
    p <- ggplot(melt(df), aes(x=value, fill=variable))+ 
      geom_histogram(aes(y=..density..),  alpha=0.3) +
      geom_density(aes(x = value, color = variable), alpha=.6, linetype= "dashed")
    
    #Add mean lines
    p1<-p + geom_vline(data=mu, aes(xintercept=grp.mean, color=variable),
                 linetype="dashed", alpha=1)
    
    p2 <- ggplot(melt(df), aes(y=value,  x= variable, color = variable, 
                               fill= variable, shape= variable)) +  
      geom_boxplot(alpha=1/3, fill="white", outlier.alpha= 0) + 
      geom_dotplot(binaxis = 'y', binwidth = 1, method = "histodot")
    
    library(gridExtra)
    
    grid.arrange(p1, p2, ncol = 1)

Figure: Data distribution

Both the congruent and the incongruent samples look approximately normally distributed, with different mean values (a larger mean time for the incongruent case) and comparable standard deviations. In the incongruent condition there are some outliers in the upper tail of the distribution.

5. Now, perform the statistical test and report your results. What is your confidence level and your critical statistic value? Do you reject the null hypothesis or fail to reject it? Come to a conclusion in terms of the experiment task. Did the results match up with your expectations?

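# Paired t statistic computed from the per-participant differences
df_diff <- data.frame(df.diff)
n <- nrow(df_diff)      # number of participants
dfr <- n - 1            # degrees of freedom
SE <- s_diff/sqrt(n)    # standard error of the mean difference
t <- mu_diff/SE         # t statistic (note: masks base::t in this session)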

Hypotheses

  • \(H_0 : \mu_{c} = \mu_{i} \rightarrow \mu_D =0\)
  • \(H_a : \mu_{i} > \mu_{c} \rightarrow \mu_D = \mu_{i} - \mu_{c} > 0\)

Difference test

The test used is the paired (difference) t-test, which evaluates the performance of the same group of subjects after the conditions change, as is done in the Stroop experiment. The steps are the following (Shier 2014; Cognitive Science, n.d.):

  1. Calculate the difference \(d= t_i-t_c\)
  2. Calculate the mean of the difference \(\bar{d}=\) 7.965 s
  3. Calculate the standard error \(SE(\bar{d})= \frac{s_d}{\sqrt{n}}=\) 0.993 s, where \(s_d\) is the standard deviation of the differences.
  4. Calculate the t-statistic: \(t=\frac{\bar{d}}{SE(\bar{d})}=\) 8.021. Under the null hypothesis this statistic follows a t-distribution with n-1 = 23 degrees of freedom.
  5. Locate t in the \(t_{n-1}\) distribution: compare it with the critical value from a t-table, or obtain the p-value of the paired t-test (Udacity, n.d.). For a one-tailed test at \(\alpha = 0.05\) with 23 degrees of freedom, \(t_{critical}=\) 1.714

\(t> t_{critical} \rightarrow\) t lies in the critical region

The result is statistically significant: the null hypothesis is rejected.
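The steps above can be reproduced from the summary statistics reported in this document. The sketch below assumes a one-tailed significance level of 0.05, which is the level that yields the critical value of 1.714 for 23 degrees of freedom; the variable names are illustrative.

```r
# Summary statistics as reported in this document
n       <- 24      # number of participants
mu_diff <- 7.965   # mean of the differences (s)
s_diff  <- 4.865   # standard deviation of the differences (s)
alpha   <- 0.05    # ASSUMED one-tailed significance level

se      <- s_diff / sqrt(n)                            # standard error
t_stat  <- mu_diff / se                                # t statistic
t_crit  <- qt(1 - alpha, df = n - 1)                   # one-tailed critical value
p_value <- pt(t_stat, df = n - 1, lower.tail = FALSE)  # one-tailed p-value

cat(sprintf("SE = %.3f, t = %.3f, t_critical = %.3f\n", se, t_stat, t_crit))
cat(sprintf("one-tailed p-value = %.2e\n", p_value))
cat("reject H0:", t_stat > t_crit, "\n")
```

Using `qt` and `pt` avoids reading values off a printed t-table and gives the p-value directly.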

The data distribution and the mean values already suggested that participants take longer to identify the ink color when the word and the ink color are incongruent. In the one-tailed test the t-statistic lies in the critical region, far beyond the critical value, which confirms not only that the null hypothesis is rejected, but also that the time taken to identify the ink color is significantly higher in the incongruent case. This matches the expectation stated in the hypotheses.

Margin of error

The 95% confidence interval for the mean difference is \(\bar{d} \pm t^* \frac{s_d}{\sqrt{n}}=\) [5.911 s, 10.019 s], where \(t^*\) is the critical value that leaves 2.5% in each tail of a t-distribution with n-1 degrees of freedom.
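As a sketch, the interval can be recomputed from the summary statistics reported in this document. The variable names are illustrative, and small differences in the last digit can arise because the inputs are already rounded.

```r
# Summary statistics as reported in this document
n       <- 24      # number of participants
mu_diff <- 7.965   # mean of the differences (s)
s_diff  <- 4.865   # standard deviation of the differences (s)

# Two-sided critical value: 2.5% in each tail, n - 1 degrees of freedom
t_star <- qt(0.975, df = n - 1)

# Margin of error and the resulting interval
margin <- t_star * s_diff / sqrt(n)
ci     <- c(lower = mu_diff - margin, upper = mu_diff + margin)

cat(sprintf("t* = %.3f, margin of error = %.3f s\n", t_star, margin))
cat(sprintf("95%% CI: [%.3f s, %.3f s]\n", ci["lower"], ci["upper"]))
```

The interval excludes zero, which is consistent with rejecting the null hypothesis.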

Cohen’s d

For the paired test, Cohen’s d is defined here as the difference between the means of the post-test (incongruent) and pre-test (congruent) conditions, divided by the standard deviation of the pre-test condition.

\(Cohen's\; d= \frac{\mu_i-\mu_c}{SD_c}=\) 2.238

Coefficient of determination \(r^2\)

\(r^2= \frac{t^2}{t^2+df}=\) 0.737
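Both effect-size measures follow directly from the summary statistics reported in this document. The sketch below uses the definitions given above (the mean difference standardized by the congruent standard deviation, and \(r^2\) from the t-statistic); the variable names are illustrative.

```r
# Summary statistics as reported in this document
mu_c    <- 14.051  # congruent mean (s)
mu_i    <- 22.016  # incongruent mean (s)
s_c     <- 3.559   # congruent standard deviation (s)
mu_diff <- 7.965   # mean of the differences (s)
s_diff  <- 4.865   # standard deviation of the differences (s)
n       <- 24      # number of participants

dof    <- n - 1
t_stat <- mu_diff / (s_diff / sqrt(n))

# Cohen's d as defined in this document
cohens_d <- (mu_i - mu_c) / s_c

# Coefficient of determination derived from the t statistic
r_squared <- t_stat^2 / (t_stat^2 + dof)

cat(sprintf("Cohen's d = %.3f, r^2 = %.3f\n", cohens_d, r_squared))
```

Under this definition the mean difference is more than two congruent standard deviations, and about 74% of the variation in the times is associated with the condition.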

6. Optional: What do you think is responsible for the effects observed? Can you think of an alternative or similar task that would result in a similar effect? Some research about the problem will be helpful for thinking about these two questions!

The changes are produced by applying a different treatment to the same participants, so their responses are altered. A similar effect could be observed when comparing the performance of students before and after studying.

References

Cognitive Science, Department of. n.d. BME Faculty of Natural Sciences of Budapest. http://www.cogsci.bme.hu/~ktkuser/KURZUSOK/BMETE47MC38/2015_2016_1/7_The%.

Shier, Rosie. 2014. “Paired T-Tests.” Online. http://www.statstutor.ac.uk/resources/uploaded/paired-t-test.pdf.