Introduction

To draw inferences from null hypothesis significance testing, a range of assumptions must be made. Here we use our database to examine the extent to which these assumptions hold within the language acquisition literature. We find that, broadly, these assumptions are merited.

Data availability/reporting standards

An important component of reproducibility is the complete description of data in published reports. This is critical both for evaluating individual studies and for building a cumulative science (e.g., through meta-analysis). Here we explore the extent to which papers report key statistics such as means and standard deviations, effect sizes, and test statistics.

# Count, across all effect sizes pooled, how many have each variable coded
counts = all_data %>%
            summarise(test_statistic = sum(!is.na(t) | !is.na(F) | !is.na(r)),
                      means = sum(!is.na(x_1)),
                      SD = sum(!is.na(SD_1)),
                      d = sum(!is.na(d_calc)),
                      g = sum(!is.na(g_calc)),
                      r = sum(!is.na(r_calc)),
                      age_range = sum(!is.na(age_range_1)),
                      gender = sum(!is.na(gender_1))) %>%
            gather("coded_variable", "n") %>%
            mutate(coded = "coded") %>%
            mutate(total = nrow(all_data))  %>%
            mutate(coded_variable = factor(coded_variable, levels = c("d", "g", "r", "means",
                                                                        "SD", "test_statistic",
                                                                        "age_range", "gender")))
# Add complementary "uncoded" rows so coded + uncoded stack to the total
counts = counts %>%
             mutate(n = total - n, 
                    coded = "uncoded")  %>%
             bind_rows(counts) %>%
             mutate(n_lab = ifelse(coded == "coded", n, "")) %>%
             arrange(coded)
ggplot(counts) + 
  geom_col(aes(x = coded_variable, 
               y = n/total,
               fill = coded)) + 
  ylim(0, 1) + 
  ylab("Proportion coded") + 
  xlab("Coded variable") + 
  ggtitle("All data") + 
  scale_fill_manual(values = c("lightgreen", "grey")) + 
  geom_text(aes(label = n_lab, x = coded_variable, y = n/total - .06)) +
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        text = element_text(size = 20),
        axis.text.x = element_text(angle = 30, hjust = 1))

This analysis is difficult in practice because many of our source MAs include effect sizes that were computed by the coders, so a coded effect size does not imply that the paper reported one. Similarly, missing gender data in many cases reflects only that the coder chose not to code that variable, not that the information was absent from the paper. Nevertheless, this analysis gives us a good sense of the proportion of papers that report means and standard deviations (about two-thirds).

By MA

# Same counts as above, broken down by meta-analysis
counts = all_data %>%
            group_by(dataset) %>%
            summarise(test_statistic = sum(!is.na(t) | !is.na(F) | !is.na(r)), # include r, matching the overall counts above
                      means = sum(!is.na(x_1)),
                      SD = sum(!is.na(SD_1)),
                      d = sum(!is.na(d_calc)),
                      g = sum(!is.na(g_calc)),
                      r = sum(!is.na(r_calc)),
                      age_range = sum(!is.na(age_range_1)),
                      gender = sum(!is.na(gender_1)),
                      total = n()) %>%
            gather(coded_variable, n, -dataset, -total) %>%
            mutate(coded = "coded") %>%
            mutate(coded_variable = factor(coded_variable, levels = c("d", "g", "r", "means",
                                                                        "SD", "test_statistic",
                                                                        "age_range", "gender")))

counts = counts %>%
             mutate(n = total - n, 
                    coded = "uncoded")  %>%
             bind_rows(counts) %>%
             mutate(n_lab = ifelse(coded == "coded", n, "")) %>%
  arrange(coded)

ggplot(counts, aes(fill = coded)) + 
  geom_col(aes(x = coded_variable, 
               y = n),  # position = "fill" rescales stacked counts to proportions
           position = "fill") + 
  facet_wrap(~dataset, ncol = 2) +
  ylim(0, 1) + 
  ylab("Proportion coded") + 
  xlab("Coded variable") + 
  scale_fill_manual(values = c("lightgreen", "grey")) + 
  geom_text(aes(label = n_lab,
                x = coded_variable, 
                y = n/total - .06)) +
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        text = element_text(size=20),
        strip.background  = element_blank(), 
        axis.text.x = element_text(angle = 30, hjust = 1))

Sample sizes and exclusions

Here we examine how sample sizes and exclusion rates are related to other variables of interest.

How do sample sizes vary as a function of age?

Given that effect sizes broadly increase with age, we would expect sample sizes to decrease with age, since larger effects require fewer participants for the same statistical power. Broadly, the data are consistent with this. Phonemic discrimination (native) is an exception, but this may be due to differences in method (?).
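To make this expectation concrete, here is a minimal sketch (base R; the effect sizes are illustrative values, not estimates from the database) of the per-group sample size a two-sample t-test needs for 80% power:

# Per-group n required for 80% power (alpha = .05, two-tailed, two-sample t-test)
sapply(c(0.3, 0.5, 0.8), function(d) {
  ceiling(power.t.test(delta = d, sd = 1, sig.level = .05, power = .80)$n)
})
# roughly 176, 64, and 26 per group for d = 0.3, 0.5, and 0.8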

# Total sample size: sum both groups where a second group exists
all = all_data %>%
  mutate(n_total = ifelse(!is.na(n_2), n_1 + n_2, n_1)) %>%
  select(dataset, d_calc, 
         mean_age_1, n_total,
         n_excluded_1, num_trials, method)
 
ggplot(all, aes(x = mean_age_1, y = n_total, color = dataset)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~dataset, scales = "free") +
  theme_bw() +
  theme(legend.position="none")

Do sample sizes vary with effect sizes?

Here we explore a more direct test of whether sample sizes are coupled with effect sizes, by looking at the relationship between sample size and effect size after residualizing out the effects of (1) phenomenon, (2) age, and (3) method.

Compute residuals.

# Meta-regression of effect size on method, phenomenon (dataset), and age;
# its residuals are effect sizes with these moderators partialed out
full.model = rma(d_calc ~ method + dataset + mean_age_1, 
        vi = d_var_calc, data = all_data, method = "REML")

residuals = rstandard(full.model)

all_data = all_data %>%
            bind_cols(as.data.frame(residuals$resid),
                      as.data.frame(residuals$z)) %>%
            rename(residual.d = `residuals$resid`, 
                   residual.d.s = `residuals$z`) # standardized
ns_ds = all_data %>%
  mutate(n_total = ifelse(!is.na(n_2), n_1 + n_2, n_1)) %>%
  select(dataset, d_calc, residual.d.s,
         mean_age_1, n_total) 

ggplot(ns_ds , aes(y = n_total, x = residual.d.s, color = dataset)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_bw() +
  xlab("Residual effect size") + 
  facet_wrap(~dataset, scales = "free") +
  theme(legend.position="none")

If sample sizes and effect sizes were appropriately coupled, we would expect sample size to decrease as effect size increases. Across three meta-analyses, we find the opposite pattern: sample size increases with effect size. This suggests that researchers are not prospectively planning their sample sizes based on effect size estimates.
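One way to quantify this pattern, using the ns_ds data frame from above (a sketch; positive correlations indicate that larger residual effects come with larger samples):

# Per-dataset correlation between residualized effect size and total N
ns_ds %>%
  group_by(dataset) %>%
  summarise(cor_n_d = cor(residual.d.s, n_total, use = "complete.obs"))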

Does the number of excluded infants affect effect sizes?

Here we look at whether the number of excluded infants varies as a function of effect size. If infants are excluded based on an independent criterion, determined before looking at the data, there should be no relationship. Again, we use residualized effect sizes.

However, there are some technical issues we need to address before properly doing this analysis:

  • note here that different datasets use differently named columns to indicate exclusions (“n_excluded”, “n_excluded_1”, “n_excluded_fussers”); see the harmonization sketch after this list
  • an additional complication: are the exclusions part of the n?
  • we don’t have num_trials for any dataset, but this would be great to look at
  • what’s up with cases where n_excluded > n?
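One way to address the first issue would be to collapse the differently named columns into one. A minimal sketch, assuming all three columns exist in the merged all_data (with NAs where a dataset does not use them); dplyr::coalesce returns the first non-missing value across its arguments:

# Hypothetical harmonization: one exclusion column across datasets
all_data = all_data %>%
  mutate(n_excluded_any = coalesce(n_excluded, n_excluded_1, n_excluded_fussers))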

Compute residuals.

# Residualize age on method and phenomenon (dataset), for the age analyses below
age.method = rma(mean_age_1 ~ method + dataset, 
        vi = d_var_calc, data = all_data, method = "REML")

residuals.age.method = rstandard(age.method)

all_data = all_data %>%
            bind_cols(as.data.frame(residuals.age.method$resid),
                      as.data.frame(residuals.age.method$z)) %>%
            rename(residual.d.age = `residuals.age.method$resid`, 
                   residual.d.s.age = `residuals.age.method$z`) # standardized
exclusions = all_data %>%
  filter(!is.na(n_excluded_1) | !is.na(n_excluded_2)) %>% # vectorized OR; || would test only the first element
  mutate(n_excluded =  ifelse(!is.na(n_excluded_2), n_excluded_1 + n_excluded_2, n_excluded_1),
         n_total = ifelse(!is.na(n_2), n_1 + n_2, n_1)) %>%
  select(dataset, d_calc, 
         mean_age_1, n_total,
         n_excluded, num_trials, method, residual.d.s, residual.d.s.age) %>%
  mutate(prop_excluded = n_excluded/n_total) %>%
  filter(prop_excluded < 1)
 
ggplot(exclusions, aes(x = prop_excluded, y = residual.d.s,
                       color = dataset)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("residual d as a function of prop. excluded") +
  xlab("proportion excluded") +
  ylab("residual d (standardized)") +
  theme_bw() + 
  facet_wrap(~dataset, scales = "free") +
  theme(legend.position = "none")

Does proportion excluded vary as a function of age?

Again, if there is no bias we should see no relationship. In contrast, we see some evidence that proportion excluded increases with age. This is difficult to interpret though, for the reasons described above.

ggplot(exclusions, aes(x = residual.d.s.age, y = prop_excluded,
                       color = dataset)) +
  geom_point() +
  geom_smooth(method = "lm") +
  ggtitle("prop. excluded as a function of age") +
  xlab("mean age (months)") +
  ylab("proportion excluded") +
  theme_bw()  + 
  facet_wrap(~dataset, scales = "free") +
  theme(legend.position="none")

Publication Bias

If studies were randomly sampled from the population of possible studies, effect sizes should not be biased. Here we examine this assumption, using residualized effect sizes.

Do effect sizes vary as a function of year?

# Extract publication year as the first run of digits in the study ID
# (assumes IDs of the form "Author2010...")
years.df = all_data %>%
  mutate(year = as.numeric(str_extract(study_ID, "[0-9]+")))

ggplot(years.df , aes(x = year, y = residual.d.s, color = dataset)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ dataset, scales = "free_y") +
  theme_bw() +
  xlab("published year") +
  theme(legend.position="none")

We see a year bias here: studies conducted early in a literature tend to have larger effect sizes than later studies.
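This trend can be quantified with a meta-regression on year (a sketch using years.df from above; a negative year coefficient indicates declining effect sizes):

# Meta-regression of effect size on publication year, controlling for phenomenon
year.model = rma(d_calc ~ year + dataset, vi = d_var_calc,
                 data = years.df, method = "REML")
summary(year.model)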

Does age vary as a function of year?

ggplot(years.df , aes(x = year, y = residual.d.s.age, color = dataset)) +
  geom_point() +
  geom_smooth(method = "lm") +
  facet_wrap(~ dataset, scales = "free_y") +
  theme_bw() +
  xlab("published year")  +
    theme(legend.position="none")

In some literatures, we see a trend toward looking for the phenomenon in younger children over time.

Does effect size vary as a function of number of authors?

Not really enough data to do this analysis. (maybe cut?)

# Approximate the number of authors by counting comma-separated names in the citation
all_data$short_cite2 = str_replace_all(all_data$short_cite, " ", "")
all_data$short_cite2 = str_replace_all(all_data$short_cite2, "&", ",")
all_data$num_authors = unlist(lapply(lapply(all_data$short_cite2, 
                                     function(x) unlist(strsplit(x, ',+'))), 
                              function(x) length(which(x != ""))))
all_data$num_authors = ifelse(grepl("et al", all_data$short_cite), NA,
                              all_data$num_authors) # "et al." short_cites hide the author count
all_data$short_cite2 <- NULL

all_data %>%
  ggplot(aes(x = num_authors, y = residual.d.s, colour = dataset)) +
  geom_point() + 
  geom_smooth(method = "lm") +
  facet_wrap(~dataset, scales = "free_y") +
  theme_bw()+
  theme(legend.position="none")

Individual meta-analysis power

What is the power to determine if the overall effect is greater than zero?

Here I calculated the power to detect an effect using a random-effects model (Hedges & Pigott, 2001; Valentine, Pigott, & Rothstein, 2010). Note that this information is redundant with the confidence interval on the estimated effect size, and the power in all of these MAs is essentially 1. Below, I ran a simulation sampling n papers from each meta-analysis and calculating power. We need surprisingly few studies to get relatively high power.

calculate_ma_power <- function(sample_data) {
  ALPHA <-  .05
  NTAILS <-  2
  c_crit <-  qnorm(ALPHA/NTAILS, lower.tail = F)
  model <- rma(d_calc, vi = d_var_calc, data = sample_data,
               method = "REML", control = list(maxiter = 1000, stepadj = 0.5))
  lambda  <-  model$b[1] / model$se[1] # Valentine, Eq. [6]
  power <-  1 - pnorm(c_crit - lambda)
  power
}

one_sample <- function(ma_data, summary_function, n) {
   # Returns a function (of the bootstrap index) that resamples n effect
   # sizes without replacement and applies summary_function
   function(k) {
      do.call(summary_function, list(sample_n(ma_data, size = n, replace = F)))
   }
}
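# ci_lower / ci_upper are used below but are not defined in this section; a
# minimal percentile-based version (an assumption -- the originals may come
# from a setup chunk or a helper package):
ci_lower <- function(x) quantile(x, 0.025, na.rm = TRUE)
ci_upper <- function(x) quantile(x, 0.975, na.rm = TRUE)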

all_samples <- function(ma_data, summary_function, n, nboot) {
  sample_values <- 1:nboot %>%
    map(one_sample(ma_data, summary_function, n)) %>%
    unlist()
  data.frame(n = n,
             mean = mean(sample_values),
             ci_lower = ci_lower(sample_values),
             ci_upper = ci_upper(sample_values),
             row.names = NULL)
}

dataset_data <- function(ma_data, summary_function, nboot) {
  overall <- do.call(summary_function, list(ma_data))
  seq(1, nrow(ma_data), by = 1) %>%
    map(function(n) all_samples(ma_data, 
                                summary_function, n, nboot)) %>%
    bind_rows() %>%
    mutate(dataset = unique(ma_data$dataset),
           overall_est = overall)
}

power_data <- all_data %>%
  split(.$dataset) %>%
  map(function(ma_data) dataset_data(ma_data, 
                                     calculate_ma_power, 50)) %>%
  bind_rows()

ggplot(power_data, aes(x = n, y = mean, fill = dataset)) +
  geom_ribbon(aes(ymin = ci_lower, ymax = ci_upper), alpha = 0.5) +
  geom_hline(aes(yintercept = overall_est), linetype = "dashed") +
  geom_line( aes(x = n, y = mean)) +
  facet_wrap(~dataset) +
  xlab("Number of effect sizes") +
  ylab("Meta-analytic power") +
  xlim(0,150) +
  scale_fill_solarized(guide = FALSE) +
  theme_bw() +
  theme(text = element_text(family = "Open Sans"),
        panel.grid.minor = element_blank())

What is the power to determine if there’s a moderator (e.g. age)?

Not totally sure how to do this. Valentine et al. (2010) say it’s difficult without the raw data?

Trim-and-fill

Trim-and-fill corrects for selective reporting on the basis of effect size (fewer small effects than expected; Duval & Tweedie, 2000a, 2000b). The routines and plotting tools come from the R package metafor (Viechtbauer).

[TODO: Adapt to Metalab style]

### fit random-effects model and plot

datasets = unique(all_data$short_name)

for (i in seq_along(datasets)) {
  my_data = all_data[all_data$short_name == datasets[i], ]
  res <- rma(d_calc, d_var_calc, data = my_data)  # random-effects model
  taf <- trimfill(res)                            # impute presumed-missing studies
  funnel(taf, main = datasets[i])                 # funnel plot, imputed studies filled in
}
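A complementary formal check for funnel-plot asymmetry is an Egger-style regression test, which metafor provides as regtest. A minimal sketch over the same per-dataset models:

# Egger-style test: does effect size covary with its standard error?
for (i in seq_along(datasets)) {
  my_data = all_data[all_data$short_name == datasets[i], ]
  res <- rma(d_calc, d_var_calc, data = my_data)
  print(regtest(res))  # significant asymmetry suggests small-study effects
}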