Gender Bias In Graduate School Admissions - UC Berkeley Dataset

Study Of Simpson's Paradox

Dataset: 1973 UC-Berkeley Graduate School Admission Data
What is it? Simpson's paradox, also called the Yule-Simpson effect, is a statistical effect that occurs when the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables. Simpson's paradox is important for three reasons. First, people often expect statistical relationships to be immutable, but they often are not: the relationship between two variables might strengthen, weaken, or even change direction depending on the set of variables being controlled. Second, Simpson's paradox is not an obscure phenomenon of interest only to a small group of statisticians; it is one of a large class of association paradoxes. Third, it reminds researchers that causal inferences, particularly in nonexperimental studies, can be hazardous: uncontrolled and even unobserved variables that would eliminate or reverse the association observed between two variables might exist.
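
The difference between the marginal and the partial association can be written out with the law of total probability. This is only a sketch in notation that the original does not use: A stands for the outcome (admission), G for the group being compared, and D for the variable being controlled (department).

```latex
P(A \mid G) \;=\; \sum_{d} P(A \mid G, D = d)\, P(D = d \mid G)
```

Each group's marginal rate is a weighted average of its conditional rates, and the weights P(D = d | G) differ from group to group. A group can therefore have the higher rate within most or even all levels of D and still have the lower marginal rate, if its members are concentrated in the levels where the rates are low for everyone.
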
Background: In 1973, the University of California-Berkeley (UC-Berkeley) was sued for sex discrimination. Its admission data showed that men applying to graduate school at UC-Berkeley were more likely to be admitted than women: the graduate schools had accepted 44% of male applicants but only 35% of female applicants. The difference was so great that it was unlikely to be due to chance.
Project Author: Amitrajit Bose

import pandas as pd

# Read the comma-separated file line by line.
dataset = []
with open('MLTutorial/Udacity/simpsons.txt') as file:
    for line in file:
        dataset.append(line.strip().split(','))

# The first row holds the column names, the remaining rows the records.
category = dataset[0]
data = dataset[1:]
df = pd.DataFrame(data=data, columns=category)
df

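# groupby sorts its keys, so index 0 is 'Female' and index 1 is 'Male'.
maleFemale = list(df.groupby('Gender'))
maleFemale[1][1]  # rows for male applicants

# Total number of male applicants (Freq was read as text, so cast to int first).
males = maleFemale[1][1]['Freq'].astype(int).sum()
males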

2691

maleFemale = list(df.groupby('Gender'))
maleFemale[0][1]  # rows for female applicants

# Total number of female applicants
females = maleFemale[0][1]['Freq'].astype(int).sum()
females

1835

(males / (males + females), females / (males + females))  # shares of male and female applicants

(0.5945647370746796, 0.4054352629253204)

# Department-wise statistics
deptStat = list(df.groupby('Dept'))

stat = []
for dept, group in deptStat:
    ratios = {}
    for gender, rows in group.groupby('Gender'):
        freq = rows['Freq'].astype(int)
        # Admitted applicants as a percentage of all applicants of this gender
        admitted = freq[rows['Admit'] == 'Admitted'].sum()
        ratios[gender] = round(admitted / freq.sum() * 100, 2)
    stat.append((dept, ratios['Male'], ratios['Female']))

categ=['Department','Male Acceptance (%)', 'Female Acceptance (%)']
df2=pd.DataFrame(data=stat, columns=categ)
df2
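
The same per-department table can be produced without positional indexing into the groups. The following is a minimal sketch and not the author's method: it copies the counts from the dataset inline so that it runs without simpsons.txt, and the names `rows`, `counts` and `rates` are illustrative.

```python
import pandas as pd

# (Dept, Gender, Admitted, Rejected) counts copied from the dataset table.
rows = [
    ("A", "Male", 512, 313), ("A", "Female", 89, 19),
    ("B", "Male", 353, 207), ("B", "Female", 17, 8),
    ("C", "Male", 120, 205), ("C", "Female", 202, 391),
    ("D", "Male", 138, 279), ("D", "Female", 131, 244),
    ("E", "Male", 53, 138), ("E", "Female", 94, 299),
    ("F", "Male", 22, 351), ("F", "Female", 24, 317),
]
counts = pd.DataFrame(rows, columns=["Dept", "Gender", "Admitted", "Rejected"])

# Acceptance rate in percent for every (Dept, Gender) pair.
total = counts["Admitted"] + counts["Rejected"]
counts["Rate"] = (counts["Admitted"] / total * 100).round(2)

# One row per department, one column per gender.
rates = counts.pivot(index="Dept", columns="Gender", values="Rate")
print(rates)
```

Selecting by the labels 'Male' and 'Female' instead of by group position keeps the result correct even if the ordering of the groups changes.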

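Observations

Male applicants (2691) considerably outnumbered female applicants (1835).
Share of male applicants = 59.46%
Share of female applicants = 40.54%
In departments A, B, D and F, the female acceptance rate is higher than the male acceptance rate; only in departments C and E is the male rate higher. The overall figures quoted in the background nevertheless favour men. This reversal between the aggregate comparison and the per-department comparison is an instance of Simpson's paradox.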
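
To put numbers on the reversal, the pooled admission rate per gender can be compared with the per-department rates. This is a minimal sketch with the counts copied from the dataset table; the names `counts`, `overall_rate` and `dept_rate` are illustrative. Note that these six departments are only part of the graduate school, so the pooled figures differ from the school-wide percentages quoted in the background.

```python
# (admitted, rejected) per department, copied from the dataset table.
counts = {
    "Male":   {"A": (512, 313), "B": (353, 207), "C": (120, 205),
               "D": (138, 279), "E": (53, 138),  "F": (22, 351)},
    "Female": {"A": (89, 19),   "B": (17, 8),    "C": (202, 391),
               "D": (131, 244), "E": (94, 299),  "F": (24, 317)},
}

def overall_rate(by_dept):
    """Admitted applicants as a percentage of all applicants, pooled over departments."""
    admitted = sum(a for a, _ in by_dept.values())
    total = sum(a + r for a, r in by_dept.values())
    return round(admitted / total * 100, 2)

def dept_rate(by_dept, dept):
    """Acceptance rate in percent within a single department."""
    a, r = by_dept[dept]
    return round(a / (a + r) * 100, 2)

male_overall = overall_rate(counts["Male"])
female_overall = overall_rate(counts["Female"])

# Departments in which the female rate exceeds the male rate.
female_higher = [d for d in "ABCDEF"
                 if dept_rate(counts["Female"], d) > dept_rate(counts["Male"], d)]

print(male_overall, female_overall, female_higher)
```

Pooled over the six departments, men are admitted at roughly 44.5% and women at roughly 30.4%, even though women have the higher rate in four of the six departments.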

Conclusion

The research paper by Bickel et al. concluded that women tended to apply to competitive departments with low rates of admission even among qualified applicants (such as in the English Department), whereas men tended to apply to less-competitive departments with high rates of admission among the qualified applicants (such as in engineering and chemistry).
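
The pattern described in the conclusion can be checked against the six departments in the dataset by comparing where each gender applied with how selective each department was. This is a sketch under stated assumptions: the counts are copied from the dataset table, the names `counts`, `applicants` and `summary` are illustrative, and the anonymised departments A to F are not matched to any named department.

```python
# (admitted, rejected) per department, copied from the dataset table.
counts = {
    "Male":   {"A": (512, 313), "B": (353, 207), "C": (120, 205),
               "D": (138, 279), "E": (53, 138),  "F": (22, 351)},
    "Female": {"A": (89, 19),   "B": (17, 8),    "C": (202, 391),
               "D": (131, 244), "E": (94, 299),  "F": (24, 317)},
}

def applicants(gender, dept):
    """Number of applicants of one gender in one department."""
    admitted, rejected = counts[gender][dept]
    return admitted + rejected

# Total applicants per gender, used to turn counts into shares.
totals = {g: sum(applicants(g, d) for d in counts[g]) for g in counts}

summary = []
for dept in "ABCDEF":
    # Acceptance rate of the department with both genders pooled.
    pooled_admitted = sum(counts[g][dept][0] for g in counts)
    pooled_total = sum(applicants(g, dept) for g in counts)
    pooled_rate = round(pooled_admitted / pooled_total * 100, 2)
    # Share of each gender's applications that went to this department.
    male_share = round(applicants("Male", dept) / totals["Male"] * 100, 2)
    female_share = round(applicants("Female", dept) / totals["Female"] * 100, 2)
    summary.append((dept, pooled_rate, male_share, female_share))

for row in summary:
    print(row)
```

In this data, departments A and B admit more than 60% of their applicants and receive about half of all male applications but well under a tenth of all female applications, which is consistent with the conclusion quoted above.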

Output of df (full dataset):
	Admit	Gender	Dept	Freq
0	Admitted	Male	A	512
1	Rejected	Male	A	313
2	Admitted	Female	A	89
3	Rejected	Female	A	19
4	Admitted	Male	B	353
5	Rejected	Male	B	207
6	Admitted	Female	B	17
7	Rejected	Female	B	8
8	Admitted	Male	C	120
9	Rejected	Male	C	205
10	Admitted	Female	C	202
11	Rejected	Female	C	391
12	Admitted	Male	D	138
13	Rejected	Male	D	279
14	Admitted	Female	D	131
15	Rejected	Female	D	244
16	Admitted	Male	E	53
17	Rejected	Male	E	138
18	Admitted	Female	E	94
19	Rejected	Female	E	299
20	Admitted	Male	F	22
21	Rejected	Male	F	351
22	Admitted	Female	F	24
23	Rejected	Female	F	317

Output of maleFemale[1][1] (male applicants):
	Admit	Gender	Dept	Freq
0	Admitted	Male	A	512
1	Rejected	Male	A	313
4	Admitted	Male	B	353
5	Rejected	Male	B	207
8	Admitted	Male	C	120
9	Rejected	Male	C	205
12	Admitted	Male	D	138
13	Rejected	Male	D	279
16	Admitted	Male	E	53
17	Rejected	Male	E	138
20	Admitted	Male	F	22
21	Rejected	Male	F	351

Output of maleFemale[0][1] (female applicants):
	Admit	Gender	Dept	Freq
2	Admitted	Female	A	89
3	Rejected	Female	A	19
6	Admitted	Female	B	17
7	Rejected	Female	B	8
10	Admitted	Female	C	202
11	Rejected	Female	C	391
14	Admitted	Female	D	131
15	Rejected	Female	D	244
18	Admitted	Female	E	94
19	Rejected	Female	E	299
22	Admitted	Female	F	24
23	Rejected	Female	F	317

Output of df2 (acceptance rates by department):
	Department	Male Acceptance (%)	Female Acceptance (%)
0	A	62.06	82.41
1	B	63.04	68.00
2	C	36.92	34.06
3	D	33.09	34.93
4	E	27.75	23.92
5	F	5.90	7.04