Yayasan Cipta Mandiri (YCM) is an Independent Creative Foundation for disadvantaged children and youth in Bogor, West Java, Indonesia. Rather than being a traditional academic school, YCM is a rumah pembinaan — a house in which students are able to build their self-confidence, general knowledge, and practical skills.
YCM is for children and youth aged 9 to 23 who come from underprivileged backgrounds but are motivated to expand their knowledge and to exceed expectations. YCM began as a small facility for only a few children, but today, with the invaluable support of sponsors, the foundation has grown to accommodate around 100 active students. YCM is housed in a two-storey building with classrooms, two computer rooms, two sewing rooms, and a kitchen.
YCM was established in 2002 by Mrs. Gesine Nitzschke and Ms. Putu Ayu Novitry Ariany, who shared a common vision for the foundation. The foundation is based on the values of mutual trust, responsibility, teamwork, creativity, and active input. YCM aims to maintain its current high standards, develop its internal systems, and build the capacity of its staff. The foundation is funded entirely through private sponsorship and donations.
YCM offers two types of classes: the English Based Class and the Practical Based Class. The English Based Class teaches English and character-building skills such as critical thinking, independence, confidence, and teamwork. The Practical Based Class teaches IT, Multimedia, Tourism and Guiding, and Sewing.
Students can take more than one Practical Based Class if they want. The English Based Class works differently: students attend only one, and they do not choose it themselves. The core team decides which class fits each student. New students take an English test before they are placed in a class.
The English Based Class has eight levels: Diligent, Empathy, Integrity, Responsible, Creativity, Tolerance, Honesty, and Enthusiastic. The class names come from YCM's philosophy.
After observing many students, I found some students with good English in the lower classes and some students in the higher classes who still have room to develop. I also noticed that some classes have a wide spread of ages while others have a narrow one. That made me want to research the issue.
I used a questionnaire to collect the students' ages and English scores. It has 16 questions inspired by school tests for grades 4 to 6 of elementary school.
On the day of the research, I could only collect age and English score. The scope of the research is therefore three variables: age, English score, and class level.
There are several other variables I wanted to include, such as confidence level, comprehension, and communication level. Limited time and human resources kept them out of this research.
Here are the results of my research.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import statsmodels.api as sm

# Show every column when printing a DataFrame.
pd.set_option('display.max_columns', None)
sns.set_style('whitegrid')

def display(w=16, h=4):
    """Open a new figure of w x h inches for the next plot."""
    plt.figure(figsize=(w, h), dpi=150)
dataset = pd.read_csv('dataset_label.csv')
dataset.head()
| | Class | Name | Age | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | Score | Level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Diligent | nauffal nur rizky | 17 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 15 | 8 |
| 1 | Diligent | muhammad amarif puja adiria | 18 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 14 | 8 |
| 2 | Tolerance | adela dzakira aftani | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 3 |
| 3 | Honesty | aretha deandra yusuf | 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 |
| 4 | Honesty | najla sabria rachmat | 11 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 2 |
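As a quick sanity check on the preview above, the Score column can be compared with the number of correct answers (1 = correct, 0 = wrong) across the 16 questions. The sketch below copies only the five preview rows by hand; the names `answers`, `reported_score`, and `computed_score` are illustrative and do not appear in the original notebook.

```python
# Answer vectors (1 = correct, 0 = wrong) copied from the five preview rows.
# Keys are the row indices shown by dataset.head().
answers = {
    0: [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1],
    1: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    2: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
    3: [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0],
    4: [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0],
}

# Score column as shown in the preview.
reported_score = {0: 15, 1: 14, 2: 1, 3: 2, 4: 3}

# The score is the count of correct answers, i.e. the sum of each row.
computed_score = {row: sum(items) for row, items in answers.items()}

for row in answers:
    status = "ok" if computed_score[row] == reported_score[row] else "mismatch"
    print(f"row {row}: computed {computed_score[row]}, reported {reported_score[row]} ({status})")
```

On the full dataset, the same check could be done by summing the question columns row by row and comparing the result with `Score`.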
df = dataset[['Class', 'Age', 'Score', 'Level']]
describe = df.describe().T
display(h=1)
sns.heatmap(describe, annot=True, cmap='viridis')
plt.yticks(rotation=0)
plt.title('Description of The Data')
plt.show()
There are 58 respondents. Age ranges from 9 to 23 years, with a mean of 13 years, a standard deviation of 3.3, and a right skew. Score ranges from 0 to 15 correct answers, with a mean of 5.4, a standard deviation of 3.9, and a right skew. Level ranges from 1 to 8, with a mean of 3.8 (about level 4), a standard deviation of 2.1, and again a right skew.
From this, I suspect that the skewness in age and English score carries over into the skewness of class level.
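To make "mean", "spread", and "right skew" concrete, here is a minimal sketch that computes them by hand. The ages below are hypothetical values chosen for illustration, not the survey data, and the moment-based skewness formula is one common definition; pandas' `skew()` applies a small-sample correction and may give a slightly different number.

```python
from statistics import mean, median, pstdev, stdev

# Hypothetical ages for illustration only; these are NOT the survey data.
ages = [9, 10, 10, 11, 11, 12, 12, 13, 14, 17, 18, 23]

avg = mean(ages)
spread = stdev(ages)  # sample standard deviation, like the "std" row of df.describe()

# Moment-based skewness: the average cubed distance from the mean,
# divided by the cubed population standard deviation.
skewness = sum((a - avg) ** 3 for a in ages) / (len(ages) * pstdev(ages) ** 3)

print(f"mean={avg:.2f}, median={median(ages)}, std={spread:.2f}, skew={skewness:.2f}")
```

A positive skewness means the distribution has a longer tail on the right: a few much older students pull the mean above the median.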
sns.displot(data=df, x='Age', kind='kde', height=3, aspect=(5/1))
plt.title('Skewness of Age')
plt.show()
sns.displot(data=df, x='Score', kind='kde', height=3, aspect=(5/1))
plt.title('Skewness of English Score')
plt.show()
sns.displot(data=df, x='Level', kind='kde', height=3, aspect=(5/1))
plt.title('Skewness of Level of Class')
plt.show()
display(h=6)
sns.barplot(data=df, x=df.index, y='Age')
plt.title('Age of Each Student')
plt.xlabel('Students')
plt.ylabel('Age')
plt.show()
Some students are much older than the average. Most students are within about 3 years of the average age, either younger or older.
display(h=6)
sns.barplot(data=df, x=df.index, y='Score')
plt.title('Score of Each Respondent')
plt.xlabel('Respondents')
plt.ylabel('Score')
plt.show()
English scores vary widely across students. Students of similar age gave quite different answers to the questions.
order_list = ['Diligent', 'Empathy', 'Integrity', 'Responsible', 'Creativity', 'Tolerance', 'Honesty', 'Enthusiastic']
display()
sns.boxplot(data=df, x='Class', y='Age', order=order_list)
plt.title('Boxplot of Age by Class')
plt.show()
A boxplot summarizes how the values in a dataset are spread. This boxplot shows the spread of age within each class.
The lower and upper whiskers mark the minimum and maximum age in each class, excluding outliers. The colored box spans the 25th to the 75th percentile, so it covers the middle 50% of the students. The line inside the box is the median: half of the students are younger and half are older. A black diamond marks an outlier, a student whose age is far from the rest of the class.
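The parts of a boxplot described above can be computed directly. The sketch below uses hypothetical ages for a single class (not the survey data) and the common 1.5 × IQR rule for outliers, which is also seaborn's default whisker setting. The variable names are illustrative.

```python
from statistics import quantiles

# Hypothetical ages for one class; these are NOT the survey data.
ages = [9, 10, 10, 11, 11, 12, 12, 13, 14, 23]

# Quartiles: the box spans q1..q3 and the line inside the box is q2 (the median).
q1, q2, q3 = quantiles(ages, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn as outliers (the diamonds).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [a for a in ages if a < lower_fence or a > upper_fence]

# The whiskers end at the most extreme values that are not outliers.
inside = [a for a in ages if lower_fence <= a <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"box: {q1} to {q3}, median: {q2}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

In this made-up class, the 23-year-old lies above the upper fence, so the upper whisker stops at 14 and the 23 is drawn as a separate point.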
display()
sns.boxplot(data=df, x='Class', y='Score', order=order_list)
plt.title('Boxplot of English Score by Class')
plt.show()
The lower and upper whiskers mark the minimum and maximum English score in each class, excluding outliers. The colored box spans the 25th to the 75th percentile of the scores. The line inside the box is the median score. A black diamond marks an outlier, a score far from the rest of the class.
corr = df[['Age', 'Score', 'Level']].corr().round(3)
display()
sns.heatmap(corr, annot=True)
plt.title('Correlation Between Age, English Score, and Level Class')
plt.yticks(rotation=0)
plt.show()
Correlation measures how strongly two variables move together. It ranges from -1 to 1. The closer it is to 1 or -1, the stronger the linear relationship between the two variables. The closer it is to 0, the weaker the linear relationship.
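For readers who want to see what the correlation coefficient is made of, here is a minimal hand-rolled Pearson correlation. It uses only the Age and Score values of the five preview rows from `dataset.head()`, so the result is illustrative and is not the value in the heatmap, which is based on all 58 respondents. The helper name `pearson` is an assumption of this sketch.

```python
from math import sqrt

# Age and Score of the five preview rows only (not the full dataset).
ages = [17, 18, 10, 9, 11]
scores = [15, 14, 1, 2, 3]

def pearson(xs, ys):
    """Pearson correlation: co-variation divided by the product of the spreads."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_variation / (spread_x * spread_y)

r = pearson(ages, scores)
print(f"Pearson r for the five preview rows: {r:.3f}")
```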
display(h=6)
sns.regplot(data=df, x='Age', y='Score')
plt.title('Regression Line Variable Age VS Score')
plt.show()
A regression line shows the direction and strength of the relationship between two variables. The figure above shows an upward trend: older students tend to have higher English scores.
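The line drawn by `sns.regplot` is a least-squares fit. As a rough sketch of how such a line is obtained, the code below fits a slope and an intercept by hand on the five preview rows only, so the numbers are illustrative and will differ from the line in the figure. The helper name `fit_line` is an assumption of this sketch.

```python
# Age and Score of the five preview rows only (not the full dataset).
ages = [17, 18, 10, 9, 11]
scores = [15, 14, 1, 2, 3]

def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x for one predictor."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x  # the line passes through the two means
    return slope, intercept

slope, intercept = fit_line(ages, scores)
print(f"score = {intercept:.2f} + {slope:.2f} * age")
```

A positive slope is what "the older the student, the higher the score" looks like in numbers.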
x = df[['Age', 'Score']].values
y = df['Level']
x = sm.add_constant(x)
variable = ['CONST','Age', 'Score']
model = sm.OLS(exog=x, endog=y).fit()
print(model.summary(xname=variable))
OLS Regression Results

| | | | |
|---|---|---|---|
| Dep. Variable | Level | R-squared | 0.781 |
| Model | OLS | Adj. R-squared | 0.773 |
| Method | Least Squares | F-statistic | 98.20 |
| Date | Wed, 21 Jun 2023 | Prob (F-statistic) | 7.08e-19 |
| Time | 22:19:28 | Log-Likelihood | -81.012 |
| No. Observations | 58 | AIC | 168.0 |
| Df Residuals | 55 | BIC | 174.2 |
| Df Model | 2 | | |
| Covariance Type | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| CONST | -2.2198 | 0.578 | -3.839 | 0.000 | -3.379 | -1.061 |
| Age | 0.4112 | 0.057 | 7.158 | 0.000 | 0.296 | 0.526 |
| Score | 0.1684 | 0.049 | 3.445 | 0.001 | 0.070 | 0.266 |

| | | | |
|---|---|---|---|
| Omnibus | 0.864 | Durbin-Watson | 2.472 |
| Prob(Omnibus) | 0.649 | Jarque-Bera (JB) | 0.957 |
| Skew | -0.242 | Prob(JB) | 0.620 |
| Kurtosis | 2.597 | Cond. No. | 63.1 |

Notes: [1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
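unexplained = round((1 - model.rsquared_adj) * 100, 1)
print(f'About {unexplained}% of the variation in class level is not explained by age and English score; it may relate to variables I did not research, such as confidence level and comprehension.')
About 22.7% of the variation in class level is not explained by age and English score; it may relate to variables I did not research, such as confidence level and comprehension.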
There is a lot going on here. The dependent variable is class level. The method is Ordinary Least Squares. The model was fitted on Wed, 21 Jun 2023 at 22:19:28. There are 58 observations and 55 residual degrees of freedom. There are two independent variables, age and English score. Because the model has more than one independent variable, I report the adjusted R-squared, which is 0.773.
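The adjusted R-squared can be recovered from the plain R-squared, the number of observations, and the number of independent variables. The sketch below applies the standard formula to the values printed in the summary; because 0.781 is itself rounded, the result only matches to about three decimals. The function name `adjusted_r_squared` is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Values taken from the OLS summary: R-squared 0.781, 58 observations, 2 predictors.
adj = adjusted_r_squared(0.781, 58, 2)
print(round(adj, 3))  # 0.773, matching the Adj. R-squared in the summary
```

The denominator `n_obs - n_predictors - 1` is 55 here, which is the "Df Residuals" value in the summary.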
Age is a significant predictor of class level: its P>|t| value rounds to 0.000, which is lower than 0.05. English score is also a significant predictor of class level: its P>|t| value is 0.001, which is lower than 0.05.
The coefficient of age is 0.4112, which means that each additional year of age is associated with an increase of about 0.41 in class level when English score is held constant. The coefficient of English score is 0.1684, which means that each additional correct answer is associated with an increase of about 0.17 in class level when age is held constant.
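Put together, the coefficients give a simple prediction equation: level ≈ -2.2198 + 0.4112 × age + 0.1684 × score. The sketch below plugs two of the preview rows into that equation. The function name `predict_level` is illustrative, and the raw prediction is a continuous number, whereas actual class levels are whole numbers from 1 to 8.

```python
# Coefficients copied from the OLS summary.
CONST = -2.2198
COEF_AGE = 0.4112
COEF_SCORE = 0.1684

def predict_level(age, score):
    """Raw (unrounded) class level predicted by the fitted linear model."""
    return CONST + COEF_AGE * age + COEF_SCORE * score

# Preview row 0: age 17, score 15, actual level 8.
print(f"age 17, score 15 -> predicted level {predict_level(17, 15):.2f}")

# Preview row 2: age 10, score 1, actual level 3.
print(f"age 10, score 1  -> predicted level {predict_level(10, 1):.2f}")
```

Comparing such predictions with the level a student actually attends is one way to spot students who may be placed higher or lower than their age and score suggest.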
Age and English score together explain about 77.3% of the variation in class level; the remaining 22.7% is not explained by this model and may relate to variables I did not research. Both age and English score have a significant effect on class level.
This research is not perfect, and there is still room to improve it, for example by adding more respondents or more variables.
This research aims to give an overall view of the students. There are some parts I would like to explore further, such as the variation in English scores: some classes have students at a similar level, while others vary widely.
I did not do this alone. Many hands made it happen.
Thanks to Pak Kohar, who let me do this awesome project with the students. Thanks to Kak Ono, who helped me explain my project to the core team and the other tutors. Thanks to Ani, who always inspires me. Let's make a bigger project next time. Thanks to Naufal, Sinta, and Erik, who motivated me to finish this project. For Sinta, your EXCEL is EXCEL(lent). Thanks to all the students who gave their time and effort to fill in the questionnaire.