Sleep Matters 😴
Survey Design and Statistical Analysis on Sleep Time of Students
Collaborators:
-
-
This survey was created using Google Forms and can be found here.
The final report in Rmd can be found here. For the md version, please click here, and for the HTML version, click here.
The Question and Hypothesis
Our question is: “How does sleep duration affect students’ MDS performance (grade)?”
- We believe that sleep has a significant effect on students’ grades. To test this hypothesis, we conducted a survey with several questions designed to assess whether our belief holds.
Survey Analysis
We collected the data with a survey built in Google Forms. The survey questions and survey data are accessible at the following links.
- This survey was created using Google Forms and can be found here.
We applied ordinal regression analysis to the survey outcomes; the analysis can be found in the final report.
Exploratory Data Analysis
Survey Data at a Glance
Bar charts are the simplest way to visualize summary counts for categorical (or ordinal) variables. We used bar plots to represent the counts for each variable visually, so that we did not have to go back and forth between summary tables to understand the variable distributions.
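The counting step behind these bar charts can be sketched as follows. This is a minimal Python illustration (the project itself used R Markdown), assuming each variable is a list of answer labels with a known level order; the helper names, category labels and answers are hypothetical, and in practice a plotting library would draw the bars instead of text.

```python
from collections import Counter


def summary_counts(responses, levels):
    """Count responses per category, in the given ordinal level order.

    Levels with no responses are kept with a count of 0 so that sparse
    categories stay visible; answers not listed in `levels` are ignored.
    """
    counts = Counter(responses)
    return [(level, counts.get(level, 0)) for level in levels]


def text_bar_chart(counts, symbol="#"):
    """Render (label, count) pairs as a horizontal text bar chart."""
    if not counts:
        return ""
    width = max(len(label) for label, _ in counts)
    lines = [f"{label.ljust(width)} | {symbol * n} {n}" for label, n in counts]
    return "\n".join(lines)


# Hypothetical answers to a hypothetical sleep-duration question.
levels = ["< 5 h", "5-6 h", "6-7 h", "8-9 h"]
answers = ["6-7 h", "5-6 h", "6-7 h", "8-9 h", "6-7 h", "< 5 h", "5-6 h"]
print(text_bar_chart(summary_counts(answers, levels)))
```

Keeping zero-count levels in the output is a deliberate choice: it makes categories with very few answers easy to spot before modelling.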
Correlation Matrix - Test of Association between Variables
- For categorical variables, the concept of correlation can be understood in terms of a significance test and an effect size (strength of association).
- A measure of association indicates the strength of the relationship, whether weak or strong, but it does not indicate causality.
- Exercise has a weak negative correlation with stress and with commute_time, with a correlation coefficient of rho = -0.3.
- Grades have a weak negative correlation with stress, with rho = -0.3.
- Another interesting observation is that stress, going out and exercise are negatively correlated, with rho = -0.5.
- Meal has a very weak association with most variables.
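The bullets above report correlation coefficients (rho) between ordinal survey variables. Assuming rho here is Spearman's rank correlation, a common choice for ordinal data, a minimal pure-Python sketch is shown below. The integer coding of the answers and the example values are hypothetical, and the sketch covers only the effect size, not the accompanying significance test.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Move j to the last element of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical ordinal codes (1 = lowest level) for two survey questions.
exercise = [1, 2, 2, 3, 4, 4, 5]
stress = [4, 5, 3, 3, 2, 3, 1]
print(round(spearman_rho(exercise, stress), 2))
```

Averaging the ranks of tied answers matters for survey data, where many respondents pick the same option.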
Model Selection
Since our response variable grade is an ordinal variable, we chose cumulative link models (CLMs) to fit our data. The clm function in the ordinal package fits CLMs such as the proportional odds model, which is suitable for an ordinal response. We used this function and specified logit as the link function.
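To illustrate what a cumulative logit (proportional odds) model computes, the sketch below turns a set of thresholds and a linear predictor into category probabilities. It is written in Python rather than R and assumes the parameterisation logit P(Y ≤ j) = θ_j − η used in the ordinal package's documentation, with η = xᵀβ; the helper names and threshold values are hypothetical, not estimates from our fit.

```python
from math import exp


def inv_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def category_probs(thresholds, eta):
    """Category probabilities under logit P(Y <= j) = theta_j - eta.

    `thresholds` are theta_1 < ... < theta_{J-1}, shared by all respondents;
    `eta` is the linear predictor. A larger eta moves probability mass
    toward higher (better) categories.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cumulative = [inv_logit(t - eta) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs


# Hypothetical thresholds for a three-level grade variable.
print([round(p, 3) for p in category_probs([-1.0, 1.0], 0.0)])
print([round(p, 3) for p in category_probs([-1.0, 1.0], 1.0)])
```

Because every category shares the same η and only the thresholds differ, a single coefficient per predictor shifts all cumulative probabilities at once, which is what makes the model suitable for an ordinal response.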
Results and Discussion
Results
In summary, sleeping for 6 - 7 hours instead of less than 5 hours is associated with higher log odds of getting a better grade, whereas sleeping for 8 - 9 hours instead of 6 - 7 hours is associated with lower log odds of getting a better grade. Therefore, 6 - 7 hours seems to be the best sleep duration for a better grade. Too little sleep may reduce study efficiency, and too much sleep may take away study time.
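To make the log-odds statements above concrete: under a proportional odds model, a coefficient β for one sleep category relative to the baseline multiplies the odds of a grade above any given grade category by exp(β). The sketch below checks this numerically, assuming the logit P(Y ≤ j) = θ_j − η parameterisation; the helper names, thresholds and coefficient value are hypothetical, not estimates from our model.

```python
from math import exp


def inv_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    ez = exp(z)
    return ez / (1.0 + ez)


def prob_above(theta_j, eta):
    """P(Y > j) when logit P(Y <= j) = theta_j - eta."""
    return 1.0 - inv_logit(theta_j - eta)


def odds(p):
    return p / (1.0 - p)


def odds_ratio_above(theta_j, beta):
    """Odds of Y > j with coefficient beta in eta, divided by baseline odds (eta = 0)."""
    return odds(prob_above(theta_j, beta)) / odds(prob_above(theta_j, 0.0))


# Hypothetical coefficient, e.g. "6 - 7 hours" vs. "less than 5 hours".
beta_sleep = 0.8
for theta_j in (-1.0, 0.5, 2.0):  # hypothetical grade thresholds
    print(theta_j, round(odds_ratio_above(theta_j, beta_sleep), 6), round(exp(beta_sleep), 6))
```

Since odds(P(Y > j)) = exp(η − θ_j), the threshold cancels in the ratio, so a positive β raises the odds of a better grade by the same factor at every grade boundary and a negative β lowers them.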
Discussion
While designing and analyzing the survey, one of the pressing issues was that some answer categories in a few questions had very few responses, which made it harder to use these variables effectively. For example, we could not make use of the smoking variable, since only one survey participant was a smoker. Sample size is therefore very important when there are 10 questions, each with multiple options. Giving survey participants only a subset of these questions could have made the survey design more robust. At the beginning, we thought it would be easier to ask many questions and eliminate some of them later, but we came to see that asking many questions is not necessarily the best approach for a small sample. Also, for some variables, it would have been better to collect raw values from the participants instead of fixed categories, in order to obtain continuous variables. Having only categorical variables made our analysis considerably harder in terms of visualization, fitting a more interpretable regression model, and even model diagnostics. However, the CLM results are not entirely inconclusive: they did show an effect of sleep on grades, in line with our initial belief that sound sleep supports better performance.
Future Study Design Options and Improvements
Our biggest challenge has been using only categorical (and/or ordinal) variables. Generalized regression models are harder to interpret; in our case, we used an ordinal regression model (cumulative link model) because the response variable (grade) is ordinal. The analysis could have been simpler if we had collected grades from students on a continuous scale, which would have allowed us to use linear models; as we know, linear models are simple yet highly interpretable. However, this was not an option in our case: since our questions specifically addressed Master of Data Science students, the pool of participants was small, and asking for people’s grades could have been problematic because of identifiability and personal-information concerns. One option for a future study would be to target college students in general, in order to collect more samples and generalize the survey design. This would allow future surveyors to use a finer grade scale and achieve higher statistical power in the analysis.
Running the Analysis
Clone the repo and follow the instructions: SleepMatters Project Repository