Olympic Medalist Analysis


In my graduate-level statistics course at Illinois Tech, we were assigned a project to perform regression analysis on a data set of our choice. Two classmates and I found some Olympic medalist data on Kaggle and decided to use it, along with a few other data sets, for our regression. The first goal of our project was to determine which variables best explain the times in each event for each gender. We could also easily see that times and distances/heights have improved over the years, but that the improvement has been slowing down. That raised the obvious question: is there a limit for each event, and if so, what is that limit?

The answer to the first part is expected to be yes, since a negative score is impossible. Therefore, 0 is at least a limit for every event (except the distance events). It can be argued that a time only slightly above 0 is also impossible for running events, so the real question is: what is the largest lower limit? Our plan was to fit a number of different models, compare them and their predictive power, and see whether each one approached some limit and whether that limit was realistic. We used R for our project, and our code can be found on my GitHub page. Our final project report can be found at the link below.
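
To make the idea of a model "reaching a limit" concrete, here is a minimal sketch of one model that has a built-in limit. It is written in Python for illustration only and is not the R code from our project. The exponential form, the function name fit_asymptote, and the synthetic data are all assumptions made for this example. The model is time = limit + a * exp(-b * year): as the years go on, the exponential term shrinks and the predicted time levels off at the limit.

```python
import math


def fit_asymptote(years, times, step=0.01):
    """Fit times ~ limit + a * exp(-b * years) by a grid search over `limit`.

    For a fixed candidate limit below every observed time,
    log(times - limit) is linear in years, so a and b come from
    ordinary least squares on the log scale. The candidate with the
    smallest squared error on the original scale wins.
    """
    n = len(years)
    mean_t = sum(years) / n
    sxx = sum((t - mean_t) ** 2 for t in years)
    best = None
    # Candidate limits run from 0 up to just below the fastest observed time.
    for i in range(int(min(times) / step)):
        limit = i * step
        logs = [math.log(y - limit) for y in times]
        mean_z = sum(logs) / n
        slope = sum((t - mean_t) * (z - mean_z)
                    for t, z in zip(years, logs)) / sxx
        intercept = mean_z - slope * mean_t
        sse = sum((y - (limit + math.exp(intercept + slope * t))) ** 2
                  for t, y in zip(years, times))
        if best is None or sse < best[0]:
            best = (sse, limit, math.exp(intercept), -slope)
    _, limit, a, b = best
    return limit, a, b


# Synthetic, noise-free data generated from a known limit of 9.4.
# These are NOT real Olympic results; `years` counts years since the first Games.
years = list(range(0, 124, 4))
times = [9.4 + 1.6 * math.exp(-0.02 * t) for t in years]

limit, a, b = fit_asymptote(years, times)
print(f"estimated limit: {limit:.2f}")
```

With real data the fit would be noisy, and the estimated limit can be sensitive to the model form, which is why checking whether the limit is realistic matters as much as the fit itself.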