Find a suitable dataset: it must have more than 8 features (i.e., columns) and at least 100 data points for each feature.
Upload your dataset to GitHub, then load it into Python (Colab or Jupyter).
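One possible way to load the file is sketched below, assuming the dataset is a CSV and that you copy its "Raw" link from GitHub. The URL, the helper name load_dataset, and the sample columns are placeholders invented for illustration; an in-memory CSV stands in for the download so the cell runs without a network connection.

```python
import io

import pandas as pd

# Hypothetical placeholder: replace with the "Raw" link of your own CSV on GitHub.
RAW_URL = "https://raw.githubusercontent.com/<user>/<repo>/main/<file>.csv"


def load_dataset(source):
    """Read a CSV from a URL, a file path, or a file-like object into a DataFrame."""
    return pd.read_csv(source)


# In your notebook you would call: df = load_dataset(RAW_URL)
# Here a small in-memory CSV (invented values) stands in for the real file.
sample_csv = io.StringIO(
    "age,income,city\n"
    "34,52000,Austin\n"
    "29,48000,Denver\n"
    "41,61000,Austin\n"
)
df = load_dataset(sample_csv)

# Check the size requirement: more than 8 columns and at least 100 rows.
n_rows, n_cols = df.shape
meets_requirement = n_cols > 8 and n_rows >= 100

print(df.head())
print(f"{n_rows} rows x {n_cols} columns; meets requirement: {meets_requirement}")
```

The toy sample deliberately fails the size check; your real dataset should pass it.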
Explain your data using text boxes. Explain everything that you are doing, and do enough research to become an "expert" so that you know what is meaningful in this data. Present two questions that you are going to explore in your data. Propose a null hypothesis and an alternative hypothesis. Your research should include some potential insight into your questions and their solutions. Explain each feature of your data in depth, drawing on your expert knowledge. State whether each feature is numeric or categorical, and which type within that category (for example, continuous or discrete for numeric features, nominal or ordinal for categorical ones). Include the range of each feature in your data and, if necessary, the expected range.
Check your data for missing values, errors, and similar problems. Use label encoding or get_dummies if necessary, and change data types where applicable. All of your code should be fully commented, and you should break it up with text boxes explaining the ideas.
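The checks in this step could look roughly like the sketch below. The DataFrame, its column names, and its values are invented for illustration, and label encoding is done here with pandas category codes, which is one option among several.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
df = pd.DataFrame({
    "age": ["34", "29", None, "41"],            # numbers stored as text
    "income": [52000.0, np.nan, 61000.0, 58000.0],
    "city": ["Austin", "Denver", "Austin", None],
    "education": ["BSc", "MSc", "BSc", "PhD"],
})

# Count the missing values in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Change the data type: convert text to numbers; unparseable entries become NaN.
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Label encoding: map each category to an integer code (alphabetical order here).
df["education_code"] = df["education"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category of "city".
encoded = pd.get_dummies(df, columns=["city"], dtype=int)
print(encoded.dtypes)
```

Category codes follow alphabetical order, so they only carry meaning for ordinal features if you define the order yourself; for nominal features, one-hot encoding avoids implying an order at all.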
Impute the missing data.
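A minimal imputation sketch follows, assuming that the median is acceptable for numeric columns and the most frequent value for categorical ones. Both choices are assumptions, not requirements of the assignment, and the data is invented; justify whichever strategy you pick in a text box.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one gap in each column.
df = pd.DataFrame({
    "income": [52000.0, np.nan, 61000.0, 58000.0, 49000.0],
    "city": ["Austin", "Denver", "Austin", None, "Austin"],
})

# Work on a copy so the raw data stays available for comparison.
imputed = df.copy()

# Numeric columns: fill gaps with the column median (robust to outliers).
numeric_cols = imputed.select_dtypes(include="number").columns
imputed[numeric_cols] = imputed[numeric_cols].fillna(imputed[numeric_cols].median())

# Categorical columns: fill gaps with the most frequent value (the mode).
categorical_cols = imputed.select_dtypes(exclude="number").columns
for col in categorical_cols:
    imputed[col] = imputed[col].fillna(imputed[col].mode().iloc[0])

print(imputed)
```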
Compute summary statistics and give a basic analysis of what they show.
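The summary statistics might be gathered as in this short sketch. The columns and values are invented; on your own data you would run the same calls on the cleaned DataFrame.

```python
import pandas as pd

# Hypothetical toy data; replace with your cleaned dataset.
df = pd.DataFrame({
    "age": [34, 29, 41, 38, 25],
    "income": [52000, 48000, 61000, 58000, 45000],
    "city": ["Austin", "Denver", "Austin", "Boston", "Austin"],
})

# Count, mean, standard deviation, quartiles, min, and max of numeric columns.
numeric_summary = df.describe()
print(numeric_summary)

# Frequency of each category in a categorical column.
city_counts = df["city"].value_counts()
print(city_counts)

# Observed range (min and max) of every numeric feature.
ranges = df.select_dtypes(include="number").agg(["min", "max"])
print(ranges)
```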
Create charts, compute correlations, and carry out similar exploratory analysis.
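As a rough illustration, the sketch below computes a Pearson correlation matrix and draws a histogram and a scatter plot with matplotlib. The data and column names are invented, and the choice of charts is only an example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data; replace with your cleaned dataset.
df = pd.DataFrame({
    "age": [25, 29, 34, 38, 41, 45],
    "income": [45000, 48000, 52000, 58000, 61000, 66000],
    "hours_online": [30, 26, 27, 20, 18, 15],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr(numeric_only=True)
print(corr.round(2))

# Two side-by-side charts: a distribution and a relationship.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

axes[0].hist(df["income"], bins=5)
axes[0].set_title("Distribution of income")
axes[0].set_xlabel("income")
axes[0].set_ylabel("count")

axes[1].scatter(df["age"], df["income"])
axes[1].set_title("Income vs. age")
axes[1].set_xlabel("age")
axes[1].set_ylabel("income")

fig.tight_layout()
# In Colab or Jupyter the figure is rendered when the cell finishes.
```

A high correlation shows only that two features move together; explain in a text box why it does or does not suggest a causal link.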
Answer your two questions using Python, and explain each solution with text boxes as well as commented code.
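How you answer a question depends on the question itself. As one hedged example, suppose a question asks whether two groups differ in their mean of a numeric feature; Welch's t-test from SciPy is one way to test that. The groups, the scores, and the 0.05 significance level below are assumptions made for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data: a numeric score for two groups.
df = pd.DataFrame({
    "group": ["A"] * 6 + ["B"] * 6,
    "score": [70, 72, 68, 71, 69, 73, 80, 83, 79, 82, 81, 84],
})

# Null hypothesis: groups A and B have the same mean score.
# Alternative hypothesis: the mean scores differ.
scores_a = df.loc[df["group"] == "A", "score"]
scores_b = df.loc[df["group"] == "B", "score"]

# Welch's t-test does not assume equal variances in the two groups.
result = stats.ttest_ind(scores_a, scores_b, equal_var=False)
t_stat, p_value = result.statistic, result.pvalue

# Assumed significance level; state and justify your own choice.
alpha = 0.05
reject_null = p_value < alpha

print(f"t = {t_stat:.2f}, p = {p_value:.4g}, reject null: {reject_null}")
```

A different kind of question calls for a different test, for example a chi-square test for two categorical features.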
Fit a regression model to test whether one of your variables can be explained by the other features.
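A minimal sketch of a multiple linear regression follows, fitted with NumPy's least-squares solver so that every step is visible; libraries such as scikit-learn or statsmodels perform the same kind of fit. The column names are invented, and the data is constructed without noise from grade = 3 + 2 * study_hours - absences so that the recovered coefficients can be checked.

```python
import numpy as np
import pandas as pd

# Hypothetical, noise-free toy data built from: grade = 3 + 2*study_hours - absences.
df = pd.DataFrame({
    "study_hours": [1, 2, 3, 4, 5, 6],
    "absences": [2, 1, 4, 3, 6, 5],
    "grade": [3, 6, 5, 8, 7, 10],
})

# Features (explanatory variables) and target (the variable to explain).
X = df[["study_hours", "absences"]].to_numpy(dtype=float)
y = df["grade"].to_numpy(dtype=float)

# Add a column of ones so the model can learn an intercept.
design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: find coefficients minimising the squared error.
coef, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
intercept, slopes = coef[0], coef[1:]

print(f"intercept = {intercept:.2f}")
for name, slope in zip(["study_hours", "absences"], slopes):
    print(f"{name}: {slope:.2f}")
```

Real data contains noise, so the fitted coefficients will only approximate any underlying relationship.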
Evaluate your model.
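One common way to evaluate a regression model is to hold out part of the data and measure the error on it. The sketch below assumes an 80/20 train/test split and reports RMSE and R squared; the data is synthetic (generated from y = 3 + 2x plus random noise), and the split ratio and metrics are illustrative choices.

```python
import numpy as np

# Synthetic data for illustration: y = 3 + 2x plus Gaussian noise.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0, 10, size=n)
y = 3.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

# Shuffle the row indices, then hold out 20% of the rows for testing.
indices = rng.permutation(n)
train_idx, test_idx = indices[:160], indices[160:]

# Fit a straight line on the training rows only.
design_train = np.column_stack([np.ones(len(train_idx)), x[train_idx]])
coef, _, _, _ = np.linalg.lstsq(design_train, y[train_idx], rcond=None)

# Predict the held-out rows the model has never seen.
design_test = np.column_stack([np.ones(len(test_idx)), x[test_idx]])
predictions = design_test @ coef
residuals = y[test_idx] - predictions

# RMSE: typical size of a prediction error, in the units of y.
rmse = np.sqrt(np.mean(residuals ** 2))

# R squared: share of the variance in y that the model explains.
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y[test_idx] - y[test_idx].mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}, R^2 = {r_squared:.3f}")
```

Scoring the model on rows it was not fitted on gives a more honest picture than scoring it on the training data.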