In the final project, we hope everyone will think of themselves as a real-world data scientist. Your goal is to come up with some interesting questions, find the right datasets, and implement a data-processing pipeline to answer those questions. To achieve this, please follow the steps below.
The following table summarizes the TODO list of the final project.
ID | Task | Deadline | Deliverable |
---|---|---|---|
1 | Initial Plan | Sunday 03/05 at 11:59 PM | Submit the filled form to the CourSys activity Initial Plan |
2 | Proposal Presentation | Monday 03/13 at 08:00 AM; Monday 03/13 at 09:30 AM | Submit your slides to the CourSys activity Presentation; give a talk in TASC 1 9204 West |
3 | Poster Session | Monday 04/10 at 08:00 AM; Monday 04/10 at 10:00 AM | Submit your poster to the CourSys activity Poster Session; present your poster at the SFU Big Data Hub |
4 | Code & Report | Sunday 04/16 at 11:59 PM | Submit to the CourSys activities Code and Report |
The first thing you need to do is to make a plan. Find the right person(s) to work with and come up with a good project topic. Here are some requirements (or hints) about the topic:
Submission
The Initial Plan that you made above would be meaningless if you couldn't persuade your manager to let you carry it out. At work, it is therefore very important to know how to give a persuasive speech. To help you practice this skill, each group needs to give a 5-minute talk on the proposed project. Imagine your manager is sitting in the audience; your goal is to convince him/her that:
You can easily find a lot of good tips on how to give a persuasive speech. Take a look at them and try to apply them to your speech.
Requirements
EVERYONE should be prepared to give the speech. In class, we will randomly pick one student from each team and ask him/her to give the talk.
The talk should be 5 minutes long, so pay attention to the time. If you end up spending x minutes, |x-5| points (rounded) will be deducted from your team grade.
You are going to give the presentation at 09:30 AM on Monday 03/13. However, please upload your slides to the CourSys activity before 08:00 AM.
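The time penalty described above can be sketched in a few lines of Python. This is only an illustration: the function name `talk_penalty` is hypothetical, and since the requirements say only "rounding", the sketch assumes that |x-5| is rounded to the nearest whole point with halves rounded up. The instructors may apply a different rule.

```python
import math

TARGET_MINUTES = 5  # talk length stated in the requirements


def talk_penalty(minutes_spent: float) -> int:
    """Points deducted for a talk that ran `minutes_spent` minutes.

    Assumption: "rounding" means rounding |x - 5| to the nearest whole
    point, with halves rounded up. The handout does not specify this.
    """
    deviation = abs(minutes_spent - TARGET_MINUTES)
    return math.floor(deviation + 0.5)


# Try a few talk lengths: on time, too long, and too short.
for minutes in (5, 7, 3.4):
    print(minutes, "min ->", talk_penalty(minutes), "point(s) deducted")
```

Note that the penalty is symmetric: finishing early costs as much as running over, so rehearse with a timer.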
It's show time! Make a poster to present your data product. Here are a few things that you can put into the poster:
What questions are you trying to answer?
What methodology do you use to get the answers?
What datasets/tools do you use?
What's your data processing pipeline like?
What's your data product?
What have you learnt through the project?
There should be tables available if you would like to do a demo.
Submission
Source Control
As in CMPT 732, you must use a Git repository for your project. The department's GitLab server is a good way to get one (instructions at that link). Group members must commit their own contributions to the repo. Please give the instructors and TAs (jnwang, aguha, sjishan, zcong) developer access to your repository. You are encouraged to publicize and open-source your work on GitHub or similar.
Code Submission
The final implementation is due Sunday 04/16 at 11:59 PM. You will submit a tag from your repository (git tag final; git push --tags) to the CourSys activity Code. In your repository, please include a file README.txt (or README.md if you prefer) explaining how we can test your project, along with any other notes about things we should look for. If you created some kind of web frontend, please include a URL in the README as well.
Report Submission
You will submit a report of at most 5 pages giving an overview of your project.
This is also due Sunday 04/16 at 11:59 PM, submitted to the CourSys activity Report as a PDF.