Purpose:
Help People Gain Better Self-Understanding and Empowered Self-Improvement through Data
I track a lot of aspects of my life.
Time & Productivity
- Computing Time (RescueTime)
- Project Time (Toggl)
- Mobile Screen Time (Moment app)
- Tasks (Todoist)
- Habits and Goals (via Streaks, Habitica, Productive)
- Creative Written Words (Mac Word Counter app)
Fitness
- Heart Rate (Apple Watch, Chest Strap)
- Stress / Heart Rate Variability (HRV4Training)
- Running (Strava)
- VO2 Max
- Strength Workouts (FitBod)
- Stretching and Mobility (self-logging)
- Notes (Evernote)
- Books Read (GoodReads)
- Articles Read (Instapaper, Pocket)
- TV and Movie Watching (Trakt.tv)
- Music (Last.fm)
- YouTube Watching
- Podcast Listening: PodcastTracker.com
Other Tracking Stuff
- Location (Moves app, Reporter App)
Personal Example of Improved Health.
Circa 2015.
Personal Example of Improved Health.
Circa 2017.
My Steps Towards Data-Driven Self-Improvement
- Set a Goal
- Track It
- Research the area.
- Make Lifestyle Changes
- And track those too, i.e. track your commitment and follow-through
- Check-in, Evaluate and Engage with your data
- Repeat
Our Question:
Can self-tracking and personal data help us better understand ourselves?
And can this data help us become more productive, healthier, and happier?
Outline:
- Part 1: QS / Self-Tracking: How to measure a life?
- Part 2: Data Collection / Extraction / Processing of Personal Data with Python and QS Ledger
- Part 3: Data Exploration and Data Viz with Python and Tableau
- Part 4: Machine Learning for QS and Personal Tracking Data <- talk focus
- Conclusion: Tips on How to Become a Data-Driven You
Part 1: How to measure a life?
Quantified Self / Self-Tracking
(def.)
Measuring or documenting something about yourself to gain meaning or make improvements.
Why do People Self-Track
Source: Gimpel, Henner, Marcia Nißen, and Roland Görlitz. “Quantifying the Quantified Self: A Study on the Motivations of Patients to Track Their Own Health.” (2013)
www.markwk.com/why-people-self-track.html
Why Track Your Life?: Benefits of Self-Tracking
- Improved Health.
- Better Time Management
- Augment your memory.
- Save and better invest your money
- Achieve goals. Support habits. Manage projects
- Understand your mood, energy level and stress.
- Curiosity? Learn stuff about yourself.
- Personal Data is the Future.
Popular Forms of Tracking
- Weight
- Mood
- Wearables, i.e. steps, HR, sleep, etc.
- Heart Rate: one in five Americans owns a heart rate sensor today
- Time Tracking
- Calendar, Project management and Tasks
- Fitness and sports
- Media Consumption: TV, music, articles, books...
- Others: Money, Blood, DNA, Microbiome
Opportunities
in the Tracking and Personal Data Space
- Enabling and tracking new data points
=> New sensors, cheaper testing, new tracking apps, etc.
- Deriving insight and meaning from existing data
=> More data and data accessibility, improved data science and machine learning, accessible ML/AI services, etc.
My Contributions and Work
Enabling and tracking new data points
- PodcastTracker.com
- PhotoStats.io
- BioMarkerTracker.com
Deriving insight and meaning from existing data
- Quantified Self (QS) Ledger
PhotoStats.io
A mobile app for iOS and Android that tracks and auto-tags your photo library, helping you understand and find your photos.
Tip:
Start with a Question
Track It!
Why Python for QS and Self-Tracking?
- Python is a General Purpose Scripting Language
- Existing integrations to many services and APIs
- Robust Data Science Toolkit for data visualization and analysis with Pandas and Matplotlib
- Machine Learning, Modeling and Forecasting
- Great to explore and use locally, with the potential to extend for external usage.
- Python provides a great way to get your data, explore it and build with it.
Part 2: Data Collection, Extraction, and Processing of Personal Data
Problems for Trackers:
- Siloed or Fragmented Data, i.e. collecting tracked data in one place
- Lack of Data Engagement, i.e. using that data
- Lack of rigorous data science, statistical analysis, and machine learning
Quantified Self (QS) Ledger
A Personal Data Aggregator and Dashboard for Self-Trackers and Quantified Self Enthusiasts
github.com/markwk/qs_ledger
Goal #1: Download and Process Personal Data
- Pull Data from Tracking Services' APIs
- Split-Apply-Combine / Extract-Transform-Load (ETL)
Examples of Data Collection from APIs:
Work-in-Progress: Evernote, YouTube History, Pocket, Instapaper, Google Fit... and others.
Examples of Data Collection and Processing from Manual Data:
- Apple Health
- Kindle Highlights
Work-in-Progress: HRV4Training, Email Analysis... and others.
Examples of Raw Data Processing:
- Extracting and Manipulating Date Fields.
- We need a unified (time series) reference.
- We sometimes need to fix timezones.
Kindle Highlights: Raw Data
Kindle Highlights: Date-Time Processing
import pandas as pd

# Parse the raw timestamp strings once, then derive the date parts from them
my_clippings['timestamp'] = pd.to_datetime(my_clippings['timestamp'])
my_clippings['date'] = my_clippings['timestamp'].dt.strftime('%Y-%m-%d')
my_clippings['year'] = my_clippings['timestamp'].dt.year
my_clippings['month'] = my_clippings['timestamp'].dt.month
my_clippings['mnth_yr'] = my_clippings['timestamp'].dt.strftime('%Y-%m')
my_clippings['day'] = my_clippings['timestamp'].dt.day
my_clippings['dow'] = my_clippings['timestamp'].dt.weekday  # Monday=0 ... Sunday=6
my_clippings['hour'] = my_clippings['timestamp'].dt.hour
Kindle Highlights: Month Counts
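A minimal sketch of how monthly counts could be derived from the processed highlights. The rows and the book_title column are made up for illustration, and the highlights column name is an assumption; only the timestamp and mnth_yr columns come from the processing step shown earlier.

```python
import pandas as pd

# Hypothetical stand-in for the my_clippings DataFrame (made-up rows)
my_clippings = pd.DataFrame({
    "book_title": ["Book A", "Book A", "Book B", "Book B"],
    "timestamp": pd.to_datetime([
        "2018-01-03 08:15", "2018-01-20 21:40",
        "2018-02-11 07:05", "2018-02-12 22:30",
    ]),
})
my_clippings["mnth_yr"] = my_clippings["timestamp"].dt.strftime("%Y-%m")

# Count highlights per month and keep the months in chronological order
month_counts = (
    my_clippings.groupby("mnth_yr")
    .size()
    .reset_index(name="highlights")
    .sort_values("mnth_yr")
)
print(month_counts)
```

Because mnth_yr is formatted as year-month, sorting it as text also sorts it chronologically, which makes it convenient as a chart axis.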
Example Data Visualization: Kindle Highlights Per Month
Aggregating Raw Data into Daily Counts
Example: Apple Health
Split-Apply-Combine
- Step 1: Split the data into groups by creating a groupby object from the original DataFrame
- Step 2: apply a function like an aggregation, count or sum
- Step 3: Combine the results into a new DataFrame.
Ref: pandas.pydata.org/pandas-docs/stable/groupby.html
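The three steps above can be sketched on a tiny example. The step records below are made up, and the column names (timestamp, steps, date, daily_steps) are assumptions rather than the actual Apple Health export schema.

```python
import pandas as pd

# Hypothetical per-sample step records (made-up values)
steps = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2018-03-01 08:00", "2018-03-01 12:30",
        "2018-03-02 09:15", "2018-03-02 18:45", "2018-03-02 21:00",
    ]),
    "steps": [1200, 3400, 800, 2500, 700],
})
steps["date"] = steps["timestamp"].dt.date

# Step 1: split the rows into one group per calendar day
grouped = steps.groupby("date")

# Step 2: apply an aggregation to each group
daily_totals = grouped["steps"].sum()

# Step 3: combine the per-group results into a new DataFrame
daily_steps = daily_totals.reset_index(name="daily_steps")
print(daily_steps)
```

In practice the three steps are usually chained into a single expression, but writing them out separately makes it easier to inspect each intermediate result.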
RescueTime: Split-Apply-Combine
Step 0: Raw Data from RescueTime API
Step 1: Process Dates and Datetime from RescueTime API
Step 2: Groupby
# Total seconds per day and productivity level
total_by_date_productivity = (
    activities.groupby(['JustDate', 'Productive'])['Seconds']
    .sum()
    .reset_index(name='Seconds')
)
total_by_date_productivity['Minutes'] = (
    total_by_date_productivity['Seconds'] / 60
).round(2)
Step 3: Pivot Table and Sum
# One row per day, one column per productivity level
table = total_by_date_productivity.pivot_table(
    index='JustDate',
    columns='Productive',
    values='Seconds',
    aggfunc='sum',
)
Results: Daily Productivity Time from RescueTime:
The challenge: data is everywhere. How can we bring it together towards a...
Convergence of Data?
Goal #1b: Process and Combine
Combining from Multiple Sources
Step 1: importing multiple data sources
Step 2: count of total datapoints, i.e. days, by type
Step 3: Merging Multiple DataFrames Together
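One way the merge step might look, assuming each source has already been aggregated to one row per day with a shared date column. The three DataFrames and their values are made up; this is a sketch, not the QS Ledger implementation.

```python
from functools import reduce

import pandas as pd

# Hypothetical daily aggregates from three sources (made-up values)
steps = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-02", "2018-03-03"],
    "steps": [4600, 4000, 9100],
})
sleep = pd.DataFrame({
    "date": ["2018-03-01", "2018-03-03"],
    "sleep_minutes": [430, 465],
})
productive = pd.DataFrame({
    "date": ["2018-03-02", "2018-03-03"],
    "productive_minutes": [215, 180],
})

# An outer join keeps every day that appears in any source;
# days missing from a source show up as NaN in that source's columns
frames = [steps, sleep, productive]
combined = reduce(
    lambda left, right: pd.merge(left, right, on="date", how="outer"),
    frames,
)
combined = combined.sort_values("date").reset_index(drop=True)
print(combined)
```

An outer join is chosen here so that gaps in one tracker do not silently drop days from the others. Whether to fill or drop the resulting missing values depends on the analysis.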
Tip:
Automate Your Data Collection into Google Sheets with IFTTT
IFTTT is a free integration service. It connects with most tracking services. Google Sheets, while limited, provides a robust first place to store your personal data.
Part 3: Data Exploration and Data Viz
(or Building a Personal Data Dashboard)
Why Visualize and Engage with Your Data?
Personal data provides objective feedback, like a scorecard.
But data is most effective when it's in context, tells a story, and is visual!
Goal #2: Explore and Visualize Personal Data
- Create a Personal Data Dashboard
- Options: Google Sheets, Tableau, Excel, R, Python...
Google Sheets
Tableau
- Business Intelligence (BI) tool that focuses on data visualization, dashboarding and data discovery.
- Helps you explore your data, tell stories with data, and produce interactive data visualizations.
Dashboard of
Weekly / Monthly Trends
Exploring Data and Correlations
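Correlations can also be explored directly in Python. The sketch below assumes a combined daily table like the one built in Part 2. The values are made up and deliberately simple, so the resulting coefficients are illustrative only.

```python
import pandas as pd

# Hypothetical combined daily data (made-up values)
daily = pd.DataFrame({
    "steps": [4000, 6000, 8000, 10000, 12000],
    "sleep_minutes": [480, 450, 420, 400, 390],
    "productive_minutes": [120, 180, 240, 300, 360],
})

# Pairwise Pearson correlation between every pair of tracked variables
corr = daily.corr(method="pearson")
print(corr.round(2))
```

A correlation matrix is a starting point for questions, not an answer: with n=1 data, a strong correlation may reflect a shared trend or a confounder rather than one habit causing another.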
Tableau Correlation Explorer
Tip:
Engage with your Data
Whether you use Tableau, Google Sheets, Google Data Studio, Python or whatever, a personal data dashboard is one of the best ways to transform tracking data into something useful and engaging.
Part 4: Machine Learning for QS and Personal Tracking Data
What is Machine Learning?
(def.)
Arthur Samuel (1959):
"Field of study that gives computers the ability to learn without being explicitly programmed."
What is Machine Learning?
(def.)
Tom Mitchell (1998):
"A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E."
Simple Definition of Machine Learning:
Learn from past experience and use that knowledge to make future decisions.
Three Broad Categories of ML
- Supervised learning: Given a desired output and training data, the goal is to learn a general rule that maps inputs to outputs.
- Unsupervised learning: The data has no labels, or the desired output is unknown. The goal is to discover patterns or structure in the data (e.g. feature learning).
- Reinforcement learning: The program is provided feedback in terms of rewards and punishments as it navigates the problem space towards a goal.
Regression & Classification Models
- Regression models (both linear and non-linear) are used for predicting a real, continuous value, like salary, profit, stock price, etc.
- Classification models are used for predicting a category, like whether a tumor is cancerous, whether I was traveling, or whether a movement is a step.
Common Purposes of Machine Learning
- Pre-processing, i.e. data mining, including smoothing, outlier detection, feature detection and generation
- Prediction
- Clustering
- => derive recommendations, inform and automate decisions, inform stakeholders
Some Examples of ML and AI
- Email: Spam Prevention, Automated Replies
- Computer Vision: Image Classification, Auto-Tagging, Facebook's Facial Recognition
- NLP: Siri, Google Assistant, Google Translate
- Finance: Fraud Detection, Loans, Trading, Risk
- Healthcare: Radiology, Computer Assisted Diagnosis (CAD)
- Personalization in Media and Retail: Netflix, Spotify, Facebook, Twitter, YouTube, Amazon, Advertising
- Transportation: Uber, Logistics, Autonomous, Self-Driving Cars
Steps in ML Model Development and Deployment
- Collection of Data
- Data Preparation, i.e. outliers and missing values
- Data Analysis and Feature Engineering
- Training Algorithm, Validating data
- Deploying to Production
Machine Learning and Quantified Self
In QS, we are typically answering n=1 questions like "does x affect y". So basic statistical methods (Frequentist or Bayesian) are often sufficient.
Where is ML used in QS Space:
- Processing sensor data to signals
(i.e. counting steps, detecting sleep, determining HR / HRV)
- Learning based on sensory data
(i.e. Clustering, Supervised Learning, Predictive Modeling with or without notion of Time, Reinforcement Learning to Provide Feedback and Support)
- Decision Making based on Data and Learning
Processing sensor data to signals (i.e. features)
- Example: take sensor data from a phone or smartwatch, remove noise, remove outliers, and smooth the data. The eventual goal is to automatically label the activity (i.e. walking, sleeping, running, etc.)
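A rough sketch of the clean-up stage described above, using a synthetic signal in place of real accelerometer or heart rate data. The window sizes and the outlier threshold are arbitrary assumptions, and the rolling-median approach is just one simple option among many.

```python
import numpy as np
import pandas as pd

# Synthetic sensor signal: a smooth wave plus noise and two injected spikes
rng = np.random.default_rng(0)
t = np.arange(200)
signal = pd.Series(np.sin(t / 10.0) + rng.normal(0, 0.1, size=t.size))
signal.iloc[[50, 120]] = [8.0, -8.0]

# Outlier detection: flag samples that sit far from the local rolling median
rolling_median = signal.rolling(window=11, center=True, min_periods=1).median()
deviation = (signal - rolling_median).abs()
threshold = 5 * deviation.median()  # assumed cut-off, tune per sensor
is_outlier = deviation > threshold

# Replace flagged samples by interpolating between their neighbours
cleaned = signal.mask(is_outlier).interpolate(limit_direction="both")

# Smoothing: short centered moving average to reduce remaining noise
smoothed = cleaned.rolling(window=5, center=True, min_periods=1).mean()

print("outliers flagged:", int(is_outlier.sum()))
```

The smoothed series is what feature extraction (for example, mean, variance, or dominant frequency per time window) would then run on before any activity classifier is trained.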
Accuracy of Classification (Labeling Activity)
Classification and Feature Importance
Learning based on sensory data
Statistical Analysis on QS Data
Example of Examining Project Work Minutes
- Examining and Targeting: Active Project Time
- Using all the processed and pre-selected variables
Backward Elimination with p-values and Adjusted R Squared:
- Relevant Variables: 'Steps', 'KindleHighlights', 'sleep_quality', 'sleep_minutes', 'Photos', 'Songs', 'RunningClimb', 'Traveling', 'sickness', 'RunningMinutes'
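To illustrate the idea of backward elimination, here is a simplified NumPy-only sketch on synthetic data. It is not the analysis behind the variables listed above: instead of exact p-values it drops the weakest variable while its absolute t-statistic is below 1.96 (roughly p > 0.05 for large samples). The function names are hypothetical, and the variable names are borrowed from the slide only as labels for random columns.

```python
import numpy as np

def fit_ols(X, y):
    """Fit OLS with an intercept; return per-feature t-statistics and adjusted R^2."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    dof = n - k - 1
    sigma2 = (resid @ resid) / dof
    cov = sigma2 * np.linalg.inv(A.T @ A)
    t_stats = beta / np.sqrt(np.diag(cov))
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / dof
    return t_stats[1:], adj_r2  # skip the intercept's t-statistic

def backward_elimination(X, y, names, t_threshold=1.96):
    """Repeatedly drop the least significant feature until all pass the threshold."""
    cols = list(range(X.shape[1]))
    while cols:
        t_stats, _ = fit_ols(X[:, cols], y)
        weakest = int(np.argmin(np.abs(t_stats)))
        if abs(t_stats[weakest]) >= t_threshold:
            break
        cols.pop(weakest)
    return [names[c] for c in cols]

# Synthetic data: only the first two columns actually drive the target
rng = np.random.default_rng(42)
n = 300
X = rng.normal(size=(n, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)
names = ["sleep_minutes", "steps", "songs", "photos"]

kept = backward_elimination(X, y, names)
print("kept variables:", kept)
```

Adjusted R squared is returned by fit_ols so that each round can be compared with the previous one: if it drops noticeably after removing a variable, that variable was probably worth keeping.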
What Can Statistical Analysis Tell Us?
- Relevant factors in your life that affect others.
- Example: Steps, Kindle Highlights, Sleep, Song Listening, Running and Sickness all show a statistically relevant relationship with the amount of my productive project time.
EXAMPLE: Was I Traveling?
Classification Challenge
Classification with Statistical Methods:
- Logistic Regression
- K-Nearest Neighbor
- SVM and Kernel SVM
- Naive Bayes
- Decision Tree
- Random Forest
Import Data
Process Data: Split and Feature Scaling
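A self-contained sketch of the split and scaling steps, followed by a small K-Nearest Neighbor classifier for the "Was I traveling?" question. The data is synthetic and the two features (distance from home and daily steps) are assumptions chosen for illustration. NumPy is used directly here to keep the example dependency-free; a library such as scikit-learn would be the more usual choice.

```python
import numpy as np

# Synthetic days: [distance from home in km, steps]; label 1 = traveling
rng = np.random.default_rng(1)
n = 200
home = np.column_stack([rng.normal(5, 2, n), rng.normal(6000, 1500, n)])
away = np.column_stack([rng.normal(500, 100, n), rng.normal(11000, 1500, n)])
X = np.vstack([home, away])
y = np.array([0] * n + [1] * n)

# Split: shuffle the rows, then hold out 25% for testing
idx = rng.permutation(len(y))
cut = int(0.75 * len(y))
train, test = idx[:cut], idx[cut:]
X_train, X_test, y_train, y_test = X[train], X[test], y[train], y[test]

# Feature scaling: standardize using statistics from the training set only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s = (X_train - mean) / std
X_test_s = (X_test - mean) / std

def knn_predict(X_tr, y_tr, X_new, k=5):
    """Predict by majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_new[:, None, :] - X_tr[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return (y_tr[nearest].mean(axis=1) > 0.5).astype(int)

pred = knn_predict(X_train_s, y_train, X_test_s)
accuracy = (pred == y_test).mean()
print("test accuracy:", accuracy)
```

Scaling matters for distance-based methods like K-Nearest Neighbor: without it, the feature with the largest numeric range (here, steps) would dominate the distance calculation.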
Classification with Deep Learning
EXAMPLE: Traveling
Artificial Neural Network
Am I Traveling?
Classification Results Summary:
- Feature Selection and feature engineering are key.
- Good accuracy with statistical methods but slight advantage with ANN.
- Need to modify for temporality considerations (data isn't isolated but part of trends)
- Future usages: healthy day? productive day? study day? social day? etc.
Predicting My Project Time
Regression Challenge
Decision Making based on Data and Learning
Recommendations + Advice: What should I be doing to be X (healthy, productive, advance goals, etc.)?
A Few Existing Products:
- Run Training (TrainAsOne.com)
- Strength Training (FitBod App)
Machine Learning and Quantified Self
Conclusion:
Tips on How to Become a Data-Driven You
What Should You Track?
Four Essential Areas Everyone Should Track
- Health
- Time
- Goals, Projects and Tasks
- Money
How to Track?
A Couple Recommendations
- Health: Baseline: Blood Tests + Sleep (and maybe HRV)
- Time: RescueTime
- Money: Mint.com
- Goals, Projects and Tasks: Todoist
"Deep Work"
vs. Distractions, email, etc.
"Deep Work: Professional activities performed in a state of distraction-free concentration that push your cognitive capabilities to their limit. These efforts create new value, improve your skill, and are hard to replicate."
Are you doing deep work?
- Are you getting deep, focused work done each and every work day? Or is most of your day sucked up by administrative tasks?
- Aim for 3-4 hours of uninterrupted deep work per day. But maybe start with 1-2 hours.
QUESTION: Can personal data enable self-improvement?
Yes
But you need to engage with your data. You need to use your data, think with it, and leverage it to support your goals.
Summary
- It's easier than ever to track our lives.
- Some personal data can be more significant than others.
- Tip #1: Start with a question, then track it.
- Tip #2: Track your time, get a health check-up with blood testing, and find a way to quantify your work and projects.
- Tip #3: Engage with your data.
- Python can help bring your data together, process it, explore it, visualize it, tease out relationships, and even model and forecast it.
Personal data and self-tracking are an opportunity for understanding ourselves (and for understanding our relationship with technology).
Self-tracking can enable and empower data-driven self-improvement and even a new data-driven you.
Best of luck and happy tracking!
How did I get started tracking?
- While I have kept a journal for a long time, I first started tracking with a Fitbit.
- I then started using time tracking tools like RescueTime and Toggl.
- I currently track a dozen or so things.
How Often Do I Look at My Data?
- I mostly rely on passive tracking, apart from a daily check of HRV and sleep.
- Generally once a week, I do a data-driven weekly review.
- Sometimes more frequently depending on the experiment or goal I'm working on.
Sleep: Are you getting enough?
How much does sleep affect our health and creativity? A LOT.
"Time Well-Spent"
Tech companies strive to own our attention.
"With Time Well Spent, we want technology that cares about helping us spend our time, and our lives, well– not seducing us into the most screen time, always-on interruptions or distractions."
- Tristan Harris, humanetech.com.
Less Phone Time
Previously, I was spending 15-30 hours a week on my phone. Now it's around 5-10. That's a savings of roughly 10-20 hours a week.
YouTube History www.markwk.com/youtube-tracking.html
YouTube Time:
Question: How much time do you spend on ____?
Timezone Conversion Functions
Timezone Conversion Example: Running Workouts
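A minimal sketch of what a timezone conversion helper could look like, assuming the tracking service exports naive timestamps in UTC. The function name to_local_time, the column names, the sample runs, and the choice of Asia/Shanghai are all hypothetical.

```python
import pandas as pd

def to_local_time(timestamps, tz_name):
    """Treat naive timestamps as UTC and convert them to the given timezone."""
    utc = pd.to_datetime(timestamps).dt.tz_localize("UTC")
    return utc.dt.tz_convert(tz_name)

# Hypothetical running workouts with UTC start times (made-up values)
runs = pd.DataFrame({
    "start_utc": ["2018-06-01 23:30:00", "2018-06-02 00:15:00"],
    "distance_km": [5.2, 10.1],
})

runs["start_local"] = to_local_time(runs["start_utc"], "Asia/Shanghai")
runs["local_date"] = runs["start_local"].dt.strftime("%Y-%m-%d")
runs["local_hour"] = runs["start_local"].dt.hour
print(runs[["start_utc", "local_date", "local_hour"]])
```

The first run starts late on June 1 in UTC but on the morning of June 2 locally, so skipping the conversion would assign it to the wrong day in any daily aggregate.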
Python For Self-Trackers:
How To Become A Data-Driven You
by Mark Koester
www.markwk.com | github.com/markwk
Slides: github.com/markwk/python4selftrackers