The Krihelinator
Evaluating trendiness in open source software
Tom Gurion
Stars
- What are we trying to measure?
- Are they a good measure of success for open source software?
- Can we simply rely on projects with lots of stars?
Trendiness of OSS should be assessed by contribution rate, not by stars
Meir Kriheli
Enter the GitHub Pulse page
We can't sort projects by contribution rate...
Krihelimeter =
  20 * authors
+ 8 * merged and proposed pull requests
+ 8 * new and closed issues
+ 1 * commits
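The weights above can be turned into a small scoring function. This is a minimal Python sketch that reads the slide as a weighted sum of the four counters; the `PulseStats` type and its field names are hypothetical and not taken from the project's code.

```python
from dataclasses import dataclass

# Weights as listed on the slide.
AUTHOR_WEIGHT = 20
PULL_REQUEST_WEIGHT = 8
ISSUE_WEIGHT = 8
COMMIT_WEIGHT = 1


@dataclass
class PulseStats:
    """Last week's activity for one repository (hypothetical field names)."""
    authors: int
    pull_requests: int  # merged + proposed
    issues: int         # new + closed
    commits: int


def krihelimeter(stats: PulseStats) -> int:
    """Weighted sum of the four counters (assumes the terms are added)."""
    return (
        AUTHOR_WEIGHT * stats.authors
        + PULL_REQUEST_WEIGHT * stats.pull_requests
        + ISSUE_WEIGHT * stats.issues
        + COMMIT_WEIGHT * stats.commits
    )
```

For example, a repository with 3 authors, 5 pull requests, 4 issues and 40 commits in the past week scores 60 + 40 + 32 + 40 = 172.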
Pipeline process
Poller
Poll the GitHub API in a loop, 100 repositories at a time, to enumerate
every repository on GitHub.
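One way to picture the polling loop is as a generator that asks for the next batch after the last repository ID it has seen. This is only a sketch: `fetch_page` is a hypothetical stand-in for the real API call, and the sketch stops when the listing is exhausted, whereas the actual poller may start over.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 100  # batch size from the slide


def poll_repositories(
    fetch_page: Callable[[int], List[dict]],
) -> Iterator[List[dict]]:
    """Yield batches of repositories until the listing is exhausted.

    `fetch_page(since)` is a hypothetical callable that returns up to
    PAGE_SIZE repositories whose "id" is greater than `since`, ordered by id.
    """
    since = 0
    while True:
        batch = fetch_page(since)
        if not batch:
            return
        yield batch
        since = batch[-1]["id"]  # resume after the last repository seen
```

Passing the fetcher in as an argument keeps the loop independent of any HTTP client, so it can be exercised with an in-memory list.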
16 Scrapers
Scrape the Pulse page of each repository and extract its contribution
statistics for the past week.
Filter
- Ignores repositories with only one author.
- Ignores repositories with Krihelimeter < 30.
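The two filter rules above amount to a single predicate. A minimal Python sketch, assuming the author count and the Krihelimeter are already known when the filter runs; the function and constant names are hypothetical.

```python
MIN_AUTHORS = 2        # repositories with only one author are ignored
MIN_KRIHELIMETER = 30  # repositories scoring below 30 are ignored


def should_keep(authors: int, krihelimeter: int) -> bool:
    """Return True if a repository passes both filter rules."""
    return authors >= MIN_AUTHORS and krihelimeter >= MIN_KRIHELIMETER
```

A busy single-author repository is dropped by the first rule regardless of its score, and a quiet multi-author one by the second.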
DataHandler
Calculates the Krihelimeter and persists it to the DB.
Periodic process
Kicks in every 6 hours.
Scrapes the GitHub trending page and passes the repositories to the scrapers.
Keeps the 500 repositories with the highest Krihelimeter, deletes the rest.
Rescrapes and updates all of the repositories in the DB.
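The "keep the top 500" step can be sketched with the standard library's `heapq.nlargest`. This is an illustration only: it works on an in-memory mapping of repository name to Krihelimeter, whereas the real process operates on the DB, and `split_keep_delete` is a hypothetical name.

```python
import heapq
from typing import Dict, Set, Tuple

KEEP = 500  # number of repositories retained, from the slide


def split_keep_delete(
    scores: Dict[str, int], keep: int = KEEP
) -> Tuple[Set[str], Set[str]]:
    """Split repository names into (kept, deleted) by Krihelimeter."""
    # Iterating a dict yields its keys; rank them by their score.
    kept = set(heapq.nlargest(keep, scores, key=scores.__getitem__))
    deleted = set(scores) - kept
    return kept, deleted
```

With 600 scored repositories, the 500 highest-scoring names end up in `kept` and the remaining 100 in `deleted`.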
Simpler concurrency model + fault tolerance = higher-level language
Less infrastructure
No job queue, no workers, no Celery...
More throughput
- ~20 MB inbound per second
- Scraping ~140 pages per second
- ~40 milliseconds per request
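The three figures above can be cross-checked with a little arithmetic. The sketch below assumes 1 MB means 10^6 bytes and reads the 40 ms figure as average request latency; both are assumptions, and all inputs are the approximate numbers from the slide.

```python
INBOUND_BYTES_PER_SECOND = 20 * 10**6  # ~20 MB/s, taking 1 MB = 10^6 bytes (assumption)
PAGES_PER_SECOND = 140                 # ~140 pages scraped per second
SECONDS_PER_REQUEST = 0.040            # ~40 ms, read as average latency (assumption)

# Average payload per scraped page.
average_page_bytes = INBOUND_BYTES_PER_SECOND / PAGES_PER_SECOND

# Little's law: average requests in flight = throughput * latency.
requests_in_flight = PAGES_PER_SECOND * SECONDS_PER_REQUEST

print(f"average page size: {average_page_bytes / 1000:.0f} kB")
print(f"average requests in flight: {requests_in_flight:.1f}")
```

Under these assumptions each page weighs about 143 kB, and on average about 5.6 requests are in flight at any moment.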