The Krihelinator
Evaluating trendiness in open source software

Tom Gurion

nagasaki45

tomgurion.me

GitHub's trending page

Stars

What are we trying to measure?
Are they a good measure of success for open source software?
Can we simply rely on projects with lots of stars?

Trendiness of OSS should be assessed by contribution rate, not by stars
Meir Kriheli

Enter the GitHub Pulse page (screenshot: PokemonGo-Map pulse page)

We can't sort projects by contribution rate...

www.krihelinator.xyz

Krihelimeter

  20 * authors
+  8 * merged and proposed pull requests
+  8 * new and closed issues
+  1 * commits
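The weights above can be read as a weighted sum. The following is a minimal sketch under that reading, not the project's actual code: the `PulseStats` type and its field names are hypothetical, and it assumes that "merged and proposed" and "new and closed" each mean the sum of the two counts.

```python
from dataclasses import dataclass


@dataclass
class PulseStats:
    """Last week's contribution statistics for one repository (hypothetical type)."""
    authors: int
    merged_pull_requests: int
    proposed_pull_requests: int
    new_issues: int
    closed_issues: int
    commits: int


def krihelimeter(stats: PulseStats) -> int:
    """Weighted sum of the statistics, using the weights from the slide."""
    return (
        20 * stats.authors
        + 8 * (stats.merged_pull_requests + stats.proposed_pull_requests)
        + 8 * (stats.new_issues + stats.closed_issues)
        + 1 * stats.commits
    )


# Example: 3 authors, 2 merged + 1 proposed PRs, 4 new + 5 closed issues, 10 commits.
example = PulseStats(3, 2, 1, 4, 5, 10)
print(krihelimeter(example))  # 60 + 24 + 72 + 10 = 166
```

Authors carry the largest weight, so a repository with many distinct contributors outranks one where a single person pushes many commits.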

Alternatives

www.openhub.net openhub

Anything else?

Pipeline process

Poller

Poll the GitHub API to get all of the repositories on GitHub, in a loop, 100 at a time.
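A rough sketch of such a polling loop, assuming GitHub's public "list public repositories" endpoint, which is paginated with a `since` parameter holding the last repository ID seen. The function names are hypothetical, and the fetch function is injectable so the loop can be exercised without network access. This sketch stops at the first empty page; the talk does not say what the real poller does at that point.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

API_URL = "https://api.github.com/repositories"


def fetch_page(since: int) -> List[dict]:
    """Fetch one page of public repositories with IDs greater than `since`."""
    with urllib.request.urlopen(f"{API_URL}?since={since}") as response:
        return json.load(response)


def poll_repositories(
    fetch: Callable[[int], List[dict]] = fetch_page,
    since: int = 0,
) -> Iterator[str]:
    """Yield repository full names ("owner/name"), one page at a time."""
    while True:
        page = fetch(since)
        if not page:
            return  # no more repositories to list
        for repo in page:
            yield repo["full_name"]
        # The next page starts after the last repository ID seen.
        since = page[-1]["id"]
```

Each yielded name would then be handed to a scraper.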

16 Scrapers

Scrape the Pulse page for each repo, extract last week contribution statistics.

Filter

Ignores repositories with only one author.
Ignores repositories with Krihelimeter < 30.
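The two rules above amount to a simple predicate. A minimal sketch, with a hypothetical function name and the thresholds taken from the slide:

```python
def passes_filter(authors: int, krihelimeter: int) -> bool:
    """Keep a repository only if it has more than one author
    and a Krihelimeter of at least 30."""
    return authors > 1 and krihelimeter >= 30


print(passes_filter(authors=2, krihelimeter=30))   # True
print(passes_filter(authors=1, krihelimeter=500))  # False: single author
print(passes_filter(authors=5, krihelimeter=29))   # False: below threshold
```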

DataHandler

Calculate Krihelimeter and persist to DB.

Periodic process

Kicks in every 6 hours.

Scrapes the GitHub trending page and passes the repositories to the scrapers.

Keeps the 500 repositories with the highest Krihelimeter, deletes the rest.

Rescrapes and updates all of the repositories in the DB.
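The "keep the top 500" step can be sketched as follows. This is an illustration only: it works on an in-memory mapping from repository name to Krihelimeter, whereas the real process operates on the DB, and `keep_top` is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple


def keep_top(
    krihelimeters: Dict[str, int], limit: int = 500
) -> Tuple[Dict[str, int], List[str]]:
    """Split repositories into those to keep (highest Krihelimeter)
    and the names of those to delete."""
    top = heapq.nlargest(limit, krihelimeters.items(), key=lambda item: item[1])
    kept = dict(top)
    dropped = sorted(name for name in krihelimeters if name not in kept)
    return kept, dropped


kept, dropped = keep_top({"a/a": 10, "b/b": 50, "c/c": 30}, limit=2)
print(kept)     # {'b/b': 50, 'c/c': 30}
print(dropped)  # ['a/a']
```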

Simpler concurrency model
+
Fault tolerance
=

Higher level language

Less infrastructure

No job queue, no workers, no Celery...

More throughput

~20MB inbound per second
Scraping ~140 pages per second
~40 milliseconds per request

Thanks!

Questions?
