This post originally appeared on the Software Carpentry website.
For the last three years,
I've been storing information about instructors, workshops, and other things
in a small SQLite database
so that I can look things up and generate statistics when I need to.
I can't publish it,
since it contains personal identifying information,
but since I had to write a script to migrate the data to
the tool we're building to manage workshops,
it only took another few minutes to create a partly-redacted version of the data.
("Partly" because someone who was really keen could work backward workshop URLs to instructors' names,
cross-reference,
and recover the names of some fraction of our instructors.
Since that information is all public anyway,
though,
I don't think I've introduced any new risks.)
The SQL source for the database is here;
with it,
you can regenerate the database using:
$ sqlite3 swc.db < swc-db-2014-12-14.sql
You can then ask lots of questions—some examples are included below.
If you'd like a little end-of-year procrastination,
what else can you find in this data that's interesting?
-- How many sites have had how many events?
select count(*), c
from (select count(*) as c
from site join event
on site.id=event.site_id
group by site.id)
group by c
order by c desc;
| number | count |
|---|
| 2 | 8 |
| 2 | 7 |
| 2 | 6 |
| 3 | 5 |
| 4 | 4 |
| 8 | 3 |
| 36 | 2 |
| 117 | 1 |
|
-- How many people have taught?
select count(distinct person_id)
from person join task join role
on person.id=task.person_id and task.role_id=role.id
where role.name='instructor';
-- How many people started as learners or helpers and became instructors?
-- How often have people taught?
from (select count(*) as c
from person join task join role
on person.id=task.person_id and task.role_id=role.id
where role.name='instructor'
group by person_id)
group by c
order by c desc;
| # instructors | # workshops |
|---|
| 1 | 43 |
| 1 | 14 |
| 3 | 12 |
| 2 | 11 |
| 2 | 10 |
| 5 | 9 |
| 5 | 8 |
| 4 | 7 |
| 5 | 6 |
| 16 | 5 |
| 17 | 4 |
| 32 | 3 |
| 50 | 2 |
| 138 | 1 |
-- How has each training cohort done?
select cohort.name, count(*), round((100.0 * sum(trainee.complete)) / count(*), 1)
from trainee join cohort
on trainee.cohort_id=cohort.id
where cohort.qualifies
group by cohort_id;
| name | enrolled | completion %age |
|---|
| 2012-08-26-online | 20 | 55.0 |
| 2012-10-11-online | 25 | 44.0 |
| 2013-01-06-online | 12 | 16.7 |
| 2013-03-12-online | 27 | 48.1 |
| 2013-05-12-online | 45 | 26.7 |
| 2013-08-12-online | 41 | 43.9 |
| 2013-09-30-online | 57 | 31.6 |
| 2014-01-16-online | 67 | 22.4 |
| 2014-04-24-online | 58 | 31.0 |
| 2014-04-28-mozilla | 43 | 65.1 |
| 2014-06-05-online | 29 | 10.3 |
| 2014-06-11-online | 59 | 27.1 |
| 2014-09-10-online | 81 | 29.6 |
| 2014-09-22-uva | 31 | 22.6 |
| 2014-10-22-tgac | 41 | 26.8 |
| 2014-11-12-washington | 20 | |
| 2015-01-01-online | 135 | |
| 2015-01-06-ucdavis | 4 | |
| 2015-05-01-online | 113 | |