Computer Science Open Data

By Jeff Huang on 2022-04-13

This is data I wish I had when I was applying for Ph.D. programs. My students and I have slowly put the source data together over time, so that it's now a compilation of computer science data.

The writing on this page is not advice. Rather, I've described what I find most provocative in the data, and you are encouraged to review the data yourself. We have spent hundreds of hours putting together the source data; subscribe here for updates.

Analysis of Over 5,000 Computer Science Professors

We host a public dataset comprising profiles of computer science professors in the United States and Canada. Each profile includes the professor's name, institution, degrees obtained, subfield, and when they joined the university. The dataset, hosted on our platform Drafty, aims to include all professors who can solely advise computer science students. It excludes lecturers, professors of practice, and clinical, adjunct, affiliate, or research professors (similar to an older analysis from Jeff Erickson), mainly because we were constrained by time and resources. This analysis uses data collected as of 2022-04-12. Because this is a publicly-editable dataset that takes time to catch up, hires in 2020 and 2021 are likely to be grossly underrepresented; you can help by editing the source directly on Drafty.

Hires among Computer Science Areas

By grouping subfields into four broad categories of Systems, Theory, AI, and Interdisciplinary (this 4-area taxonomy is based on CS Open Rankings), and plotting the professors' Join Year on the x-axis, we can visualize hiring trends.

[Figure: trends of hiring in the 4 computer science areas from 1990 to 2021]

The clearest pattern is that computer science hiring has been growing since 2010, reaching about 250 new computer science professors hired per year among the 102 universities tracked on Drafty. The broad area of Systems has generally been the largest group of hires, but it looks like AI has closed the gap in recent years and is on track to surpass it. The Theory and Interdisciplinary areas seem to have kept their gap with Systems over the past decade.
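The grouping behind this chart can be sketched in a few lines. This is only an illustration: the subfield-to-area mapping `AREA_OF` and the sample `profiles` rows are hypothetical and are not taken from Drafty or CS Open Rankings.

```python
from collections import Counter

# Hypothetical mapping from subfield to one of the four broad areas.
AREA_OF = {
    "Operating Systems": "Systems",
    "Algorithms": "Theory",
    "Machine Learning": "AI",
    "Human-Computer Interaction": "Interdisciplinary",
}

# Hypothetical profile rows: (join_year, subfield).
profiles = [
    (2019, "Machine Learning"),
    (2019, "Operating Systems"),
    (2019, "Machine Learning"),
    (2020, "Algorithms"),
    (2020, "Human-Computer Interaction"),
]

def hires_per_year_and_area(profiles, area_of):
    """Count hires for each (join_year, area) pair."""
    counts = Counter()
    for join_year, subfield in profiles:
        area = area_of.get(subfield)
        if area is not None:  # skip subfields that are not in the mapping
            counts[(join_year, area)] += 1
    return counts

counts = hires_per_year_and_area(profiles, AREA_OF)
for (year, area), n in sorted(counts.items()):
    print(year, area, n)
```

Plotting one line per area, with the year on the x-axis and the count on the y-axis, would then give a chart of the kind described here.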

The Doctoral Degrees of Professors

Probably not surprisingly, 99.9% of professors have a Ph.D. Where did they obtain their bachelors and doctoral degrees? If a graduate of a university gets hired as a professor, we call it a "placement".

The table below shows that a quarter of professors come from just four universities: MIT, Berkeley, CMU, and Stanford. The 15 universities with the most placements account for a little over 50% of the professors in the United States and Canada.

Doctoral Institution | Placements | % of All Hires
Massachusetts Institute of Technology | 410 | 8.2%
Carnegie Mellon University | 318 | 6.3%
University of California, Berkeley | 288 | 5.7%
Stanford University | 257 | 5.1%
University of Illinois at Urbana-Champaign | 202 | 4.0%
Cornell University | 132 | 2.6%
University of Toronto | 125 | 2.5%
Georgia Institute of Technology | 123 | 2.5%
University of Texas at Austin | 120 | 2.4%
University of Washington | 119 | 2.4%
Princeton University | 112 | 2.2%
University of Maryland | 91 | 1.8%
Harvard University | 88 | 1.8%
University of California, Los Angeles | 85 | 1.7%
University of Wisconsin-Madison | 85 | 1.7%
Purdue University | 82 | 1.6%
University of Pennsylvania | 80 | 1.6%
University of Michigan | 73 | 1.5%
University of Massachusetts Amherst | 66 | 1.3%
Columbia University | 64 | 1.3%
University of Southern California | 63 | 1.3%
University of California, San Diego | 61 | 1.2%
University of Minnesota | 58 | 1.2%
Brown University | 54 | 1.1%
Ohio State University | 48 | 1.0%
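The concentration claims can be checked directly against the percentage column of this table. The short sketch below sums the published (rounded) shares, so the totals are approximate; the helper name `cumulative_share` is just for illustration.

```python
# Share of all hires (%) for the 15 doctoral institutions with the most
# placements, copied from the table in table order.
top15_share = [8.2, 6.3, 5.7, 5.1, 4.0, 2.6, 2.5, 2.5, 2.4, 2.4,
               2.2, 1.8, 1.8, 1.7, 1.7]

def cumulative_share(shares, k):
    """Sum of the first k percentage shares, rounded to one decimal place."""
    return round(sum(shares[:k]), 1)

top4 = cumulative_share(top15_share, 4)    # MIT, CMU, Berkeley, Stanford
top15 = cumulative_share(top15_share, 15)
print(f"top 4: {top4}%, top 15: {top15}%")
```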

Are there universities that hire their own Ph.D. graduates, either immediately or after returning from first working at other institutions? Actually, yes, and at a higher rate than I expected. 7.4% of professors have a Ph.D. from the same university that they are teaching at. MIT in particular has a high rate of self-hires, with 36% of their professors having MIT Ph.D.s themselves.

University | Self-Hires | % of Faculty
Carnegie Mellon University | 50 | 23.9%
Massachusetts Institute of Technology | 44 | 36.4%
University of Toronto | 28 | 26.7%
University of California, Berkeley | 21 | 21.9%
Stanford University | 13 | 17.6%
Georgia Institute of Technology | 12 | 11.0%
University of Waterloo | 12 | 12.1%
University of Washington | 10 | 9.8%
University of Utah | 9 | 14.5%
University of Illinois at Urbana-Champaign | 8 | 7.8%
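This table is ordered by the raw number of self-hires, while the text highlights the rate. As a small illustration, the same rows can be re-sorted by the percentage column; the values are copied from the table and nothing else is assumed.

```python
# (university, self-hires, % of faculty), copied from the table.
self_hires = [
    ("Carnegie Mellon University", 50, 23.9),
    ("Massachusetts Institute of Technology", 44, 36.4),
    ("University of Toronto", 28, 26.7),
    ("University of California, Berkeley", 21, 21.9),
    ("Stanford University", 13, 17.6),
    ("Georgia Institute of Technology", 12, 11.0),
    ("University of Waterloo", 12, 12.1),
    ("University of Washington", 10, 9.8),
    ("University of Utah", 9, 14.5),
    ("University of Illinois at Urbana-Champaign", 8, 7.8),
]

# Re-sort by self-hire rate instead of by raw count.
by_rate = sorted(self_hires, key=lambda row: row[2], reverse=True)
for name, count, pct in by_rate:
    print(f"{pct:5.1f}%  {count:3d}  {name}")
```

Sorted this way, MIT comes first and the University of Toronto moves ahead of Carnegie Mellon.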

A Wider Diversity of Bachelors Degrees

For bachelors degrees, there is a wider range of source institutions, including many international ones. There are several universities from India and China, and it's remarkable that Tsinghua University is where 2.3% of computer science professors in the United States and Canada did their bachelors. The 15 universities that grant the most bachelors degrees account for a little under 25% of professors in our dataset, as opposed to over 50% in the case of doctoral degrees.

Bachelors Institution | Placements | % of All Hires
Massachusetts Institute of Technology | 184 | 3.7%
Tsinghua University | 116 | 2.3%
Harvard University | 104 | 2.1%
University of California, Berkeley | 94 | 1.9%
Indian Institute of Technology Kanpur | 80 | 1.6%
Cornell University | 75 | 1.5%
Stanford University | 74 | 1.5%
Carnegie Mellon University | 69 | 1.4%
Indian Institute of Technology Madras | 67 | 1.3%
Princeton University | 64 | 1.3%
University of Toronto | 59 | 1.2%
Peking University | 57 | 1.1%
University of Illinois at Urbana-Champaign | 57 | 1.1%
Yale University | 56 | 1.1%
Indian Institute of Technology Bombay | 49 | 1.0%
Shanghai Jiao Tong University | 48 | 1.0%
Brown University | 47 | 0.9%
University of Science and Technology of China | 46 | 0.9%
University of Waterloo | 44 | 0.9%
California Institute of Technology | 44 | 0.9%

Students often want to know if getting a Ph.D. from the same institution where they did their undergraduate degree will make it harder for them to continue in academia. This turns out to be more common than I expected: 14% of professors have a bachelors degree from the same institution as their doctorate.

Bias in Computer Science Rankings

Rankings are an Ideology

College rankings are being questioned from two sides. On one hand, the ideologies underlying them are being exposed, such as emphasizing selectivity rather than quality of education. Depending on which factors you count, and the weight given to them, college rankings can range from nonsensical to fabricated to suit a predetermined assumption. But on the other hand, colleges have come up with tricks to game the same factors that are being questioned by others, illustrating Goodhart's Law. A recent examination of the data Columbia submitted to U.S. News seems to uncover outright deception, but other practices can be subtle, like keeping classes below a certain threshold to count more small classes on paper, or creative ways to define who counts as a student.

Those are university-wide rankings, but similar biases can surface in rankings of computer science programs, which are mainly evaluated for their research. Two popular rankings, U.S. News and csrankings.org, use completely different methodologies. U.S. News ranks doctoral programs separately from the university as a whole, with computer science rankings "based solely on the results of surveys sent to academic officials", meaning they are entirely reputation-based and are not affected by the issues plaguing Columbia's data mentioned earlier. But relying on self-selected survey respondents, who often base their own judgments on past years' rankings, may lead to other biases.

On the other hand, csrankings.org is generated purely from publicly-verifiable data. This doesn't mean the data is more correct, but that we can see how things are calculated. For example, the University of Illinois and University of California San Diego have crept from ranks 5 and 11 in 2019 to ranks 2 and 4 today, just 3 years later. Being a top 4 CS department is a big deal. And the way that computer science programs are scored encourages ideologies of a different kind. For example, in csrankings.org, "a single faculty member gets 1/N credit for a paper, where N is the number of authors". Students earn no credit themselves but still count toward N, so one paper with a single faculty author and no co-authors is worth as much as three papers that each add two student co-authors. While this has probably not been attempted, this ideology rewards leaving student co-authors off papers, or hiring more faculty in subfields where solo-authored papers are common, such as theoretical computer science.
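The arithmetic behind the 1/N rule quoted above can be sketched as follows. This assumes N counts every author on the paper, students included, which is what the three-to-one comparison implies; the function name `faculty_credit` is illustrative and is not part of csrankings.org.

```python
from fractions import Fraction

def faculty_credit(num_authors: int) -> Fraction:
    """Credit one faculty author receives for a paper with num_authors authors in total."""
    if num_authors < 1:
        raise ValueError("a paper needs at least one author")
    return Fraction(1, num_authors)

solo = faculty_credit(1)           # one faculty author, no co-authors
with_students = faculty_credit(3)  # one faculty author plus two student co-authors

print(solo, with_students)         # 1 1/3
print(solo == 3 * with_students)   # True
```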

Diversifying the Ranking Sources to Expose the Biases

Let's put U.S. News and csrankings.org* side-by-side and compare, basically creating a meta-ranking, and add two other ranking sources. One is "placement rank", which treats universities as nodes in a graph, with a unidirectional link occurring whenever one university hires a graduate of another university to be a faculty member. A PageRank-style score is computed, so that universities whose graduates become faculty get credit for that, with more credit going to placing students at other universities whose students are also sought after. The second added source is a count of best paper awards for each university, with more credit going to authors listed first. Here's how the resulting four rankings line up.

*The csrankings.org methodology is our attempt to replicate what's described on their website, but may have slight discrepancies due to data differences.

rank | university | size | U.S. News | csrankings.org | placement rank | best paper awards
1 | Massachusetts Institute of Technology | 106 | 1 | 2 | 1 | 4
2 | Carnegie Mellon University | 199 | 2 | 1 | 4 | 2
3 | Stanford University | 68 | 2 | 4 | 3 | 3
4 | University of California, Berkeley | 83 | 2 | 6 | 2 | 5
5 | University of Illinois at Urbana-Champaign | 86 | 5 | 3 | 7 | 8
6 | University of Washington | 96 | 6 | 8 | 10 | 1
7 | Cornell University | 84 | 6 | 7 | 6 | 7
8 | University of Michigan | 92 | 11 | 9 | 14 | 6
9 | Georgia Institute of Technology | 99 | 6 | 11 | 13 | 12
10 | University of Texas at Austin | 60 | 9 | 16 | 11 | 10
11 | University of California, San Diego | 91 | 11 | 5 | 26 | 13
12 | Princeton University | 56 | 9 | 24 | 8 | 16
13 | University of Maryland | 62 | 17 | 10 | 20 | 15
14 | Columbia University | 59 | 11 | 12 | 23 | 18
15 | University of Wisconsin-Madison | 53 | 17 | 16 | 15 | 19
16 | University of California, Los Angeles | 47 | 11 | 20 | 19 | 24
17 | University of Massachusetts Amherst | 61 | 23 | 20 | 22 | 14
18 | University of Pennsylvania | 62 | 17 | 15 | 16 | 32
19 | Purdue University | 71 | 20 | 18 | 21 | 33
20 | Harvard University | 37 | 16 | 31 | 5 | 48
*As of 2022-04-12, with Canadian universities removed for simpler comparison.
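To make the "placement rank" idea more concrete, here is a minimal sketch of a PageRank-style score over a hiring graph. It is not the computation behind the table: the damping factor of 0.85, the direction of the links (from the hiring university to the Ph.D.-granting one), the function name `placement_rank`, and the three toy universities A, B, and C are all assumptions made for illustration.

```python
from collections import defaultdict

def placement_rank(hires, damping=0.85, iterations=100):
    """PageRank-style scores from (hiring_university, phd_university) pairs.

    Each hire is a directed link from the hiring university to the university
    that granted the Ph.D., so score flows toward universities that place
    their graduates, and more so when the hiring university itself scores well.
    """
    nodes = sorted({u for pair in hires for u in pair})
    out_links = defaultdict(list)
    for hiring, source in hires:
        out_links[hiring].append(source)

    n = len(nodes)
    score = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * score[u] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A university with no recorded hires spreads its score evenly.
                share = damping * score[u] / n
                for t in nodes:
                    new[t] += share
        score = new
    return score

# Toy data: B hires one graduate of A, C hires two graduates of A,
# and A hires one graduate of B. Nobody hires a graduate of C.
hires = [("B", "A"), ("C", "A"), ("C", "A"), ("A", "B")]
scores = placement_rank(hires)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph A ends up ahead of B, and C is last because none of its graduates are hired.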

Some differences are particular to one or two ranking sources, reflecting their biases:

Each individual ranking makes choices that can be subject to debate. For example, it's clear that multiple ranking sources favor larger departments, which may publish more in total even though their researchers are not individually more productive. Dividing by the department size might be tempting, and would especially benefit my own institution, which ranks highly despite its smaller size. Another choice is whether to let only recent data matter, which is csrankings.org's default setting, whereas the best paper award ranking disregards the year.

But by summing the ranks from the four sources, we get a meta-ranking we call CS Open Rankings. The meta-ranking includes the biases from the individual ranking sources, but none of them too strongly, leading to a somewhat more stable ranking.
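The rank-sum step can be reproduced from the four rank columns of the table above. The sketch below copies those ranks, adds them up per university, and sorts by the total; the function name `meta_rank` is illustrative, and how ties would be broken is not stated in the text (none occur in these 20 rows).

```python
# (university, U.S. News, csrankings.org, placement rank, best paper awards)
rows = [
    ("Massachusetts Institute of Technology", 1, 2, 1, 4),
    ("Carnegie Mellon University", 2, 1, 4, 2),
    ("Stanford University", 2, 4, 3, 3),
    ("University of California, Berkeley", 2, 6, 2, 5),
    ("University of Illinois at Urbana-Champaign", 5, 3, 7, 8),
    ("University of Washington", 6, 8, 10, 1),
    ("Cornell University", 6, 7, 6, 7),
    ("University of Michigan", 11, 9, 14, 6),
    ("Georgia Institute of Technology", 6, 11, 13, 12),
    ("University of Texas at Austin", 9, 16, 11, 10),
    ("University of California, San Diego", 11, 5, 26, 13),
    ("Princeton University", 9, 24, 8, 16),
    ("University of Maryland", 17, 10, 20, 15),
    ("Columbia University", 11, 12, 23, 18),
    ("University of Wisconsin-Madison", 17, 16, 15, 19),
    ("University of California, Los Angeles", 11, 20, 19, 24),
    ("University of Massachusetts Amherst", 23, 20, 22, 14),
    ("University of Pennsylvania", 17, 15, 16, 32),
    ("Purdue University", 20, 18, 21, 33),
    ("Harvard University", 16, 31, 5, 48),
]

def meta_rank(rows):
    """Return (ordering, totals): universities sorted by the sum of their ranks."""
    totals = {name: sum(ranks) for name, *ranks in rows}
    return sorted(totals, key=totals.get), totals

ordering, totals = meta_rank(rows)
for position, name in enumerate(ordering, start=1):
    print(position, name, totals[name])
```

Sorting by these totals gives the same order as the "rank" column of the table, from MIT (total 8) to Harvard (total 100).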

The conclusion is that individual rankings are unreliable, subject to biases that become clear when compared with other rankings. Each individual ranking can be based on ideologies that are subjective, and sometimes a ranking has anomalies that are surprising even when the methodology is known. We haven't yet looked at area-specific rankings (AI, Theory, Systems), and there may be more anomalies to uncover. This exercise is left to the reader, and can be investigated on CS Open Rankings.

Who Wins CS Best Paper Awards?

Most Best Paper Awards Go to A Few Dozen Institutions

Thousands of institutions around the world publish computer science papers. At most of the top conferences, a select few papers are designated as "best paper" or "distinguished paper" for that year. But just a couple dozen of these institutions get a disproportionate number of awards.

Looking at 30 well-known conferences, the 25 institutions that receive the most awards get 44% of these best paper awards (552.4 of them). Best paper award credit is divided among institutions in decreasing author order, as is customary in many subfields. The top 7 of these win over a quarter of the best paper awards: Microsoft, University of Washington, Carnegie Mellon University, Stanford University, Massachusetts Institute of Technology, University of California Berkeley, and University of Michigan.

Institution | Best Papers
Microsoft | 62.4
University of Washington | 56.9
Carnegie Mellon University | 52.2
Stanford University | 46.5
Massachusetts Institute of Technology | 43.4
University of California, Berkeley | 31.5
University of Michigan | 22.7
Google | 21.3
Cornell University | 20.5
University of Illinois at Urbana–Champaign | 19.1
University of Toronto | 17.3
IBM | 15.2
University of Texas at Austin | 15.1
University of British Columbia | 13.4
Georgia Institute of Technology | 12.8
University of California, San Diego | 11.5
University of Massachusetts Amherst | 11.2
University of Maryland | 10.9
University of Oxford | 10.6
University of Cambridge | 10.6
École Polytechnique Fédérale de Lausanne | 10.2
Princeton University | 9.8
University of California, Irvine | 9.8
Columbia University | 8.8
University of Wisconsin–Madison | 8.7
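The figures quoted above can be cross-checked against this table. The sketch below sums the 25 listed values and then estimates the top-7 share; since the overall number of awards is not given, it is inferred from the rounded 44% figure, so the resulting percentage is only approximate.

```python
# Best paper credit for the 25 institutions, copied from the table in order.
best_papers = [
    62.4, 56.9, 52.2, 46.5, 43.4, 31.5, 22.7, 21.3, 20.5, 19.1,
    17.3, 15.2, 15.1, 13.4, 12.8, 11.5, 11.2, 10.9, 10.6, 10.6,
    10.2, 9.8, 9.8, 8.8, 8.7,
]

top25 = sum(best_papers)
top7 = sum(best_papers[:7])

# Assumption: the 25 institutions hold exactly 44% of all awards (a rounded figure).
implied_total = top25 / 0.44
top7_share = top7 / implied_total

print(f"top 25 total: {top25:.1f}")
print(f"top 7 total:  {top7:.1f}")
print(f"top 7 share:  {100 * top7_share:.1f}% (approximate)")
```

The total for the 25 institutions matches the 552.4 awards quoted in the text, and the top 7 come out at a little over a quarter.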

Of the 25 institutions that receive the most awards globally, 22 are academic institutions, and 3 are corporations headquartered in the United States (Microsoft, Google, IBM). 17 of the universities are in the United States, 2 are in Canada (Toronto and UBC), 2 in the United Kingdom (Oxford and Cambridge), and the remaining one is in Switzerland (EPFL).

The institutions get more diverse in the top 35, which represent over half the best paper awards. There are two additional Canadian universities (Waterloo and McGill), two Chinese universities (Tsinghua and Peking University), Yahoo and the University of Chicago, both from the United States, and institutions from France, South Korea, Singapore, and Israel (INRIA, KAIST, National University of Singapore, and Technion respectively).

Analysis done on data from 2022-04-12. See the source data in my best paper awards collection.
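The fractional counts in this section come from dividing each award among the authors' institutions in decreasing author order. The exact weights are not stated here, so the sketch below assumes one plausible scheme: the author in position k gets weight 1/k, and the weights are normalized so that each award adds up to 1. The function name `split_credit` and the institutions X and Y are hypothetical.

```python
from collections import defaultdict

def split_credit(institutions):
    """Split one award among author institutions, given in author order.

    Assumed scheme (not confirmed by the text): the author in position k gets
    weight 1/k, and weights are normalized so that each award totals 1.
    """
    weights = [1.0 / k for k in range(1, len(institutions) + 1)]
    total = sum(weights)
    credit = defaultdict(float)
    for institution, weight in zip(institutions, weights):
        credit[institution] += weight / total
    return dict(credit)

# Hypothetical paper with three authors from institutions X, Y, X, in that order.
example = split_credit(["X", "Y", "X"])
print({name: round(value, 3) for name, value in example.items()})
```

Under this assumed scheme, an institution listed first and third on a three-author paper would receive 8/11 of the award and the other institution 3/11.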

Verified Computer Science Ph.D. Stipends

Computer Science Stipends have been Rising

Computer science Ph.D. stipends are a bit mysterious. Departments rarely list them publicly on their websites. Maybe it's to avoid the comparison with stipends from other departments, or because they perceive it as unimportant to applicants' decision-making. But this also prevents potential Ph.D. applicants from knowing what to expect, and stipend pay is an important consideration, especially for low-income applicants.

Anecdotally, I found that many people have a sense that stipends are lower than what they are now. This may be partly because stipends have been rising, because they hear about stipends in other fields, or because they are thinking of the 9-month stipend as an annual salary. I think having accurate information about CS stipends is a good thing.

I wanted to verify stipends being offered with my own eyes. A number of applicants who had applied to Ph.D. programs for Fall 2022 shared their offer letters containing stipend rates, and I've compiled them together in the table below (last updated 2023-02-17).

Department (Default = Computer Science) | 9-month (USD) | 12-month (USD)
Massachusetts Institute of Technology | 38,000 | 50,000
Columbia University | 37,000 | 49,000
Princeton University | 36,000 | 48,000 [1]
Stanford University | 36,000 | *47,000 [2]
Brown University | 33,000 | 44,000
Johns Hopkins University | 32,000 | *43,000
Harvard University | 32,000 | 43,000
New York University (Tandon) | 32,000 | 42,000
Tufts University | 32,000 | *42,000
Northeastern University | 31,000 | 41,000
Cornell University | 30,000 | *42,000 [3]
California Institute of Technology | 30,000 | 40,000
University of Chicago | 30,000 | *40,000
University of California, Berkeley | 29,000 | 50,000
Carnegie Mellon University | 29,000 | 38,000
University of Texas, Austin | 29,000 | *38,000
University of Washington | 29,000 | *38,000
University of Pennsylvania | 29,000 | 38,000
University of Southern California | 29,000 | 38,000
Yale University | 29,000 | 38,000
University of Michigan (School of Information) | 27,000 | *36,000
University of Michigan | 27,000 | *36,000
University of Washington (iSchool) | 27,000 | *36,000
University of Washington (HCDE) | 27,000 | *36,000
University of Massachusetts Amherst | 26,000 | 35,000
University of California, San Diego | 26,000 | *34,000
University of North Carolina at Chapel Hill | 25,000 | *40,000
Vanderbilt University | 24,000 | 33,000
Georgia Institute of Technology | 23,000 | *31,000
University of California, Santa Barbara | 23,000 | *31,000
Stony Brook University | 23,000 | *34,000
University of Maryland | 23,000 | *31,000 [1]
University of Texas, Austin (ECE) | 22,000 | 29,000
University of California, Irvine (Informatics) | 22,000 | *29,000
University of California, San Diego (CogSci) | 22,000 | *30,000
University of Minnesota | 21,000 | *27,000
Indiana University Bloomington | 20,000 | *27,000 [1]
University of Toronto | 24,000 CAD | 32,000 CAD [4]
University of Alberta | 20,000 CAD | 27,000 CAD
* I did not see evidence that the 3 months of summer salary were guaranteed

[1] Normalized a longer-term academic stipend to a 9-month stipend
[2] An RA position paying 2X in the summer is possible
[3] Offer letter states that this is the "anticipated" stipend
[4] Includes additional pay from two optional TA jobs

Every stipend number shown in the table I checked myself, except for the shaded values where I had to interpolate from the other stipend number (9-month or 12-month). The stipends are for the first year in a Ph.D., as stipends often increase annually or after candidacy, so they are the base minimum stipend that every admit gets. The numbers shown in the table are rounded for reasons that I won't get into, but the table is sorted by the exact 9-month stipend number. If you have a need to see the actual stipend dollar amount, and you can help contribute to the data, then email me to discuss.

I would also caution that the stipend does not include required fees (some public universities had fees up to about $2,500) or the varying non-covered costs of health insurance at some universities, nor does it include benefits like equipment or travel discretionary funds, or bonuses and top-up fellowships. There's also the factor of cost of living, where housing alone can account for much of the stipend.

I've learned a few things during this process. First, CS stipends have gone up, but funding agencies haven't caught up. In fact, the NSF Graduate Research Fellowship provides an annual stipend of $34,000 (the same for all disciplines), which is below the 12-month stipend for first-year Ph.D. students at most of the U.S. departments in the table. Second, private universities generally pay higher stipends. The table above is sorted in descending order of 9-month stipends, so it's clear that the highest stipends all come from private universities; Berkeley has the highest stipend of any public university.
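The comparison with the NSF fellowship can be tallied from the table. The sketch below uses the rounded 12-month figures for the U.S. rows (the two Canadian rows are in CAD and are left out), so the counts are approximate and could shift slightly with the exact dollar amounts.

```python
NSF_GRFP_STIPEND = 34_000  # annual amount quoted in the text

# Rounded 12-month stipends (USD) for the 37 U.S. rows, in table order.
twelve_month = [
    50_000, 49_000, 48_000, 47_000, 44_000, 43_000, 43_000, 42_000, 42_000, 41_000,
    42_000, 40_000, 40_000, 50_000, 38_000, 38_000, 38_000, 38_000, 38_000, 38_000,
    36_000, 36_000, 36_000, 36_000, 35_000, 34_000, 40_000, 33_000, 31_000, 31_000,
    34_000, 31_000, 29_000, 29_000, 30_000, 27_000, 27_000,
]

above = sum(1 for s in twelve_month if s > NSF_GRFP_STIPEND)
equal = sum(1 for s in twelve_month if s == NSF_GRFP_STIPEND)
below = sum(1 for s in twelve_month if s < NSF_GRFP_STIPEND)

print(f"above the NSF stipend: {above} of {len(twelve_month)}")
print(f"equal after rounding:  {equal}")
print(f"below the NSF stipend: {below}")
```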
