Why do software developers love their programming language?

09 Aug 2023 - 17:03 | Version 3 | UnknownUser

An interview-based exploratory study aiming to understand the landscape of language likes and dislikes better.

Bachelor thesis done 2018 by Lukas Hoffmann, advised by Lutz Prechelt.

1. Introduction
2. Approach
3. Design considerations
4. Interview questions
Work plan and progress

1. Introduction

Although language zealotry may have become a little less pronounced in the past few years, we still perceive most professional software developers to like some language(s) much more than others.

Why is this so?
Are there recurring elements in those reasons?
Which of those are (apparently) rational, which are emotional?
How rigid (or flexible) do these attitudes appear to be?

Answers to these questions might, in some circumstances, be helpful in several ways:

More rational language choice (by accumulating evidence, albeit implicit and subjective, of strengths and problem areas)
Better team cohesion (by recognizing or developing shared values)
Better choice of down-payments on technical debt (by recognizing typical sources of pain)
More such down-payments (from recognizing the strength of the pain)
???

2. Approach

We rely mostly on interviews, because this appears to be the only feasible approach for a Bachelor thesis: Direct observation would be too time-consuming; relying on existing materials (bug trackers, blogs etc.) does not allow tailoring the data to the question.

We may ask respondents to show us specific code examples of the phenomena they discuss. The examples -- or a respondent's difficulty in finding them -- may be helpful to understand the interview statements better or may serve as additional evidence.

3. Design considerations

3.1. Respondent selection

We are interested in developers in professional contexts only, but will consider a broad range of experience levels, from respondents who are still students to old-timers with twenty-plus years of experience. We will require respondents to have substantial experience with at least two languages, called primary and secondary; see below.

3.2. Language selection

In order to get concrete answers (rather than abstract ramblings), we will mostly ask about specific languages, not languages in general. We ask each respondent to talk about and compare two languages:

The primary language is the language the respondent has used the most in their professional work for at least 12 months.
The secondary language is a language the respondent has used extensively in the past or is using currently, in the same professional role or in some other role (e.g. hobby open-source development), and that they like considerably more or less than the primary language.

Important: We are not interested in language dislikes that lack an underpinning of substantial personal experience with that language. Respondents should really know what they are talking about when they compare languages.

In order to avoid over-fragmentation of our data, we will limit the set of primary languages to a handful of mainstream languages the interviewer has at least modest knowledge about:

C
C++
Java
JavaScript
PHP
Python
Ruby
…
…

In particular, this set includes at least one statically typed and one dynamically typed language, as well as at least one language that may be perceived as young, chic, progressive, and buzzing and one with a presumably older, dustier, legacy feel.

Ideally, the secondary language comes from the same set. If that is too difficult, we will accept other secondary languages as well. When we do so, the following interesting secondary languages are preferred:

CoffeeScript
Delphi
Kotlin
R
Rust
Swift
TypeScript
Visual Basic
…

These mostly represent either newer, shinier languages or older mainstream languages.

We do not consider HTML, XML, CSS, or similar things to be programming languages and would rather not consider Unix Shell (sh, bash, or other) a programming language.
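The selection rules from 3.1 and 3.2 can be summarized as a small screening check. The following is only a minimal sketch under stated assumptions: the names Candidate and screen and the two yes/no self-assessment fields are hypothetical and not part of the study design, and the language sets contain only the entries listed above, whose open-ended "…" entries are deliberately not filled in.

```python
from dataclasses import dataclass

# Primary-language set as listed above. The list in the text is open-ended
# ("…"), so this set is illustrative and deliberately left incomplete.
PRIMARY_LANGUAGES = {"C", "C++", "Java", "JavaScript", "PHP", "Python", "Ruby"}

# Preferred secondary languages outside the primary set (also open-ended).
PREFERRED_SECONDARY = {
    "CoffeeScript", "Delphi", "Kotlin", "R", "Rust",
    "Swift", "TypeScript", "Visual Basic",
}

# Not considered programming languages here. Unix Shell is only
# "rather not" considered one, so it is left to the interviewer's judgment.
EXCLUDED = {"HTML", "XML", "CSS"}

MIN_PRIMARY_MONTHS = 12  # from the definition of the primary language


@dataclass
class Candidate:
    """Hypothetical record of a potential respondent's self-assessment."""
    primary: str
    months_primary: int
    secondary: str
    substantial_secondary_experience: bool  # assumption: yes/no self-report
    likes_secondary_much_more_or_less: bool  # assumption: yes/no self-report


def screen(c: Candidate) -> tuple[bool, str]:
    """Return (eligible, reason) according to the rules sketched in 3.1/3.2."""
    if c.primary not in PRIMARY_LANGUAGES:
        return False, "primary language outside the selected set"
    if c.months_primary < MIN_PRIMARY_MONTHS:
        return False, "primary language used for less than 12 months"
    if c.secondary == c.primary:
        return False, "secondary language must differ from primary"
    if c.secondary in EXCLUDED:
        return False, "secondary is not considered a programming language"
    if not c.substantial_secondary_experience:
        return False, "no substantial experience with secondary language"
    if not c.likes_secondary_much_more_or_less:
        return False, "no considerable difference in liking"
    if c.secondary in PRIMARY_LANGUAGES:
        return True, "ideal: secondary comes from the primary set"
    if c.secondary in PREFERRED_SECONDARY:
        return True, "preferred secondary language"
    return True, "accepted: other secondary language"
```

For example, screen(Candidate("Java", 24, "Kotlin", True, True)) would report an eligible respondent with a preferred secondary language. In line with Theoretical Sampling (see 3.3), such a check would be a starting point that may change during the study, not a fixed filter.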

3.3. Research method

We will use Grounded Theory Methodology (GTM) for this work. This means in particular:

Data analysis starts as soon as the first few (e.g. 2) interviews have been recorded.
Analysis attempts to conceptualize the responses and to uncover interesting information hidden below their surface, in particular regarding how respondents view programming, how they cope with it, and what role languages play in this. Dynamic aspects (behavior) are more interesting than static aspects, but both can be useful results.
GTM is driven by Theoretical Sensitivity, an intuition regarding what is interesting and relevant in the phenomena underlying the data. Some starting points for this study might be:
- How do respondents view programming? (goals, constraints, priorities, difficulties, …)
- How do respondents cope with programming? (preferences, tactics, …)
- What role do language preferences play for selecting a job?
- Are languages chosen for their strengths or rather avoided for their weaknesses? Which of the two dominates?
- …
- …
These are things that cannot be asked directly as interview questions, because the responses would be too abstract to be credible.
Interview transcription should be done on an "as needed" basis. Most material should initially only be paraphrased. The rate of compression applied when doing this should be adapted to the material's expected level of interest at that point, and then refined over time during analysis.
Theoretical Sampling: None of the considerations sketched above, and in particular not the set of interview questions shown below, should be considered carved in stone. Everything is allowed to change over time as the analysis uncovers phenomena worth following up. (In the interest of feasibility, the decisions above should be considered more rigid than those below. The most variable aspect may be who to ask for interviews.)
…

4. Interview questions

4.1. Respondent background

Professional programming experience: length, application domains, roles served, education.
Language experience: languages most used now; languages known well today (including or not including the above); languages known well in the past.

4.2. …

4.3. …

Work plan and progress

(one entry per calendar week, possibly with subentries, containing the current plan (frozen once the week starts) and work/result report (for past weeks))