Dagstuhl Seminar: "Empirical Theory and the Science of Software Engineering"

09 Aug 2023 - 17:03 | Version 3 | UnknownUser

This is a summary of Dagstuhl Seminar 04051, held January 26-28, 2004.

The presentations were accompanied by intensive discussion. The results of this discussion are often included in the description of a presentation (marked as such or unmarked).

Note: Most of the writeup will only be intelligible to those who have been there.

Abbreviations

CS: Computer Science (or Informatics)
FSE: ACM SIGSOFT Symposium on the Foundations of Software Engineering
HCI: Human-computer interaction
OO: object-oriented
ROI: return on investment
SE: Software Engineering
SW: Software
TSE: IEEE Transactions on Software Engineering

Timeline

(Because it is interesting to see how much time we spent on what and what brought us where.)

Monday

8:45 Introduction (Jim Herbsleb):
'Bleeding' study (18th century), notion of "theory". Goals of workshop:
- Identify theories and references relevant to SE.
- Find a more mature view of theories in SE.
- Find ways to get this view adopted.
9:10 Participants introduce themselves
9:30 Jim Herbsleb explains the schedule (Walter Tichy: 1. But that is not binding. 2. We should focus on discussions and see where that leads us. 3. We should look forward, not back; better present ideas than finished work)

The limits of empiricism

9:40 (Walter Tichy:) The limits of empiricism
- On theories:
  - A theory explains and predicts phenomena
  - It can be used to formulate testable hypotheses
  - Theories (need to) evolve
  - Theories are not necessarily mathematical
- Software research consists of:
  - Technology
  - Theory
  - Experiment and simulation
- Theory is poorly developed
  - Many relationships are only qualitative and are poorly explained ((heated discussion on this one. Result: Some theory is there, but it is not deep and rarely stated explicitly.))
    - "Modularity helps"
    - "Functional programming is a good thing"
    - "Inspections help detect faults"
    - "Faults found late are more expensive than errors found early"
- Examples of theory in SE
  - Design Pattern research
    - Explanation: short-term memory, chunking theory
  - Technical reviews research (see TSE 26(1):1-14)
    - Model uses member selection, task training, group size, resulting group expertise, group process, and social decision scheme in order to explain review performance
  - (two more examples left out by Walter)
10:40 Break

Subgroup faultlines in distributed international teams

11:05 (Pamela Hinds:) Subgroup faultlines in distributed international teams
- Situation: international distributed SW development (USA/India)
- Observed states: "us" vs. "them" attitudes, including stereotyping
- Faultlines exist if key attributes correlate with group membership rather than cut across (Lau and Murnighan, 1998). Medium degrees of heterogeneity are worst, because people align along the remaining similarities.
- Hypotheses: Subgroup salience is increased by activating events. Cross-national learning occurs if people cooperate; it leads to higher team effectiveness now and improved capability of individuals on future teams. Subgroup ethnocentrism leads to reduced team effectiveness.
- What are the influencing factors (moderators) that lead to either cross-national learning or subgroup ethnocentrism?: Inclusion, perceived interdependence, sharing of contextual information, threat.
- When we designed the study for evaluating our hypotheses, we found we need a mix of methods, including qualitative ones. Multinational team of researchers, 180 interviews, concurrent observations for 1 week at pairs of teams at different sites.
- Evaluation has only started. Important factors found so far: co-location with (a) the architect, (b) your manager; after-effects of face-to-face interaction; having to use a non-native language; time zones (which mattered because of the time of day at which a meeting would take place)
- ((Lots of questions and discussion here))
12:25 Break

Define the scope of empirical theories

13:30 Start
13:40 (Dag Sjoberg:) How should we define the scope of empirical theories in SE?
- Goal of science: produce/find/collect general knowledge
- Theories can only be refuted, not proved
- A theory is a system: universe of discourse; several hypotheses; scope conditions; …
- Claim: Theories in SE should represent general knowledge about which technology is useful for what subjects to conduct which tasks in which contexts.
- Theories should be formed bottom-up (i.e. start with rather specific settings and questions)
- Taxonomies of subjects, tasks, or contexts are not available.
- Characterizing external validity (Campbell and Stanley, 1963) would require talking in terms of such taxonomies. This is often not even attempted today.
- Example: Series of experiments concerning centralized vs. decentralized control in OO programs, repeated with different kinds of subjects in different contexts. It found different results depending on subjects and context.

Toward a social psychology of software engineering

14:25 (Tom Finholt:) Toward a social psychology of software engineering
- Why a social psychology? Both development and use occur in social context
- What may be related: Psychology of programming; human-computer interaction; CSCW; empirical studies of SE
- Initial concepts: Grounding (creating and maintaining a common ground); tinkering (simulation vs. calculation)
- Grounding is challenged if people have different cultures (of whatever kind)
  - Example: Earthquake project.
- Tinkering: concrete vs. abstract; exploratory vs. rule-based; improvisational vs. planned.
- Thoughts on methods: Conceptualize (observe models); build (intervene); trials (deploy, use, evaluate); modify (extend, evolve). This is a very painful process.
- Prospects: SW costs will continue to dominate IT costs; risks will grow further. Thus: we need new sources of feedback for our research; we need proxies for behavioral measures (example: social network visualization).
15:10 Break

Let's get metaphysical

16:00 (Steve Easterbrook:) Let's get metaphysical!
- (Audris Mockus:) Our problem is not lack of data, it is lack of a coherent way of interpreting the data.
- Views of science:
  - logical positivism (absolute knowledge),
  - Popper (falsification, not proof),
  - Campbell (observation is biased, there are too many possible theories),
  - Quine (cannot separate contingent meanings of terms from observations),
  - Kuhn (paradigms dominate for a while, then revolution),
  - Lakatos (paradigms compete, each has a core immune to refutation),
  - Feyerabend (discovery depends on historical context, any method that gives new insights is acceptable),
  - Toulmin (evolving Weltanschauung determines what is accepted as fact or as acceptable question),
  - Laudan (negative evidence is not too important in evaluating theories; new theories often do not explain some things the old one could).
- What is engineering?
  - Traditional: Engineers consume the knowledge that scientists have produced. They use it to change the world.
  - More realistic view: both groups create knowledge, are driven by problems, seek to understand and explain, rely on tacit knowledge, design to test theories.
- What is (software) design?
  - Pre-industrial design: artisans design as they create.
  - Industrial design: distinguish design and production. Requires describing the design (design document).
  - Software design: Design only.
  - ((Heavy discussion on this last one:))
    - Is there really no production step?
    - How about programming being characterized as a craft or art?
- Normal vs. radical design
  - Normal design solves problems whose solutions are well-known
  - Radical design solves problems that have never been solved before
  - The "human activity" part of SE is always radical design

Studying effective work practices for Open Source Software development

16:55 (Kevin Crowston:) Theory for studying effective work practices for Open Source Software development
- Is this just small-group research or what is special about SE?
- Pragmatist perspective of science:
  - John Dewey: Goal is to solve real problems. We learn by interacting with the world.
  - Abraham Kaplan: Scientists have built techniques based on what works ("logic in use"), not on how they describe their work in journal articles ("reconstructed logic").
- Process vs. variance theories
  - Most IS theories are of the type "the more of X you have, the more of Y you get"
  - Process theories describe how outcomes of interest develop through a sequence of events
- FLOSS = Free/Libre Open Source Software
- Research question: What makes some FLOSS teams more effective than others?
- Success measures: system quality, use, downloads, number of developers, developer satisfaction, …
- Based on Hackman's team effectiveness model (1986?): organizational context, group design, group synergy, process criteria of effectiveness, material resources LEAD TO group effectiveness
- We filtered 140 successful projects from about 50000 projects on sourceforge in April 2002 (criteria: more than 7 developers, more than 100 open bugs)
- (shows some measures (centrality) and statistics)
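
The selection criteria quoted above (more than 7 developers, more than 100 open bugs) can be written as a simple filter. This is only an illustrative sketch: the `Project` record, its field names, and the sample data are hypothetical and do not come from the study or from any SourceForge API.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical record for one hosted project (field names are assumptions)."""
    name: str
    developers: int
    open_bugs: int


# Thresholds as stated in the talk summary; both comparisons are strict ("more than").
MIN_DEVELOPERS = 7
MIN_OPEN_BUGS = 100


def meets_criteria(project: Project) -> bool:
    """Return True if the project passes both selection criteria."""
    return project.developers > MIN_DEVELOPERS and project.open_bugs > MIN_OPEN_BUGS


def select_projects(projects: list[Project]) -> list[Project]:
    """Keep only the projects that pass the filter, preserving input order."""
    return [p for p in projects if meets_criteria(p)]


# Invented sample data, for illustration only.
sample = [
    Project("alpha", developers=12, open_bugs=340),
    Project("beta", developers=7, open_bugs=500),    # exactly 7 developers: excluded
    Project("gamma", developers=30, open_bugs=100),  # exactly 100 open bugs: excluded
    Project("delta", developers=8, open_bugs=101),
]
selected_names = [p.name for p in select_projects(sample)]
```

Note that such a filter selects on activity (team size, bug tracker use), which is one of several possible proxies for the success measures listed above.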
17:50 End

Tuesday

8:40 Discussion about how we proceed

An empirical theory of coordination in SE

8:45 (Jim Herbsleb:) An empirical theory of coordination in SE
- Focus on the dependencies among engineering decisions
- Assumptions: Decisions are a good unit of progress; decision-making consumes resources; decisions constrain each other; the coordination problem means avoiding constraint violations; constraint violations produce defects
- ((Discussion on whether having many interdependencies is typical of SE only. Maybe, maybe not.))
- Note: 'interdependencies' means the same as 'constraints'
- Observing decisions as progress measure is more fine-grained compared to Earned Value.
- Decisions are distributed across both time and people.
- Coordination problems are pervasive. Many important advances address them: modularity, structured programming, high-level languages, SW architecture, OO design.
- Simplifications: Decisions are either consistent or inconsistent; functional requirements are either satisfied or not; interdependencies are always less troublesome when (a) fewer people are involved, (b) they communicate effectively, (c) the constraints are highly visible.
- Empirical findings: More people involved leads to increased cycle time (Herbsleb, Mockus, TSE 2003); more people involved leads to lower productivity (Herbsleb, Mockus, FSE 2003)
- ((Discussion of why there is no feedback in the model. Modeling feedback requires a much higher level of detail.))

Non-Linear Modelling in Software Engineering

9:40 (Frank Padberg:) Non-Linear Modelling in Software Engineering
- We found a non-linear dependency between two inspection measures and a statistic of interest: total defect content in a document (which is not directly observable).
- Procedure: Perform entropy-based feature ranking to select the input variables; compare Jackknife errors of various candidate models. We picked a neural network model in the end.
- There is a SW development project dynamics model by ???. It consists of four interacting submodels, each with multiple feedback loops. Its simulation re-created with good precision various actual project data based only on a number of input parameters determined from interviews.
- ((Another example described: Net discounted value computation of the break-even point for XP-style pair programming, given a pair speed advantage and pair defect advantage))
- A simple model classification: (1) observation and measurement (requires a basic model); (2) regression and prediction (statistical models); (3) explanation and integration (qualitative models); (4) optimization and tradeoff analysis (quantitative extensions of qualitative models).
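
The model-comparison step of the procedure above can be sketched roughly as follows. Assumptions: "Jackknife error" is read here as the mean absolute leave-one-out prediction error, the two candidate models (a constant and a straight line) are stand-ins chosen for brevity rather than the neural network that was actually picked, and the data points are invented. The entropy-based feature ranking is not shown.

```python
from typing import Callable, Sequence

Model = Callable[[float], float]
Fitter = Callable[[Sequence[float], Sequence[float]], Model]


def fit_mean(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Candidate 2: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)


def jackknife_error(fit: Fitter, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Mean absolute error when each point is predicted by a model
    fitted to all the other points (leave-one-out)."""
    errors = []
    for i in range(len(xs)):
        train_x = list(xs[:i]) + list(xs[i + 1:])
        train_y = list(ys[:i]) + list(ys[i + 1:])
        model = fit(train_x, train_y)
        errors.append(abs(model(xs[i]) - ys[i]))
    return sum(errors) / len(errors)


# Invented data: x is an inspection measure, y a defect content estimate.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

candidates = {"mean": fit_mean, "line": fit_line}
scores = {name: jackknife_error(fit, xs, ys) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
```

The candidate with the lowest held-out error is kept; the same loop works for any fitting function with the same signature.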
10:25 Break

Focus on what goes wrong

11:00 (Lutz Prechelt:) Focus on what goes wrong

Modeling software changes

11:30 (Audris Mockus:) Modeling software changes
- Purpose: quantify software production in order to make informed decisions; explain phenomena represented in logs of workflow systems.
- Assumptions: SW development mostly consists of changes, broken down into work items.
- Plenty of data is collected about changes: marketing, hotline, architecture, defect database, schedules etc.
- Advantages of using such data: non-intrusive data collection; large quantity of data allows calibration and historical comparisons; fine grain; data is uniform over time
- Pitfalls: breakdown of work items is project-dependent; different tools are used; different process rules for using a tool;
- Trivia on an Avaya C project (Definity):
  - Of 16 M LOC added since 1993, only 3.7 M LOC are textually unique.
  - Most frequent line is blank (2 M LOC), second is open brace (1 M LOC)
  - 36796 changes to config.h, the most frequent include file
  - Line types: 2.3 M comments, 1.2 M calls
- Existing models: predicting the quality of a patch; determine which parts of the code can be independently maintained; who are the experts on certain code; measure organizational dependencies; what makes some changes hard;
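
Line statistics of the kind quoted in the trivia above (total vs. textually unique lines, most frequent lines) can be computed with a frequency count. A minimal sketch, assuming that lines are compared after stripping surrounding whitespace; the summary does not say how "textually unique" was defined, and the sample lines are invented.

```python
from collections import Counter
from typing import Iterable


def line_statistics(added_lines: Iterable[str], top: int = 2) -> dict:
    """Count total lines, distinct lines, and the most frequent lines.

    Assumption: two lines count as textually identical if they are equal
    after stripping leading and trailing whitespace.
    """
    counts = Counter(line.strip() for line in added_lines)
    return {
        "total": sum(counts.values()),
        "unique": len(counts),
        "most_common": counts.most_common(top),
    }


# Invented sample of added lines from a C source file.
sample_lines = ["", "{", "int x;", "", "    {", "", "return 0;"]
stats = line_statistics(sample_lines)
```

For a real change history, the input would be the added lines extracted from the version control logs rather than a list literal.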
12:20 Break

Pair Programming: observation and measurement

14:10 (Matthias Müller:) Pair Programming: observation and measurement
- Pair Programming speed factor PPSF = elapsed time for one developer / elapsed time for pair (for a given task)
- PPSF values published: 1.0 (students, Nawrocki 2001), 1.4 (professional programmers, Nosek 1998), 1.8 (student homework assignments, Williams 2000). The studies have many differences with respect to task, setup, time limits, etc.
- Müller 2003 forced programs to have certain quality or be reworked. Then PPSF = 1.0.
- "How much did you like the Pair Programming situation" (on a scale 5 "very well" to 1 "not at all") explained a lot of the variance in elapsed time to finish.
- So: What must a theory of Pair Programming be a theory of?: problem domain, task size, task complexity, individual skill, feelgood factor, individual character, context.
- ((feelgood factor resembles a mechanism, all others are either boundary conditions or moderators))
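
The PPSF definition above translates directly into code. The effort ratio is an added observation, not part of the talk summary: a pair occupies two people for its elapsed time, so its person-time relative to a solo developer is 2 / PPSF. The function names and the example numbers are hypothetical.

```python
def pair_speed_factor(solo_elapsed: float, pair_elapsed: float) -> float:
    """PPSF = elapsed time for one developer / elapsed time for the pair,
    both measured on the same task."""
    if solo_elapsed <= 0 or pair_elapsed <= 0:
        raise ValueError("elapsed times must be positive")
    return solo_elapsed / pair_elapsed


def pair_effort_ratio(ppsf: float) -> float:
    """Person-time spent by the pair relative to the solo developer.

    Pair effort is 2 * pair_elapsed and solo effort is solo_elapsed,
    so the ratio is 2 / PPSF.
    """
    if ppsf <= 0:
        raise ValueError("PPSF must be positive")
    return 2.0 / ppsf


# Invented example: a task takes 10 hours solo and 7 hours as a pair.
ppsf = pair_speed_factor(solo_elapsed=10.0, pair_elapsed=7.0)
effort = pair_effort_ratio(ppsf)
```

Under this reading, PPSF = 1.0 means the pair spends twice the person-time of a solo developer, and PPSF = 2.0 is the point where person-time is equal. Quality differences, which the studies above treat differently, are not captured by this ratio.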

Can virtual laboratories support SE theory building?

15:00 (Dietmar Pfahl:) Can virtual laboratories support SE theory building?
- Categories of science:
  - Natural science: assume fixed and consistent laws to be uncovered.
  - Social science and psychology: facts are often qualitative, time-variant, difficult to interpret. Unclear how universal a rule is. However, there are many theories about the individual (cognitive science) and about groups (social psychology)
  - SE has many similarities to social science and psychology.
- Empirical SE currently creates mostly isolated pieces of evidence.
- Problem: Theory building is difficult if there is no agreed research agenda and few common tools.
- Suggestion: Use virtual laboratories that collect all kinds of previous results in a process simulation model, which would then allow what-if games.
- Is this a sensible research approach? (discussion postponed until after the break and next presentation)
15:40 Discussion of how to proceed
15:45 Break

SW Process simulation: a potential platform for enriching empirical studies

16:20 (David Raffo:) SW Process simulation: a potential platform for enriching empirical studies
- What is process simulation?: computerized executable model, used if behavior over time is important or if manipulating the real system is too expensive
- Past work:
  - Paradigms: knowledge-based, state-based, discrete event, system dynamics, hybrid, agent-based
  - Issues addressed: strategic management, process planning, project management and control, process improvement, technology adoption, process understanding, training
- PTAM/GQMM: Process tradeoff analysis model, goal/question/model?/??. Derive a process model specific to a particular question you want answered. We have built an infrastructure for making that easy. These models can also be used for process correction as you go in a project.
- Advantages: you can explore what-if questions, explore factors beyond the scope of the original study, hypothesize and test new theories.
((Discussion on benefits and limitations of simulation as an approach to working with theories. Result: It has merits if done right, but is no silver bullet.))
17:40 End

Wednesday

Theory and what it's good for

9:50 (Janice Singer:) Theory and what it's good for
- What HCI has to tell us: slightly older field, same growing pains, similar accumulation of empirical evidence, similar lack of theory(?). Consequence: Look at HCI.
- The usefulness of cognitive psychology for producing useful systems is controversial (e.g. con: Landauer, pro: Sutcliffe)
- Some old research of mine: We observed people maintaining a SW system and we developed the notion of Just-in-time Comprehension (JITC): They tried to understand only those parts of the system that (and when) they had to for the bugfix they were just trying to do.
  - The favorite tool was grep.
  - We found why: usable across a broad range of contexts and by a wide range of people with different (levels of) expertise. This is a theory.
  - We built our own tool based on this theory. It was successful and became used by these people.
  - The tools group of that company even took over development of the tool.
  - Conclusion: We were successful, but theory-building had only a small part in that.
- What theory-building cannot do for you: Convince practitioners (etc.)
- Is theory-building really worthwhile? Maybe we need to take return on investment into account.
- ((Discussion:))
- IS has long struggled with attempts to reduce itself to psychology, management etc. That is probably not useful: Chemistry could be reduced to Physics, but shouldn't be. That is in fact applied modularity: Chemistry means re-using modules of physics phenomena without having to understand their internals.
- Most CS researchers would actually be quite surprised if we were to claim that the most important theory for SE does not come from CS.
- If we can get a grip on the really important questions and problems (such as modularity, how it works, and how to get it), then we can make a good case for theory.
- One thing that may make SE special and require the use of knowledge from psychology, management etc. is the fact that we do not design SW, we design human activities (those of the SW users).
10:10 Break

Goodness criteria for empirical theory

10:50 (Susan Sim:) Goodness criteria for empirical theory
- I looked for such criteria for my Ph.D. thesis about a theory of benchmarking. Here are my findings.
- Criteria are closely bound with the paradigm of a discipline. They are rarely explicitly articulated.
- Most of the treatment is in the philosophy of science (or of knowledge). Much hair-splitting, little practical advice. Popper figures prominently.
- Some material in literature on social science research methodology.
- Best source was http://garnet.acns.fsu.edu/~whmoore/theoryeval.pdf. An informal essay written for students.
- Definition of 'theory': set of statements that provide a causal explanation of a set of phenomena that is logically complete (all assumptions and definitions explicitly stated), internally consistent, and falsifiable.
- There are three types of criteria: empirical, analytical, pragmatic.
  - Empirical: postdictive power, predictive power, testable (falsifiable and operationalizable), relevant.
  - Analytical: assess logical soundness, find rival theories, use a hierarchy of criteria to determine superiority.
  - Pragmatic: improves understanding, convinces your peers
- I studied the syllabi of all methodology courses on campus in various disciplines
  - Most are tied with teaching the most important theories. This is because (a) the work leading there serves as an example and, more importantly, (b) applying the methods requires starting from their results.
  - Two fields appeared most similar to SE: Management and Education. Those happen to be the two that cover epistemology and theory of science in their courses, while the others touch on this only briefly.
- Process model for benchmarking
  - Preconditions: minimum level of maturity of the field, a history of evaluation, an ethos of collaboration.
  - Process: participants progress in lock-step
  - Features: led by a few champions, supported by lab work, opportunities for community participation (feedback)
- The theory may also apply to other fields (outside SE).
12:00 Break

List of things missing

We compiled a list of things that we think do not currently exist, but that we would very much like to have.

Mon 9:50: Definition of boundary conditions for theories
Mon 14:00: Taxonomies of types of software engineers, tasks, technologies, work contexts.
Mon 14:30: In what respects is SE different from other fields (such as psychology, other kinds of engineering etc.)
Tue 10:20: A matrix listing existing models sorted by model level and SE task/subdomain

List of things available

We compiled a list of things that somebody mentioned and we thought might be useful for SE theory or research.

Mon 9:00: Criteria for distinguishing types of work. McGrath: Groups, interaction and performance, Prentice Hall 1984. Kelly and McGrath: On time and method, Sage 1984.
Mon 10:00: Chunking theory (short-term memory)
Mon 10:40: Theory of social loafing (Annual Review of Psychology 1990 (Levine, Moreland) and 1996 (Guzzo et al.))
Mon 10:50: Carliss Baldwin, Kim Clark: Design Rules - The Power of Modularity; MIT Press 2000
Mon 11:30: Jeffrey Sanchez Burks: Protestant relational ideology.
Mon 14:00: Campbell and Stanley, 1963 (External validity)
Mon ???: Ideas on useful types of approach to science. Donald Stokes: Pasteur's Quadrant.
Mon 15:05: Pajek public domain (but not open source) network visualizer
Mon 14:40: DeLone and McLean 1992, Seddon 1997, Hackman 1986?.
Mon ???: Collective ground: Roberts and Weick.
Mon 15:50: Susan Sim's graduate course on research methods, http://www.ics.uci.edu/~ses/teaching/ics280
Mon 17:15: Success measures
Tue 12:15: http://sourcechange.sourceforge.net an infrastructure for collecting and processing software change data.
Tue 14:40: Body of work on pair problem solving (ask Tom Finholt or Janice Singer)
Wed 9:55: John Carroll: HCI models, theories, frameworks: Towards a multidisciplinary science, 2003 (book)
Wed 10:00: Thomas Landauer: Let's get real: A position paper on the role of cognitive psychology in the design of humanly useful and useable systems, 1991. (Claims that available theory is producing only little value, even if applied well. Low ROI.)
Wed 10:00: Alistair Sutcliffe: On the effective use and reuse of HCI knowledge, 2000. (Opposite opinion to Landauer)
Wed 9:15: Dag Sjoberg has a list of 140 controlled experiments in SE (from 9 journals and 3 conferences during the last 10 years)
Wed 11:05: An essay on how to evaluate the quality of a theory: http://garnet.acns.fsu.edu/~whmoore/theoryeval.pdf.

Characteristics of a theory

Not claiming to be complete; just the aspects that popped up.

Refers to a universe of discourse
Provides a clear-cut terminology; describes how to operationalize the terms
Has explanatory and/or predictive power and concerns several interconnected hypotheses
Is parsimonious (as simple as possible; cf. Occam's Razor)

Questions to discuss

Beware: They are on very different levels of abstraction and specificity.

Are students appropriate as observational subjects?
Do we need typologies/taxonomies? What if they are not theory-driven?
Do we need specialized theories of SE at all?
Is SW really "all design and no construction"?
Is theory of short-term memory a useful basis for understanding coordination issues?
How do we transfer existing theories into SE?
How can we capture and articulate implicit theories?

Discussion: How can we capture and articulate implicit theories?

Jim Herbsleb: We once collected a large number of claims (regarding OO) and tried to write them up in a precise form.
Maybe collecting is not just collecting, but actually forming a theory, because you put the pieces into one common context and make them talk to both each other and existing theories.
We tend to require too much before we call anything a theory. There are plenty of would-be theories out there that have just not been spelled out. In many cases spelling them out (roughly) would not be difficult.
- Why don't people spell them out if it is easy?
- Because they would then be asked about boundary conditions.
- Many such vague theories are invented for marketing purposes. The proposed techniques may still work, but often for different reasons (example: eXtreme Programming).
There is a lot of material explaining how to make theories explicit: the Qualitative Research literature, Grounded Theory, etc.
And there are lots of theories: All of the empirical SE literature contains some, the "Mythical Man-Month" is full of them, etc.

Discussion: How do we transfer existing theories into SE?

Meaning "how can we source relevant theories from other disciplines and use them for purposes of SE research"?

Janice Singer: We had this really crowded ICSE workshop titled "Beg, borrow, and steal" to teach exactly that. It was quite good, but still nobody seemed to take up that kind of approach afterwards.
That is because it is the wrong approach to try and convince engineers to become social scientists. You should try and get social scientists interested in your questions.
That will not work either, because they won't get promoted if they do interdisciplinary work, because many departments just do not support this at all.
- There are joint degrees, however (Jim Herbsleb, Tom Finholt)
- And there is the consulting model, regularly used with statisticians in many fields.
- And there are journals that will happily publish interdisciplinary contributions.
To make it work we could (or will have to)
- tell students to take additional courses from other disciplines
- write books and articles advocating this
- ask for it as reviewers and editors

Discussion: Do we need specialized theories of SE at all? Is SW really "all design and no construction"?

There is much more to modularity than chunking theory. This is a sufficient example. Good SE modularity research may (for all the right reasons) not be publishable in, say, a group cooperation research journal.
You have to use SE terminology. We should build specialized libraries, say, of process simulation models etc. So the answer is yes.
Another answer would be: I do not care. If something from elsewhere perfectly fits as it is, just use it. If not, adapt it.
One danger is re-inventing the wheel. We need to know about existing theory.
- But we are re-inventing the wheel even within SE research.
- That may be because we have so little theory; more theory would provide a taxonomy that would make it easier to find existing work.
We may even build theory for other fields, like AI research in the 1960s did when building executable models of the mind.

Quotes

(Totally taken out of context)

Janice Singer: Expertise is just stupid.

Project: The Book

We will try to produce a book out of the workshop that is more than just the presentations pasted together.

'Proposed' title: Theoretical Software Engineering
Proposed structure
- Theory construction and development issues
  - The role of theory in SE (Jim, Walter)
  - Quality criteria for theories (Susan)
  - Validation, or: What is a theory? (Steve, Lutz)
  - Theory in other fields (Janice)
  - Simulation models and theory (David, Dietmar)
- Examples of theories
  - Audris, Frank/Matthias, Jim, Kevin, Marshall(?), Pamela
- Examples of types of theory not represented here
- FAQs with answers
- Annotated bibliography
  - Fundamental
  - Methodological
  - Theories from other disciplines
  - Examples from SE
  - Would-be examples

Procedure for getting there
- Everybody sends an abstract (e.g. one paragraph or a bullet list)
  - until Feb 6, 2004
- Walter and Jim refine the concept
- Potential publishers
  - MIT Press
  - Pearson