Replication of "PatMain"

09 Aug 2023 - 17:03 | Version 14 | UnknownUser

This page describes how to participate in a joint replication, conducted in the context of the RESER 2011 workshop, of a previously published controlled experiment.


What is a joint replication?

The term refers to a replication of a controlled experiment that is performed not by a single researcher or a single closely-knit research group, but by a group of researchers who work together loosely:

There is a common definition of the experiment design to be replicated,
but each group gathers subjects and collects data individually,
and each group initially cleanses and evaluates only its own data locally,
before the data collected by all groups are joined into a larger data set,
which is then analyzed and interpreted jointly.

Why a joint replication?

To the best of our knowledge, it has never been done before,
so there is probably a lot to be learned with respect to methodology.
In particular, it will be interesting to see how homogeneous the subjects and the results will be across groups.
Also, gathering enough subjects for a stand-alone replication is difficult for a single group, as subjects are a scarce resource.

The original PatMain experiment

PatMain was the second controlled experiment (after PatDoc) ever performed regarding software design patterns. It was performed in 1997 and published in 2001:

Lutz Prechelt, Barbara Unger, Walter F. Tichy, Peter Brössler, Lawrence G. Votta. A Controlled Experiment in Maintenance Comparing Design Patterns to Simpler Solutions. IEEE Transactions on Software Engineering 27(12):1134-1144, December 2001.


Its research question was this:

If, for a given problem, using a design pattern is a bit of an overkill, because it provides more functionality or flexibility than required, will the resulting solution be easier to maintain if a simplified solution is implemented instead of a solution based on the design pattern?

The experiment design involved four rather small C++ programs:

CO: Communication Channels (involving Decorator)
GR: Graphics Library (involving Composite and Abstract Factory)
ST: Stock Ticker (involving the Observer pattern)
BO: Boolean Formulas (involving Composite and Visitor)

Each of these programs existed in two versions: One, called PAT, built using the design pattern(s), the other, called ALT (alternative), using a simplified design (without pattern).

The experiment was conducted with professionals working on paper. There were two tasks for each program (one understanding-only task, one modification task). The experiment procedure consisted of

a pre-test, where each subject would work on one PAT version and one ALT version of two of the four programs,
then a short course on design patterns to increase the subjects' knowledge of design patterns,
then a post-test, where each subject would work on one PAT version and one ALT version of the remaining two of the four programs.

So there were three experiment variables:

Program (with levels CO, GR, ST, BO)
Design (with levels PAT, ALT)
Design Pattern Knowledge (with levels PRE, POST)

The experiment primarily measured two output variables:

time: How long did the subject take for each task?
correctness: How complete and correct is the solution?

Why replicate this experiment rather than some other?

The experiment has already been replicated once:

Marek Vokáč, Walter Tichy, Dag I.K. Sjøberg, Erik Arisholm, Magne Aldrin: A Controlled Experiment Comparing the Maintainability of Programs Designed with and without Design Patterns - A Replication in a Real Programming Environment, Empirical Software Engineering 9(3):149-195, 2004

The main point of that replication was that the subjects (also professionals) really worked on a computer rather than just sketching their solution on paper.

We picked this experiment for further replication because it has a good mix of properties:

a question that is still interesting (not fully solved)
small-enough scale (required subject work time)
measurement is simple
the subjects' relevant skills have presumably changed since 1997

Changes compared to the original experiment

Rather than doing a pre-test, then a course, then a post-test, each subject will work on only two of the programs.
Only two of the programs will be used overall: CO and GR. We exclude ST because it is relatively uninteresting. We exclude BO because the Visitor pattern is overly difficult. (An option to use these two or even all four is available, though.)
The programs have been converted from C++ to Java and C# (all three variants are available in the portal for each subject to choose from).
Subjects will implement solutions on a computer (like in the first replication but unlike the original experiment).
The questionnaire (with somewhat different questions) and the time measurement are implemented as a web portal.

Who can participate?

Subjects should have a good working knowledge of C++, C#, or Java
and must be able to unzip an archive file, load the resulting set of files in a development environment, modify/compile/run the program, and zip the modified set of files again.
Design pattern knowledge is not strictly required. The questionnaire asks about such knowledge so that subjects of different knowledge levels can in principle be separated during the analysis.
Some design pattern knowledge would be helpful, though.

How to use the replication portal

Preparation by the experimenter

Register at the web application http://replication.inf.fu-berlin.de/replication/explogin
Create an instance of the experiment
Create subject IDs. (ID modulo 4 is the subject's group number)
Print the subject IDs and cut the pages into little chits with one ID each.
Distribute the IDs to your subjects. Use up each block of four IDs as completely as possible before handing out IDs from another block of four. Otherwise you may end up with groups of unbalanced size.
(You can make test runs and then delete the whole experiment afterwards.)
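The group number is simply the subject ID modulo 4, so you can check how balanced the groups are before handing out more chits. The following is a minimal Python sketch. It assumes that subject IDs are non-negative integers and that a "block of four" means four consecutive IDs, which always cover all four groups. The function names are hypothetical and not part of the portal.

```python
from collections import Counter

def group_of(subject_id):
    """Group number as defined by the portal: subject ID modulo 4."""
    return subject_id % 4

def group_sizes(issued_ids):
    """Count how many of the issued IDs fall into each of the four groups."""
    counts = Counter(group_of(i) for i in issued_ids)
    return [counts.get(g, 0) for g in range(4)]

def is_balanced(issued_ids):
    """True if the four group sizes differ by at most one."""
    sizes = group_sizes(issued_ids)
    return max(sizes) - min(sizes) <= 1

# Hypothetical IDs: one complete block of four plus two IDs from the next block.
issued = [100, 101, 102, 103, 104, 105]
print(group_sizes(issued))  # [2, 2, 1, 1]
print(is_balanced(issued))  # True
```

Handing out 100, 104 and 108 instead would put all three subjects into group 0, which is the imbalance the instruction above warns about.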

A note on robustness

Note that we made the following design decision: "We assume that experimenters are super-intelligent, quasi super-human types and thus never make mistakes".

Therefore, the experimenter part of the portal has no error handling worth speaking of. So please tread carefully. (The subject part is more robust, though.)

Execution

Each subject logs in with his/her ID, which determines the group number and therefore how the web app guides the subject. The subject then performs the two tasks as guided by the web app. The order is:
- for ID%4 == 0: GRpat, COalt
- for ID%4 == 1: COalt, GRpat
- for ID%4 == 2: GRalt, COpat
- for ID%4 == 3: COpat, GRalt
where pat stands for the pattern version and alt stands for the alternative (non-pattern) version.
This involves downloading the experiment program (twice) and uploading the modified experiment program (twice).
The web app will guide subjects through this process, measure the times, and record the answers to the pre-experiment and post-experiment questionnaires.
We expect typical subjects to need about 2-3 hours in total.
Not all subjects need to work at the same time. (Beware of information leaks, though.)
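The task order above can be written as a lookup table. This is handy when matching each uploaded solution to its design version during evaluation. The following is a minimal sketch that covers only the default CO/GR setup, not the optional ST/BO programs. `TASK_ORDER`, `tasks_for` and `design_of` are hypothetical names.

The sanity check encodes a property of the listed order. Across the four groups, every (program, design) pair appears exactly once as a first task and exactly once as a second task.

```python
from collections import Counter

# Task order per group (group = ID % 4), transcribed from the order on this page.
TASK_ORDER = {
    0: [("GR", "pat"), ("CO", "alt")],
    1: [("CO", "alt"), ("GR", "pat")],
    2: [("GR", "alt"), ("CO", "pat")],
    3: [("CO", "pat"), ("GR", "alt")],
}

def tasks_for(subject_id):
    """Ordered list of (program, design) tasks for a subject ID."""
    return TASK_ORDER[subject_id % 4]

def design_of(subject_id, program):
    """Return 'pat' or 'alt': which version of `program` this subject works on."""
    for prog, design in tasks_for(subject_id):
        if prog == program:
            return design
    raise ValueError(f"program {program!r} is not part of the default CO/GR setup")

# Counterbalancing check: each (program, design) pair is first exactly once
# and second exactly once across the four groups.
first = Counter(order[0] for order in TASK_ORDER.values())
second = Counter(order[1] for order in TASK_ORDER.values())
assert len(first) == 4 and all(n == 1 for n in first.values())
assert len(second) == 4 and all(n == 1 for n in second.values())

print(tasks_for(42))        # [('GR', 'alt'), ('CO', 'pat')]
print(design_of(42, "GR"))  # alt
```

This property is why both program orders and both design assignments are represented equally when the groups are balanced in size.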

Evaluation

The experimenter can download the result data (questionnaire answers, time measurements) from the portal as a TSV text file.
The experimenter can download the set of ZIP files from the portal.
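The TSV export can be read with Python's standard `csv` module. The following is a minimal sketch. It assumes that the first row holds the column names described below (such as "User ID") and that the file is UTF-8 encoded. The file name `results.tsv` and the function name are hypothetical. This page does not say whether the portal quotes free-text answers that contain tabs or line breaks, so it is worth checking a few rows by hand.

```python
import csv
import io

def load_results(lines):
    """Parse tab-separated rows into one dict per subject, keyed by column name."""
    reader = csv.DictReader(lines, delimiter="\t")
    return list(reader)

# Usage with a real export (file name is hypothetical):
# with open("results.tsv", newline="", encoding="utf-8") as f:
#     rows = load_results(f)

sample = io.StringIO("User ID\tHours per week programming\n42\t20\n43\t35\n")
rows = load_results(sample)
print([r["User ID"] for r in rows])  # ['42', '43']
```

All values arrive as strings, so numeric columns still need to be converted with `int` before analysis.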

Result TSV data file column descriptions

The column names were kept as self-explanatory as possible. Below are the column names and a few words on their encoding.

User ID: the ID the subject used to log in

Answers to the first prequestionnaire:
Programming languages used: comma-separated list, String
Programming languages used often: comma-separated list, String
Lines of code written: int
Lines of code written in JAVA: int
Hours per week programming: int
self proclaimed programming skills (represented by an int value):
* top 10%: 1
* top 25%: 2
* top 40%: 3
* about average: 4
* bottom 40%: 5
* bottom 25%: 6
* bottom 10%: 7
Student status (represented by an int value):
* undergraduate student: 1
* graduate student: 2
* postgraduate student: 3
* not a student: 4
year of professional xp: number of years the subject has been working as a developer, int
working hours per week: int
Major: String
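The coded answers to the first prequestionnaire can be decoded with lookup tables transcribed from the list above. The following is a minimal sketch. The table and function names are hypothetical, and the sketch assumes that cell values arrive as strings, as they do when read from a TSV file.

```python
# Codes transcribed from the column descriptions on this page.
SKILL = {1: "top 10%", 2: "top 25%", 3: "top 40%", 4: "about average",
         5: "bottom 40%", 6: "bottom 25%", 7: "bottom 10%"}
STUDENT_STATUS = {1: "undergraduate student", 2: "graduate student",
                  3: "postgraduate student", 4: "not a student"}

def decode(code, table):
    """Map a coded cell value (string or int) to its label."""
    return table[int(code)]

def split_languages(value):
    """Split a comma-separated language list such as 'Java, C++' into clean names."""
    return [lang.strip() for lang in value.split(",") if lang.strip()]

print(decode("4", SKILL))               # about average
print(split_languages("Java, C#,C++"))  # ['Java', 'C#', 'C++']
```

Lower skill codes mean higher self-rated skill. The scale is therefore reversed relative to intuition when it is used in correlations or plots.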

Answers to the second prequestionnaire:
Number of patterns used: int
The knowledge of each design pattern is represented by an integer value:
* never heard of it: 1
* have only heard of it: 2
* understand it roughly: 3
* understand it well: 4
* understand it well and have worked with it once: 5
* understand it well and have worked with it two or three times: 6
* understand it well and have worked with it many times: 7
Each pattern has its own column, named after the design pattern.
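One way to relate pattern knowledge to the programs actually used is to average the knowledge codes of the patterns each program involves. According to the program list above, CO involves Decorator, and GR involves Composite and Abstract Factory.

The following is a minimal sketch. It assumes that the pattern columns are named exactly "Decorator", "Composite" and "Abstract Factory". The names `RELEVANT_PATTERNS`, `patterns_worked_with` and `relevant_knowledge` are hypothetical. Averaging ordinal codes is a simplification, not a prescribed method.

```python
from statistics import mean

# Patterns involved in each program of the default setup (from the program list on this page).
RELEVANT_PATTERNS = {"CO": ["Decorator"], "GR": ["Composite", "Abstract Factory"]}

def patterns_worked_with(row, pattern_columns):
    """Patterns the subject reports having worked with at least once (code >= 5)."""
    return [p for p in pattern_columns if int(row[p]) >= 5]

def relevant_knowledge(row, program):
    """Mean knowledge code (1-7) over the patterns involved in `program`."""
    return mean(int(row[p]) for p in RELEVANT_PATTERNS[program])

# Hypothetical answers of one subject (not real data).
row = {"User ID": "42", "Decorator": "5", "Composite": "3", "Abstract Factory": "7"}
print(patterns_worked_with(row, ["Decorator", "Composite", "Abstract Factory"]))
print(relevant_knowledge(row, "GR"))
```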

Answers to the postquestionnaire:
<program> patterns noticed: patterns noticed in this particular program, String
<program> difficulty (represented by an int value)
* quite easy: 1
* reasonably easy: 2
* neither easy nor difficult: 3
* reasonably difficult: 4
* quite difficult: 5
<program> correctnes: the subject's own estimate of the correctness of his/her solution, in %, int
<program> difficult aspects: the aspects which the subject found most difficult, String
<program> help pattern knowledge: "my pattern knowledge has let me solve the tasks … percent faster", int
<program> help documentation: "explicit documentation of patterns has let me solve the task .. percent faster", int
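During local data cleansing, out-of-range answers in the postquestionnaire can be flagged automatically. The following is a minimal sketch with these assumptions:
- The <program> placeholder in the column names is replaced by a program label such as "communication". The exact labels are not listed here.
- The column spelling "correctnes" is kept as given above.
- The function names are hypothetical.

```python
# Difficulty codes transcribed from the column descriptions on this page.
DIFFICULTY = {1: "quite easy", 2: "reasonably easy", 3: "neither easy nor difficult",
              4: "reasonably difficult", 5: "quite difficult"}

def _is_int_in(value, low, high):
    """True if `value` is a whole-number string within [low, high]."""
    value = value.strip()
    return value.isdecimal() and low <= int(value) <= high

def check_postquestionnaire(row, program):
    """Return a list of problems in one subject's post-questionnaire answers for `program`."""
    checks = [
        (f"{program} difficulty", min(DIFFICULTY), max(DIFFICULTY)),
        (f"{program} correctnes", 0, 100),  # column spelling as in the list above
    ]
    problems = []
    for column, low, high in checks:
        value = row.get(column, "")
        if not _is_int_in(value, low, high):
            problems.append(f"{column}: unexpected value {value!r}")
    return problems

# Hypothetical answers (not real data): the correctness estimate is out of range.
row = {"communication difficulty": "3", "communication correctnes": "120"}
for problem in check_postquestionnaire(row, "communication"):
    print(problem)
```

Flagged rows should be reported with their subject ID rather than silently dropped, in line with the advice below to include both sanitized and original values.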

time measurements:
Time is recorded on each of the following pages. The variables are named "time <program>".
Questionnaire 1
Questionnaire 2
<program> download
<program> task
<program> upload
<program> task 2
The measurements are in seconds
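The times are in seconds and recorded per page, so the working time for a program has to be assembled from several columns. The following is a minimal sketch. It assumes that the columns are named "time <program> <page>" (for example "time GR task"). That naming, the function name, and the choice to count only the two task pages are all assumptions.

```python
def work_minutes(row, program, pages=("task", "task 2")):
    """Sum the recorded seconds for the given pages of one program and convert to minutes."""
    total_seconds = sum(int(row[f"time {program} {page}"]) for page in pages)
    return total_seconds / 60

# Hypothetical measurements for one subject (not real data).
row = {"time GR task": "2700", "time GR task 2": "900", "time GR download": "30"}
print(work_minutes(row, "GR"))  # 60.0
```

Excluding the download and upload pages by default keeps file-handling time out of the measure. Whether to include them is an analysis decision that each group should report.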

other:
<program> task 2: the answers to the second task for each program, String
For the communication program (CO), there is also a "task 3".
last comments: any last comments by the subject

Results handling at the RESER workshop

(Will be discussed on the 'replication' mailing list.)

In short, the current status is that every team of replicators will produce its own short write-up about its particular partial replication and attempt a separate statistical evaluation (which will likely be inconclusive, given the small number of subjects per team).

We will devise a format for presenting these partial results at the workshop. The organizers will hopefully have collected all participants' data beforehand and performed an overall evaluation and/or handed out this overall data set to all participants.

Suggested structure for RESER submission

Keep it short
Assume that readers know the contents of this web page and have seen the replication portal, and write only a short introduction. (We will decide later in what form to provide a full description of these parts in article form for the readers of the workshop proceedings.)
Suggested structure:
- Subjects: number, origin, demographics, design patterns knowledge. This should include data obtained via the portal plus background information. Use the original articles for guidance of what to report. Update to modern times and your context as needed.
- Experiment conduct: When, where, how? Subjects voluntary/paid/obliged?
- Issues with some data points: Special events/problems etc. Use the subject ID to be specific.
- Descriptive data analysis (of results, not demographics), plots of all data points, table of all data points.
- Discussion of statistical evaluation method: Which? Why?
- Statistical evaluation
- Comparison to original articles, trends, and conclusion
- Lessons learned about (joint) replications
Some universities' rules on sharing raw data can cause problems unless we all include our raw data (including outliers) in the article, so please do so.
Make sure both the sanitized (if any) and original data values are present.
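For the descriptive data analysis, it helps to tabulate times per program and design version. The following is a minimal sketch with these assumptions:
- Per-program working times (in minutes) have already been extracted for each subject.
- The design version is derived from the subject ID using the default CO/GR task order on this page.
- All names and the sample values are hypothetical, not experiment results.

The sketch computes medians only and does not replace the statistical evaluation itself.

```python
from collections import defaultdict
from statistics import median

def design_of(subject_id, program):
    """'pat' or 'alt' for the default CO/GR setup.

    From the task order on this page: groups 0 and 1 work on GR as PAT (and CO as ALT);
    groups 2 and 3 work on GR as ALT (and CO as PAT).
    """
    gr_is_pat = subject_id % 4 in (0, 1)
    return "pat" if (program == "GR") == gr_is_pat else "alt"

def median_table(records):
    """records: iterable of (subject_id, program, minutes). Returns {(program, design): median}."""
    cells = defaultdict(list)
    for subject_id, program, minutes in records:
        cells[(program, design_of(subject_id, program))].append(minutes)
    return {cell: median(values) for cell, values in sorted(cells.items())}

# Hypothetical data points, for illustration only (not experiment results).
records = [(100, "GR", 55.0), (100, "CO", 40.0), (102, "GR", 48.0),
           (102, "CO", 52.0), (104, "GR", 61.0)]
for (program, design), m in median_table(records).items():
    print(program, design, m)
```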

Participating groups: Who? When?

Who will (probably) be performing a piece of the joint replication? When? How many subjects are expected to take part?

(all people here are (or should be) members of the 'replication' mailing list)

University of Auckland
- Muhammad Sulayman
- Data collection: ?
- #subjects: ?
Università di Bari:
- Teresa Baldassarre
- Marcela Genero (visiting from Universidad de Castilla-La Mancha)
- Data collection: (unclear if any)
- #subjects: (unclear)
Freie Universität Berlin:
- Lutz Prechelt
- experimenter: Martin Liesenberg
- Data collection: October
- #subjects: about 10
Universidad Politécnica de Madrid / Universidad ORT, Montevideo:
- Natalia Juristo
- experimenter(?): Martin Solari
- Data collection (at ORT): session with subjects in mid-November.
- #subjects: 2 groups:
  - Masters students (8 in total) doing the exercise simultaneously in a lab.
  - Undergraduate students (10 to 30) doing the exercise in a distributed environment, but in the same week. Participation is not mandatory for the students in either case, and the exercise is not graded. The undergraduates are in the 3rd year of Systems Engineering, just after the Application Design course (which includes GoF patterns).
Brigham Young University, Provo
- Jonathan Krein
- experimenters: Jonathan Krein, Landon Pratt, Alan Swenson, Alexander MacLean, Charles Knutson
- Data Collection: late November.
  The experiment will be conducted asynchronously, but all subjects will be required to participate within the same week.
- #Subjects: 0-43
- Subject Population: Senior undergraduate software engineering course. The course is optional for students; it is one of many computer science courses that can be taken for senior degree credit. The course requires a prerequisite junior-level course that covers design patterns, so all class members have been trained on design patterns recently (within a couple of years). All students are required to take the junior-level course in order to graduate.
- Execution: Participation is voluntary, but incentivized with course credit. The course already includes required hours that the students must spend each week engaged in software engineering activities. The hours are not graded, only checked for completion. Students will be given the opportunity to participate in this research study as part of their required hours (i.e., the exercise will not be graded). Though students can withdraw from the experiment at any time, incomplete participation will receive no hourly credit (except in extreme cases).
Microsoft Corporation, Redmond
- Christian Bird
- Data collection: ?
- #subjects: ?
University of Stuttgart, Germany:
- Stefan Wagner
University of Alabama, Tuscaloosa:
- Jeffrey Carver
- experimenter: Aziz Nanthaamornphong
- Aziz writes 2010-10-14: "We extend this replication experiment by evaluating the understandability of software design (with/without design patterns). The UML diagrams which are generated from 4 existing programs (CO, BL, GR, ST) are used as the software design. Participants will first answer the questions related to the design diagrams. Thus, our participants will have seen the design before performing the replication experiment through the web site."
- Data collection: October
- #subjects: 18 graduate students

Hey, we are a fairly international crowd!