Chisel 
Computer Human Interaction & Software Engineering Lab
Software development research that is relevant in practice

Do Faster Releases Improve Software Quality?

15 hours 18 min ago

Foutse Khomh, Tejinder Dhaliwal, Ying Zou, and Bram Adams: Do Faster Releases Improve Software Quality? An Empirical Case Study of Mozilla Firefox. MSR 2012

Nowadays, many software companies are shifting from the traditional 18-month release cycle to shorter release cycles. For example, Google Chrome and Mozilla Firefox release new versions every 6 weeks. These shorter release cycles reduce the users’ waiting time for a new release and offer better marketing opportunities to companies, but it is unclear if the quality of the software product improves as well, since shorter release cycles result in shorter testing periods. In this paper, we empirically study the development process of Mozilla Firefox in 2010 and 2011, a period during which the project transitioned to a shorter release cycle. We compare crash rates, median uptime, and the proportion of post-release bugs of the versions that had a shorter release cycle with those having a traditional release cycle, to assess the relation between release cycle length and the software quality observed by the end user. We found that (1) with shorter release cycles, users do not experience significantly more post-release bugs and (2) bugs are fixed faster, yet (3) users experience these bugs earlier during software execution (the program crashes earlier).

Designing and building software is not like assembly-line manufacturing, but some aspects of it can be studied and improved like other industrial processes. In this upcoming paper, Khomh et al. examine the effects of Mozilla’s switch from a yearly (or longer) release cycle to a much more frequent cycle. Their raw material is bug and crash data; their conclusions are:

  1. Users experience crashes earlier during the execution of versions developed following a rapid release model.
  2. The Firefox rapid release model fixes bugs faster than the traditional model does, but fixes proportionally fewer bugs.
  3. With a rapid release model, users adopt new versions faster compared to the traditional release model.

#3 is good news; #2 is (mostly) good, but #1 is a puzzle for which the authors don’t have an explanation, at least not yet. We’d welcome suggestions about why that might be.
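The three measures named in the abstract can be made concrete with a small sketch. This is a minimal Python illustration under our own assumptions, not the authors' analysis. The `Session` record and its fields are hypothetical. Crash rate is simplified to crashes per session. Median uptime is taken, as we read the abstract, to be the median time from program start to crash. The numbers are toy data, not Mozilla's.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Session:
    # Hypothetical per-session record; the field names are ours, not the paper's.
    release_model: str  # "traditional" or "rapid"
    uptime_s: float     # seconds from program start until exit or crash
    crashed: bool


def crash_summary(sessions, model):
    """Return (crashes per session, median uptime at crash) for one release model."""
    subset = [s for s in sessions if s.release_model == model]
    crash_uptimes = [s.uptime_s for s in subset if s.crashed]
    crash_rate = len(crash_uptimes) / len(subset) if subset else 0.0
    median_uptime: Optional[float] = median(crash_uptimes) if crash_uptimes else None
    return crash_rate, median_uptime


def post_release_bug_share(pre_release_bugs, post_release_bugs):
    """Proportion of a version's bugs that were reported after release."""
    total = pre_release_bugs + post_release_bugs
    return post_release_bugs / total if total else 0.0


# Toy data for illustration only; these are not the paper's measurements.
sessions = [
    Session("rapid", 100.0, True),
    Session("rapid", 300.0, True),
    Session("rapid", 500.0, False),
    Session("traditional", 400.0, True),
    Session("traditional", 900.0, False),
]
print(crash_summary(sessions, "rapid"))        # (0.6666666666666666, 200.0)
print(crash_summary(sessions, "traditional"))  # (0.5, 400.0)
print(post_release_bug_share(30, 10))          # 0.25
```

A lower median uptime at crash for one release model corresponds to finding #1 above: crashes happen earlier in a session.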

Categories: News

The Architecture of Open Source Applications: Volume 2

Tue, 2012-05-08 10:54

We are very pleased to announce that The Architecture of Open Source Applications: Volume 2 is now available from Lulu. A PDF version will go on sale in the next few days, and an e-book will become available as soon as we can produce it. Many thanks to everyone who contributed, and to the indefatigable Amy Brown for pulling it all together. As always, all royalties will go directly to Amnesty International, so if you buy a copy, you’ll be helping to make the world a better place.

Categories: News

Cohesive and Isolated Development with Branches

Sun, 2012-05-06 03:43

Earl T. Barr, Christian Bird, Peter C. Rigby, Abram Hindle, Daniel M. German, and Premkumar Devanbu: Cohesive and Isolated Development with Branches. FASE 2012.

The adoption of distributed version control (DVC), such as Git and Mercurial, in open-source software (OSS) projects has been explosive. Why is this and how are projects using DVC? This new generation of version control supports two important new features: distributed repositories, and history-preserving branching and merging where branching is easier, faster, and more accurately recorded. We observe that the vast majority of projects using DVC continue to use a centralized model of code sharing, while using branching much more extensively than when using CVC. In this study, we examine how branches are used by over sixty projects adopting DVC in an effort to understand and evaluate how branches are used and what benefits they provide. Through interviews with lead developers in OSS projects and a quantitative analysis of mined data from development histories, we find that projects that have made the transition are using observable branches more heavily to enable natural collaborative processes: history-preserving branching allows developers to collaborate on tasks in highly cohesive branches, while enjoying reduced interference from developers working on other tasks, even if those tasks are strongly coupled to theirs.

As the authors observe, distributed version control has become a must-have for fashionable open source developers in the past four years. But is it actually more productive than traditional (centralized) version control? Is it actually even different, once the hype is cleared away? According to this new study by Barr et al., the answer is a qualified “yes”: while most projects (and developers) still use a hub-and-spoke model in practice, they make heavy use of DVCSes’ support for lightweight branching and merging. Knowing this may help designers of the next generation of version control systems find better ways to support what people actually do.

Categories: News

A Review of “Code Simplicity”

Thu, 2012-05-03 03:16

Max Kanat-Alexander: Code Simplicity: The Science of Software Development (Kindle edition). O’Reilly, 2012, B007NZU848.

The goal of this ambitious new book from O’Reilly, stated in its preface, is to “[lay] out scientific laws for software development, in a simple form that anybody can read.” What it actually does, however, is demonstrate that its author doesn’t really know what science is, or what science has already told us about his chosen subject.

Let’s start with the first point. Kanat-Alexander’s definition of a science is the traditional one: it is composed of facts that have been collected and organized, and contains general truths or basic laws that have been validated experimentally. Where he comes up short is in applying the last part of that definition. There are plenty of sweeping claims, many of which I actually agree with, but where’s the data? Where are the experiments (or at least studies) showing how that data backs up his claims, and that those claims aren’t actually refuted by any data? The only hard evidence on offer in the whole book is a table in chapter 5 showing how five files changed over time. The files aren’t identified; neither are the projects they came from, and we’re not told the timescale of the changes (was it days or years?).

The second failing of this book is that it completely ignores what we actually do know about programs and programmers. 20% of the way through the book [1], as he’s trying to explain how we got into our present mess, he writes:

Then along came The Mythical Man Month, a book by Fred Brooks, who actually looked at the process of software development in a real project and pointed out some facts about it… He didn’t come up with a whole science, but he did make some good observations… After that came a flurry of software development methods: the Rational Unified Process, the Capability Maturity Model, Agile Software Development, and many others. And that, basically, brings us up to where we are today: lots of methods, but no real science.

Well, no. Where we actually are today is in the middle of an explosion in real scientific understanding of how programmers work, how software evolves, how likely it is to contain bugs, and dozens of related topics. If this year matches 2011, something like 200 new peer-reviewed studies will be published, some by academics, and some by researchers at IBM, Microsoft Research, and other industrial labs. That’s a lot for a working programmer to read, which is why we put together Making Software (ironically, also published by O’Reilly) to summarize what we actually know and why we believe it’s true.

So what are Kanat-Alexander’s “laws”, and how substantial are they?

1. The purpose of software is to help people.
If I said, “The purpose of cars is to move people around,” would you consider that a “law”?
2. The desirability of a change is directly proportional to the value now plus the future value, and inversely proportional to the effort of implementation plus the effort of maintenance.
Replace the word “desirability” with “value”, and this is simply the definition of net present value.
3. The longer your program exists, the more likely it is that any piece of it will have to change.
Really? How does that claim stand up against the data that Elaine Weyuker and Tom Ostrand analyzed at AT&T, or the work Nachi Nagappan, Tom Ball, and their colleagues have done at Microsoft Research?
4. The chance of introducing a defect into your program is proportional to the size of the changes you make to it.
The people listed above, plus others like Dewayne Perry, have found that small changes are proportionally more likely to introduce faults than large ones. If they’re wrong, can Kanat-Alexander show where they made their mistake?
5. The ease of maintenance of any piece of software is proportional to the simplicity of its individual pieces.
If Kanat-Alexander knows how to measure the simplicity of a piece of software, he deserves the Turing Award: as El Emam and colleagues showed in 2001 (see Herraiz and Hassan’s chapter in Making Software for a summary), we still don’t have a complexity measure that performs any better than counting lines of code. If he doesn’t know how to measure simplicity, how can he say that anything else is proportional to it? And either way, I strongly suspect that maintenance costs are influenced more by the complexity of the couplings between components than by the simplicity of the components themselves.
6. The degree to which you know how your software behaves is the degree to which you have accurately tested it.
I think this is saying that the degree to which we can predict how a program will behave is correlated with the amount of testing we’ve done. That’s a plausible claim (assuming we agree on ways to measure “predictability” and “amount of testing”), but where’s the data?
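Law 2 reads more easily as a formula. The symbols below are our own shorthand for the book's wording (V for value, E for effort). For comparison, the standard net present value of benefits B_t and costs C_t in period t, discounted at rate r, is shown alongside. Both weigh present and future benefits against present and future costs.

```latex
\text{desirability} \propto \frac{V_{\text{now}} + V_{\text{future}}}{E_{\text{implementation}} + E_{\text{maintenance}}}
\qquad\qquad
\text{NPV} = \sum_{t=0}^{T} \frac{B_t - C_t}{(1 + r)^t}
```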

The author of this book is clearly intelligent and passionate about his craft. He has undoubtedly written and shipped more good software in the last ten years than I ever will. If he doesn’t know what we’ve discovered about software engineering in the last forty years, that’s a clear sign that we’re not doing our jobs properly. It isn’t enough to be right: if we want our work to matter, we must communicate it to others, and this book shows that we have clearly failed to do that.

[1] How do you specify a location in an e-book that doesn’t have page numbers?

Categories: News

Example Embedding

Wed, 2012-05-02 20:00

Ohad Barzilay. “Example Embedding”, Onward! 2011.

Using code examples in professional software development is like teenage sex. Those who say they do it all the time are probably lying. Although it is natural, those who do it feel guilty. Finally, once they start doing it, they are often not too concerned with safety, they discover that it is going to take a while to get really good at it, and they realize they will have to come up with a bunch of new ways of doing it before they really figure it all out.

A few weeks ago, the paper “Component reuse vs. snippet remixing” prompted an interesting discussion with input from industry, and I want to follow up in that direction with another perspective. Apart from the hilarious abstract, this paper reads well and entertains. In addition, it sheds some light on a common, yet shunned, practice.

From time to time, developers use example code from the Web in their own code. In his essay, Barzilay lays the foundation for talking about this phenomenon and identifies the elements of that ecosystem. For example, he mentions what kinds of sites are used by developers and what the process of reusing an example can look like. Addressing concerns from the Component reuse vs. snippet remixing paper, he discusses potential solutions for making the practice safer and more systematic. Finally, as an interesting analogy, Barzilay contrasts example embedding with academic practices.

While the use of examples from the Web may be looked down upon by some developers, it is a pervasive practice. Yet neither industry nor research provides comprehensive support for making it safer. Barzilay’s essay helps us understand what the example ecosystem looks like and gives us ideas for possible solutions to its problems.

Finally: thanks to Jorge, Neil, and Greg for having me here at NWIT! 

Categories: News

On the naturalness of software

Thu, 2012-04-26 11:00

Abram Hindle, Earl Barr, Zhendong Su, Prem Devanbu, and Mark Gabel. “On the Naturalness of Software”, ICSE 2012.

Natural languages like English are rich, complex, and powerful. The highly creative and graceful use of languages like English and Tamil, by masters like Shakespeare and Avvaiyar, can certainly delight and inspire. But in practice, given cognitive constraints and the exigencies of daily life, most human utterances are far simpler and much more repetitive and predictable. In fact, these utterances can be very usefully modeled using modern statistical methods. This fact has led to the phenomenal success of statistical approaches to speech recognition, natural language translation, question-answering, and text mining and comprehension.

We begin with the conjecture that most software is also natural, in the sense that it is created by humans at work, with all the attendant constraints and limitations—and thus, like natural language, it is also likely to be repetitive and predictable. We then proceed to ask whether a) code can be usefully modeled by statistical language models and b) such models can be leveraged to support software engineers. Using the widely adopted n-gram model, we provide empirical evidence supportive of a positive answer to both these questions. We show that code is also very repetitive, and in fact even more so than natural languages. As an example use of the model, we have developed a simple code completion engine for Java that, despite its simplicity, already improves Eclipse’s completion capability. We conclude the paper by laying out a vision for future research in this area.

This paper is not directly applicable to software practice, but you may still find it pretty cool and a great read. It uses the statistical approach to Natural Language Processing that is used to such good effect by tools such as Google Translate, but applied to lines of code. The authors find that code is much more amenable to statistical modelling than English. This means that more powerful code completion and code suggestion tools are viable (they prototyped one for Eclipse), and it also opens the door to new approaches in software mining research. Exciting stuff…
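To give a flavour of the idea, here is a minimal Python sketch of count-based trigram completion. It is an illustration under simplifying assumptions, not the authors' engine. Tokens are split on whitespace rather than by a Java lexer, there is no smoothing, and the class name `NGramCompleter` is hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(line):
    # Naive whitespace tokenizer (our simplification); real tools use a language lexer.
    return line.split()


class NGramCompleter:
    """Count-based n-gram model that suggests the most frequent next tokens."""

    def __init__(self, n=3):
        if n < 2:
            raise ValueError("n must be at least 2")
        self.n = n
        self.counts = defaultdict(Counter)  # (n-1)-token context -> next-token counts

    def _pad(self, tokens):
        # Start-of-line markers so the first tokens of a line also have a context.
        return ["<s>"] * (self.n - 1) + tokens

    def train(self, lines):
        for line in lines:
            tokens = self._pad(tokenize(line))
            for i in range(self.n - 1, len(tokens)):
                context = tuple(tokens[i - self.n + 1:i])
                self.counts[context][tokens[i]] += 1

    def suggest(self, prefix, k=3):
        """Return up to k likely next tokens after `prefix`, most frequent first."""
        context = tuple(self._pad(tokenize(prefix))[-(self.n - 1):])
        counter = self.counts.get(context, Counter())
        return [token for token, _ in counter.most_common(k)]


corpus = [
    "for ( int i = 0 ; i < n ; i ++ ) {",
    "for ( int j = 0 ; j < m ; j ++ ) {",
    "for ( int i = 0 ; i < len ; i ++ ) {",
]
model = NGramCompleter(n=3)
model.train(corpus)
print(model.suggest("for ( int"))  # ['i', 'j']
```

Even this toy model ranks `i` above `j` after `for ( int` simply because it has seen that continuation more often. The paper's point is that real code bases are repetitive enough for such statistics to be useful.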

Categories: News

Ensemble effort estimation

Tue, 2012-04-17 08:00

Ekrem Kocaguneli, Tim Menzies, and Jacky Keung. “On the value of ensemble effort estimation”, TSE 2011.

Background: Despite decades of research, there is no consensus on which software effort estimation methods produce the most accurate models.

Aim: Prior work has reported that, given M estimation methods, no single method consistently outperforms all others. Perhaps rather than recommending one estimation method as best, it is wiser to generate estimates from ensembles of multiple estimation methods.

Method: 9 learners were combined with 10 pre-processing options to generate 9 × 10 = 90 solo-methods. These were applied to 20 data sets and evaluated using 7 error measures. This identified the best n (in our case n = 13) solo-methods that showed stable performance across multiple datasets and error measures. The top 2, 4, 8 and 13 solo-methods were then combined to generate 12 multi-methods, which were then compared to the solo-methods.

Results: (i) The top 10 (out of 12) multi-methods significantly out-performed all 90 solo-methods. (ii) The error rates of the multi-methods were significantly less than the solo-methods. (iii) The ranking of the best multi-method was remarkably stable.

Conclusion: While there is no best single effort estimation method, there exist best combinations of such effort estimation methods.

Anybody who has ever done software effort estimation knows that it’s a pretty hard thing to do. It’s tough even for small individual tasks if you lack practice, and it’s horribly difficult for large-scale group projects even for estimators with lots of practice. There are many methods that estimators could use, but as Kocaguneli & Co remind us, “no single method consistently outperforms all others”: sometimes you’re better off using method A, other times method B would’ve been more appropriate. Their proposal: pair automated learners with pre-processing options to build solo-methods, each deficient on its own, and then combine the best of them into ensembles, in the hope that these multi-methods will provide estimates with less error and more consistency.

The multi-methods approach worked well in their (pretty large) dataset of nearly 1,200 projects. This does not mean that the method that came out on top for them will come out on top for you, too. It only means that ensembles of methods are a good workaround for the problem of inconsistency of method efficacy. What the authors propose is for practitioners to learn the basics of machine learning and build method ensembles themselves:

Therefore, our recommendations to practitioners, who are willing to use multi-methods but lack the knowledge of machine learning algorithms are:

  • Start with initial 2 learners and build the associated multi-methods
  • See the performance of the current multi-methods
  • Build new multi-methods only if you are not pleased with the performance of the current ones

That won’t be an easy task, but it may be less painful than committing to using a single method that often won’t work. If you’re interested in doing it, this paper has several references and pointers to get you started.
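The recommendation above (start small, grow the ensemble only if you are unhappy with it) can be sketched in Python. Everything here is an assumption for illustration, not the authors' implementation. Solo-methods are plain callables from project features to effort, and they are combined by taking the median of their estimates. The error measure is mean magnitude of relative error (MMRE), and the ensemble sizes 2, 4, 8 and 13 simply mirror those in the abstract.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error, one common effort-estimation error measure."""
    return abs(actual - predicted) / actual


def multi_method_estimate(members, features, combine=median):
    """Combine the estimates of several solo-methods for one project (median by default)."""
    return combine([method(features) for method in members])


def grow_ensemble(ranked_methods, projects, target_mmre, sizes=(2, 4, 8, 13)):
    """Start with the top 2 methods; add more only while mean MRE misses the target.

    ranked_methods: solo-methods (callables: features -> effort), best first.
    projects: (features, actual_effort) pairs with known outcomes.
    Returns (ensemble size used, mean MRE), or None if no methods were given.
    """
    result = None
    for k in sizes:
        members = ranked_methods[:k]
        if not members:
            break
        score = mean(mre(actual, multi_method_estimate(members, features))
                     for features, actual in projects)
        result = (len(members), score)
        if score <= target_mmre or len(members) < k:
            break  # good enough, or no more methods left to add
    return result


# Hypothetical solo-methods and projects (features = size, effort in person-days).
methods = [lambda x: 10 * x, lambda x: 12 * x, lambda x: 8 * x, lambda x: 30 * x]
projects = [(1, 10), (2, 20)]
print(grow_ensemble(methods, projects, target_mmre=0.15))  # (2, 0.1)
```

Taking the median rather than the mean makes the ensemble robust to one wildly wrong member, such as the `30 * x` method above.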


Categories: News

Component reuse vs. snippet remixing

Tue, 2012-04-10 08:00

Susan Elliott Sim, Rosalva Gallardo-Valencia, Kavita Philip, Medha Umarji, Megha Agarwala, Cristina V. Lopes, and Sukanya Ratanotayanon. “Software Reuse through Methodical Component Reuse and Amethodical Snippet Remixing”, CSCW 2012.

Every method for developing software is a prescriptive model. Applying a deconstructionist analysis to methods reveals that there are two texts, or sets of assumptions and ideals: a set that is privileged by the method and a second set that is left out, or marginalized by the method. We apply this analytical lens to software reuse, a technique in software development that seeks to expedite one’s own project by using programming artifacts created by others. By analyzing the methods prescribed by Component-Based Software Engineering (CBSE), we arrive at two texts: Methodical CBSE and Amethodical Remixing. Empirical data from four studies on code search on the web draws attention to four key points of tension: status of component boundaries; provenance of source code; planning and process; and evaluation criteria for candidate code. We conclude the paper with a discussion of the implications of this work for the limits of methods, structure of organizations that reuse software, and the design of search engines for source code.

One of the ways in which the Internet transformed software development is the prevalence of “programming by Google”: searching online for a function or snippet that does what you want, copying and pasting it into your own code, and stitching it in as needed to make it work. This practice is great in some ways (it speeds up development and helps cross-pollinate ideas and techniques), but it also has its problems (for example, tracking code provenance, intellectual property, and divergence from policy).

In this paper, Sim & Co provide a very good exploration of the distinction between the safer, more planned, and stuffier “component reuse” approach and the ad-hoc, versatile, under-the-table “snippet remixing” approach to code reuse. They have some interesting statistics on the use of both approaches (for instance: 92% of surveyed developers admit to remixing snippets of code), and they identify points of tension between them. Their paper should be a wake-up call for Software Engineering professors to stop acting as if component-based reuse was all there is to code reuse, and an invitation to practitioners to consider the strengths and weaknesses of both approaches and to define the right balance between them in their own contexts.

(Disclosure: Susan Sim is my academic sister—we had the same graduate advisor, though we did not overlap. Also, here is one more reminder that we link to PDFs of the papers we discuss when we find them. In cases we do not, asking the authors nicely for a copy usually works.)

Update, April 30: There is now a freely available electronic copy here.

Categories: News

Social coding in GitHub

Thu, 2012-03-01 09:00

Laura Dabbish, Colleen Stuart, Jason Tsay, and Jim Herbsleb. “Social Coding in GitHub: Transparency and Collaboration in an Open Software Repository”, CSCW 2012.

Social applications on the web let users track and follow the activities of a large number of others regardless of location or affiliation. There is a potential for this transparency to radically improve collaboration and learning in complex knowledge-based activities. Based on a series of in-depth interviews with central and peripheral GitHub users, we examined the value of transparency for large-scale distributed collaborations and communities of practice. We find that people make a surprisingly rich set of social inferences from the networked activity information in GitHub, such as inferring someone else’s technical goals and vision when they edit code, or guessing which of several similar projects has the best chance of thriving in the long term. Users combine these inferences into effective strategies for coordinating work, advancing technical skills and managing their reputation.

Platforms like GitHub provide an interesting twist to social dynamics in open source: they make it easy for everyone to keep track of, interact and collaborate with, and stay aware of the work of other developers, including some of the best in the world, all in one place. This paper by Dabbish & Co reports on the work habits and perceptions of “central and peripheral” GitHub users. If you’re new to GitHub, this paper is a good take on its social aspect, but if you already work in GitHub, little here will surprise you. Still, I found some cool nuggets that might interest you. For instance, once people amass an audience watching their code production, they become more careful about what they make publicly available. Also, some people gain followers not because of programming ability or personal connections, but because they have “good taste” in the projects they themselves follow.

Categories: News

Looking at the same thing in pair programming tasks

Thu, 2012-02-23 12:01

Patrick Jermann and Marc-Antoine Nüssli. “Effects of Sharing Text Selections on Gaze Cross-recurrence and Interaction Quality in a Pair Programming Task”, CSCW 2012.

We present a dual eye-tracking study that demonstrates the effect of sharing selection among collaborators in a remote pair-programming scenario. Forty pairs of engineering students completed several program understanding tasks while their gaze was synchronously recorded. The coupling of the programmers’ focus of attention was measured by a cross-recurrence analysis of gaze that captures how much programmers look at the same sequence of spots within a short time span. A high level of gaze cross-recurrence is typical for pairs who actively engage in grounding efforts to build and maintain shared understanding. As part of their grounding efforts, programmers may use text selection to perform collaborative references. Broadcast selections serve as indexing sites for the selector as they attract non-selector’s gaze shortly after they become visible. Gaze cross-recurrence is highest when selectors accompany their selections with speech to produce a multimodal reference.

The fact that pair programming can work pretty well doesn’t mean that it “just works.” Instead, it requires its own set of skills and considerations, and perhaps some people are better suited for it than others. In a controlled experiment using an eye tracker, Jermann and Nüssli show the effect of some of the very low-level actions that people in pairs may take to improve their performance. Specifically, two seemingly simple kinds of actions (talking aloud and selecting the block of text that you’re talking about) bring your partner’s attention to the same screen area. When pairs do this, their level of code comprehension increases.

Jermann and Nüssli’s study had participants sitting separately, each in front of their own screen showing the shared content. My guess is that if you and your partner are sitting side by side, other actions with the same purpose (such as pointing with your finger or your mouse) should have similar effects.
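Gaze cross-recurrence can be approximated in a few lines. The Python sketch below is a simplified version under our own assumptions, not the authors' procedure. Each programmer's gaze is a list of discretized spots (for example, line numbers) sampled at the same rate, and `None` marks a lost sample. The score is the fraction of sample pairs within `max_lag` steps of each other in which both partners look at the same spot.

```python
def cross_recurrence_rate(gaze_a, gaze_b, max_lag):
    """Fraction of sample pairs (i, j) with |i - j| <= max_lag where both gazes hit the same spot."""
    hits = pairs = 0
    for i, spot_a in enumerate(gaze_a):
        # Only compare samples of B that fall within the allowed time lag of sample i.
        lo, hi = max(0, i - max_lag), min(len(gaze_b), i + max_lag + 1)
        for j in range(lo, hi):
            pairs += 1
            if spot_a is not None and spot_a == gaze_b[j]:
                hits += 1
    return hits / pairs if pairs else 0.0


# Toy gaze traces: the spot each programmer looks at, per sample (e.g. a line number).
driver = [3, 4, 5]
navigator = [3, 4, 5]
print(cross_recurrence_rate(driver, navigator, max_lag=0))  # 1.0
print(cross_recurrence_rate([3, 4], [4, 3], max_lag=0))     # 0.0
print(cross_recurrence_rate([3, 4], [4, 3], max_lag=1))     # 0.5
```

Allowing a small lag matters: partners who look at the same spots one sample apart score 0.0 with no lag but 0.5 with a lag of one.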

(As usual, we post links to the actual papers when we find them. I couldn’t in this case, but remember that researchers are usually happy to share their work over email if you ask nicely…)


Categories: News

Teachers Matter – Do Programmers?

Thu, 2012-02-16 08:12

Raj Chetty and John Friedman (Harvard), and Jonah Rockoff (Columbia) recently published a study showing how much long-term impact teachers have on students. To make a long story short, the answer is “a lot”, and that impact persists long after the child leaves the classroom. As far as I know, no-one has ever done something similar for programmers, i.e., looked at the long-term impact a particular software developer has on a project (for either good or ill). I think the hardest part would be developing a measure of one person’s impact on software; like all metrics, what you’d get out could all too easily be determined primarily by what assumptions you baked in.

But this example raises a broader question that we’d like to throw out to the whole community. What studies have you seen in other areas that you’d like to see replicated in software development? For example, Evan Robinson’s classic article “Why Crunch Mode Doesn’t Work” does a great job of summarizing research on the effects of sleep deprivation on productivity. None of those studies specifically looked at programmers; I suspect that one that did would be read and cited a lot. What other analyses would you like empirical software engineering researchers to transfer to our domain?

Categories: News