Code Smells Survival Analysis in Web Apps

. Web applications are heterogeneous, both in their target platform (split across client and server sides) and on the formalisms they are built with, usually a mixture of programming and formatting languages. This heterogeneity is perhaps an explanation why software evolution of web applications (apps) is a poorly addressed topic in the literature. In this paper we focus on web apps built with PHP, the most widely used server-side programming language. We analyzed the evolution of 6 code smells in 4 web applications, using the survival analysis technique. Since code smells are symptoms of poor design, it is relevant to study their survival, that is, how long did it take from their introduction to their removal. It is obviously desirable to minimize their survival. In our analysis we split code smells in two categories: scattered smells and localized smells, since we expect the former to be more harmful than the latter. Our results provide some evidence that the survival of PHP code smells depends on their spreadness. We have also analyzed whether the survival curve varies in the long term, for the same web application. Due to the increasing awareness on the potential harm-fulness of code smells, we expected to observe a reduction in the survival rate in the long term. The results show that there is indeed a change, for all applications except one, which lead us to consider that other factors should be analyzed in the future, to explain the phenomenon.


Motivation
This study is in the crossroads of Software Evolution [1,2] and Web Engineering [3,4] and is a follow-up of our preliminary work on the evolution of web systems/applications, regarding maintainability and reliability problems [5,6].Web applications (web apps, for short) encompass a heterogeneity of target platforms, since they run both on a browser and a server, and a mix of programming and content formatting languages.That mix makes web apps a more complex target for software quality studies, either synchronic or diachronic (longitudinal), because those studies are often based on static analysis of source code, as is the case with code smells research.Indeed, for each language used (e.g.JavaScript, html5, CSS, PHP) a different parser is required, what may explain why software evolution of web apps is a poorly addressed topic in the literature, as we will see later.
Several perspectives can be adopted in longitudinal studies, were the evolution of software products or processes are analyzed, focusing on aspects such as software metrics, teams' activity, defects identification and correction, or time to release [7,8].This paper addresses the survival of code smells in web apps using PHP, the main serverside programming language, currently used in 79% of web apps1 .Since code smells occurrences are symptoms of poor design and implementation choices, the Software Engineering community has been thriving to reduce their survival, i.e., how long does it take from when they were introduced to when they are removed, by proposing new detection techniques and tools [9,10].Despite this interest, there is a shortage of evolution studies on code smells in web apps, especially with PHP [11,12].
The research described herein covers two factors: scope and period.Code smells effect can vary widely in spreadness.In localized ones, the scope is a method or a class (e.g.Long Method, Long Parameter List, God Class), while the influence of others may be scattered across large portions of a software system (e.g.Shotgun Surgery, Deep Inheritance Hierarchies or Coupling Between Classes).Since widespread code smells can cause more damage than localized ones, we expect their survival rates to be shorter.The other factor we are concerned with regards a superordinate temporal analysis.Since the topic of code smells has been addressed by researchers, taught at universities and discussed by practitioners over the last two decades, we want to investigate whether this had an impact on their survival.We expect that, in a long term, an increased awareness has caused a more proactive attitude towards code smells detection and removal (through refactoring actions), thus leading to shorter survival rates.Summing up, we aim at testing the following two null research hypotheses: -H0x: Survival does not depend of the code smells scope.-H0y: Survival of a given code smell does not change over time period.
To test these hypotheses, we performed a longitudinal study encompassing 4 web apps, and 6 code smells, as surrogates of more scattered or localized scopes.
This paper is structured as follows: section 2 introduces the study design; section 3 describes the results of our data analysis; section 4 overviews the related work and section 5 outlines the major conclusions and identifies required future work.

2
Study design

Aim of the study
The aim of this work is to study the evolution/survival of code smells in web apps built with PHP in the server-side.We selected a set of applications from different domains, a set of 6 code smells, and collected data across all their stable development versions.

Applications sampling
Inclusion criteria: (i) fully blown web apps taken from the GitHub top listings, (ii) programmed with object-oriented style2 , (iii) covering a diversity of application areas.Exclusion criteria: (i) libraries, (ii) frameworks, (iii) web apps built using a framework.

Code smells sampling
We used PHPMD [13], the only open source tool, we are aware of, that is capable of detecting scattered and localized code smells in PHP.It supports 3 scattered code smells, so we chose the same number of localized ones, although more were available.

Code Smell Characterization Type ExcessiveMethodLength
The method does too much Localized

ExcessiveClassLength
The class does too much Localized

ExcessiveParameterList
The method has too many parameters Localized

DepthOfInheritance
The class is too deep in the inheritance tree Scattered

CouplingBetweenObjects
The class has too many dependencies Scattered

NumberOfChildren
The class has too many descendants Scattered

Design of the study
The workflow of our study (see Fig. 1) was fully automated by means of batch and PHP scripts.First, all versions of the selected web apps were downloaded from GitHub, except the alpha, beta, release candidates and corrections for old versions.Then, using the PHPMD tool, we extracted the location of the code smells from all versions and stored it in XML file format.After some format manipulation, the information on those XML files was stored in a database, to make it amenable for survival analysis.That includes the date when each code smell was first detected and, if that was the case, when it disappeared, either due to refactoring, or because the code where it was detected was (at least apparently) removed.Survival analysis took the input data from the database and stored the results in it too.Finally, a data completion step was performed, where results were exported to csv format, for interoperability with visualization tools.

Survival analysis
Survival analysis encompasses a set of statistical approaches that investigate the time of interest for an event to occur [14].The questions of interest can be the average survival time, and the probability of survival at a certain point in time.Also, we can calculate the Hazard function, the probability of the event to occur.The Kaplan-Meier estimator is a non-parametric statistic that allows us to estimate the survival function and it gives the probability that a given code smell will subsist/exist past a time t.The log-rank test is used to compare survival curves of two groups, in our case two types of code smells.It tests the null hypothesis that survival curves of two populations do not differ by computing a p-value (statistical significance).In our analysis we will consider a confidence interval of 95%, corresponding to an observed significance level (known as p-value) of 0.05 for the test hypotheses.

Data extraction and processing, and format for survival analysis.
To apply survival data analysis, we transformed the collection of detected code smells instances for each version of each web app, to a table with the "life" of each instance, including the date of its first appearance, removal date (if occurred) and a censoring with the following meaning: Censoring=1  the smell disappeared, usually due to a refactoring event; Censoring=0  the code smell is still present at the end of the observation period.
For replication purposes, the collected dataset is made available for other researchers 3 .

Results and Data analysis
To test the hypotheses, we used the R tool and the log-rank test [15].We performed two kinds of studies, using the log-rank test and 2 different co-variables.

Comparing the survival curves for different types of code smells
For this study we divided the smells in "Localized Smells" and "Scattered Smells".We then fitted the Kaplan-Meier curves and performed the log-rank to compute the p-value.For phpMyAdmin and opencart, the survival curves of scattered smells differ significantly from those of localized smells (see Fig. 2).For phpbb, the curves differ, but not with statistical significance (given the considered confidence interval).For dokuwiki, the study is inconclusive, since only 2 scattered code smells are removed.It is worth noticing that localized code smells are much more frequent targets for change than scattered code smells.This may be due to the lack of refactoring tools that support the automated removal of scattered code smells in PHP.

Comparing code smells survival curves for different time frames
Here we used the log-rank test, and created a co-variate "timeframe", with two values 1 and 2, 1 being the first half of historic data, and 2 the second half.For the first half, we truncated the variables of the study as if it was a sub-study ending in this period.In other words, we considered two independent observation periods.For all projects except phpbb, the survival curves of the 1 st timeframe differ significantly from the 2 nd one (see Fig. 3).In phpMyAdmin and dokuwiki the area under the code smells survival curves is smaller in the 2 nd timeframe, what seems to corroborates our expectation that, due to the increasing awareness on the potential harmfulness of code smells, we would observe a reduction in the code smells survival rate in the long term.However, for opencart and phpbb, we cannot draw the same conclusion, probably because the number of detected and removed code smells is much smaller.

Related work
In [11], the authors study five web apps in PHP, the aspects of their history, unused code, the removal of functions, the use of libraries, the stability of interfaces, migration to object-orientation and the evolution of complexity.They found these systems undergo systematic maintenance.
In [12], the authors analyze 30 PHP projects for their metrics, to examine if the Lehman's laws of evolution are maintained in an web app, and found that not all of them stand true for this kind of projects.
In [16], the authors analyze two Java open source systems and four code smells, in a longitudinal study with versions.They study the evolution of code smells, including their persistence, and do survival analysis to find the average time of persistence.
In [17], after a survey to developers, the authors analyze when test smells occur in source code, what their survivability is, and whether their presence is associated with the presence of design problems in production code (code smells).They found, among other conclusions, relationships between test and code smells.They extracted data from a Git repository from 3 ecosystems making a total of 152 projects.They employ, among other techniques, survival analysis to study the longevity of test smells.
In [18] the authors address the faults in the releases of five JavaScript projects (1 framework, 2 libraries and 2 command line programs) and try to relate them to 12 types of code smells.They employ survival analysis for the faults.
The techniques used to extract the code for further analysis are divided in mining software repositories and getting full versions or mining Git repositories for the changes.However, to compute scattered code smells, we had to deal with the full code.For smells based only on metrics for the file, the Git extraction is simpler to automate.

Conclusions
As far as we know, this is the first study analyzing code smells in PHP web apps.We studied the evolution of 6 groups of code smells, in 4 web applications built with PHP.
For the first hypothesis we found that the code smells survival curves indeed vary with the type of code smell, localized or scattered.We also found that the insertion and removal events are much lower for the scattered code smells.Nevertheless, for PHP web apps developed with object-oriented paradigm, it is not enough to study localized smells in the file scope, but we also must address the smells that are scattered across the system, since the latter are potentially more harmful.
For the second hypothesis results, in 3 of the applications the survival curve varies between timeframes, but for one (phpbb) we do not observe this variance.This can be explained by the low relative removal of code smells compared to the other applications.We cannot fully sustain that the long-term reduction in survival rate of code smells in PHP web apps is due to the awareness on potential code smells harmfulness.Other factors should be analyzed in the future, to explain the phenomenon.
The biggest validity threats are: there should be a bigger number of apps and smells studied; perceive if applications that implemented OOP gradually, impact the study.

Table 1 .
Characterization of the target web apps

Table 3 .
Code smells found, removed and survival in days (median and mean), by type.
Fig. 2. Survival curves of localized and scattered code smells, by application.

Table 4 .
Code smells found, removed and survival in days (median and mean) by timeframe.Code smells survival curves in two consecutive timeframes (1 st and 2 nd half)