Cloning as a software engineering tool

Speaker: Michael W. Godfrey, University of Waterloo

What is a clone

cloning vs similarity

intent or co-incidence

no standard definition

Measuring detection effectiveness

precision vs recall

set tools for more hits, then perform customized filtering to remove common false positives

plagiarism detection, IP theft

DNA sequence analysis

compression

spam analysis, malware detection

Detection methods

strings

tokens

ASTs

PDGs: program dependence graph

meta techniques: metrics, data mining, lightweight semantic analysis

see Roy and Cordy’s tech report

Clone analysis

what techniques to study large systems?

what kind of clones?

what properties do they have?

does it hurt design in the long run?

Why cloning is supposed to be bad

leads to bloat

sign of inexperienced developers

sign of poor design

Formula, repetition, duplication

replication of trusted design elements works in software

prototype design pattern

Cloning considered harmful

best paper in Working Conference on Reverse Engineering

Forking: clones evolve independently, eg, platform variation (OS specific, architecture specific code)

Templating: limitation of implementation language, eg, C routines handling int and floats in the same way

Customizing: existing code solve a similar problem but you can’t or won’t change it

Melvin's digital garden

Cloning as a software engineering tool

What is a clone

Measuring detection effectiveness

Detection methods

Clone analysis

Why cloning is supposed to be bad

Formula, repetition, duplication

Cloning considered harmful

Links to this note

Cloning as a software engineering tool

What is a clone

Measuring detection effectiveness

Related problems

Detection methods

Clone analysis

Why cloning is supposed to be bad

Formula, repetition, duplication

Cloning considered harmful

Links to this note