Computer Programming/Coding Style
In computer programming, there are numerous coding conventions used to ensure consistent code and to enhance code quality, notably including correctness, legibility, maintainability, and speed. Individual projects, communities, bodies of code, and guidelines choose particular conventions, forming coding standards or style guides. “Programming style” primarily refers to low-level conventions, such as formatting or choice of language constructs (such as goto or return), but can also refer to large-scale code structure or even overall design in software engineering. These higher-level style topics are often referred to as a “philosophy”, as in the Unix philosophy.
While some conventions are widely held to be superior to the alternatives, others have a few common alternatives, each with advantages and disadvantages, and different standards make different choices. Consistency is a general value, so even if another choice might be better in a specific case, the cost to consistency of making an exception often outweighs the benefit. However, exceptions are sometimes made when the case for them is particularly compelling or in prescribed circumstances, and different standards place a stronger or weaker emphasis on consistency.
Beyond specifying good practices to follow, coding style also identifies bad practices, known as “anti-patterns” or “code smell”, and may recommend specific solutions. This guide discusses language-neutral issues, listing pros and cons of various conventions.
Contents
Appearance
Code structure
- /Auxiliary variables
- /Cleanup
- /Concision
- /Control structures
- /Minimize nesting
- /Modularity
- /Simple statements
- /Structured programming
Non-code
Details
Examples
Left-hand comparisons
In languages which use one symbol (typically a single equals sign, =) for assignment and another (typically two equals signs, ==) for comparison (e.g. C/C++, Java, ActionScript 3, PHP, Perl in numeric context, and many more recent languages), and where assignments may be made within control structures, there is an advantage to adopting the left-hand comparison style: placing constants or expressions on the left in any comparison.[1][2]
Here are both right-hand and left-hand comparison styles, applied to a line of Perl code. In both cases, this compares the value in the variable $a against 42 and, if it matches, executes the code in the subsequent block.
if ($a == 42) { ... }  # A right-hand comparison checking if $a equals 42.
if (42 == $a) { ... }  # Recast, using the left-hand comparison style.
The difference occurs when a developer accidentally types = instead of ==:
if ($a = 42) { ... }  # Inadvertent assignment which is often hard to debug
if (42 = $a) { ... }  # Compile time error indicates source of problem
The first (right-hand) line now contains a potentially subtle flaw: rather than the previous behaviour, it now sets the value of $a to 42 and then always runs the code in the following block. As this is syntactically legitimate, the error may go unnoticed by the programmer, and the software may ship with a bug.
The second (left-hand) line contains a semantic error, as numeric values cannot be assigned to. This results in a diagnostic message when the code is compiled, so the error cannot go unnoticed by the programmer.
Some languages have built-in protections against inadvertent assignment. Java and C#, for example, do not support automatic conversion to Boolean for just this reason.
The risk can also be mitigated by use of static code analysis tools that can detect this issue.
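Python is another language with a built-in protection of this kind (an illustration that goes beyond the languages named above): a plain = is not permitted inside an if condition, so the typo is rejected before the program runs. The following is a minimal sketch that checks this with the built-in compile() function; the helper name is_valid_source is made up for the example.

```python
def is_valid_source(source):
    """Return True if `source` parses as Python code (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# Intended comparison: parses without complaint.
comparison_ok = is_valid_source("if a == 42:\n    pass\n")

# Accidental assignment: rejected when the code is compiled.
typo_ok = is_valid_source("if a = 42:\n    pass\n")

print(comparison_ok, typo_ok)  # True False
```

Because the mistake is caught at compile time, the left-hand comparison style offers little extra protection in such a language.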
Looping and control structures
The use of logical control structures for looping adds to good programming style as well. It helps someone reading code to better understand the program's sequence of execution (in imperative programming languages). For example, in pseudocode:
i = 0
while i < 5
    print i * 2
    i = i + 1
end while
print "Ended loop"
The above snippet obeys the naming and indentation style guidelines, but the following use of the "for" construct may be considered easier to read:
for i = 0, i < 5, i = i + 1
    print i * 2
print "Ended loop"
In many languages, the often-used "for each element in a range" pattern can be shortened to the following (assuming, as in most languages with this syntax, that the upper bound is inclusive):
for i = 0 to 4
    print i * 2
print "Ended loop"
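As a rough illustration of the same contrast in a real language, the Python sketch below writes the loop both ways. The function names are invented for the example, and results are collected in a list rather than printed so that the two versions can be compared directly.

```python
def doubled_with_while(limit):
    """Collect i * 2 for i = 0 .. limit - 1 using an explicit while loop."""
    results = []
    i = 0
    while i < limit:
        results.append(i * 2)
        i = i + 1  # easy to forget; omitting it makes the loop run forever
    return results


def doubled_with_for(limit):
    """Same computation; the loop header states the whole range in one place."""
    results = []
    for i in range(limit):
        results.append(i * 2)
    return results


print(doubled_with_while(5))  # [0, 2, 4, 6, 8]
print(doubled_with_for(5))    # [0, 2, 4, 6, 8]
```

In the for version the initialisation, bound, and step cannot drift apart, which is one reason it may be considered easier to read.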
In programming languages that use curly brackets to delimit blocks, it has become common for style documents to require that curly brackets be used with all control-flow constructs, even where they are optional.
for (i = 0 to 4) {
    print i * 2;
}
print "Ended loop";
This prevents program-flow bugs which can be time-consuming to track down, such as where a terminating semicolon is introduced at the end of the construct (a common typo):
for (i = 0; i < 5; ++i);
    printf("%d\n", i*2);    /* The incorrect indentation hides the fact
                               that this line is not part of the loop body. */
printf("Ended loop");
...or where another line is added before the first:
for (i = 0; i < 5; ++i)
    fprintf(logfile, "loop reached %d\n", i);
    printf("%d\n", i*2);    /* The incorrect indentation hides the fact
                               that this line is not part of the loop body. */
printf("Ended loop");
Nesting
An alternative and more traditional style is to explicitly indicate the nesting. This means that the opening and closing braces are in the same column:
for( index = 0 ; index < size ; ++index )
{
    arrayA[ index ] = arrayB[ index ] ;
}
This clearly indicates both the start and the end of the block, so it is easy to pick out from the code.
Indicating Relationships
Relationships are indicated in a number of ways. The simplest is the association of keywords and function names with their arguments. This is indicated by placing the keyword or function name directly next to the argument.
if( a == b )
{
    ...
}
Next are the arguments themselves. Where there are multiple arguments in a conditional statement, organising them horizontally and vertically so that they line up indicates their relationship.
if( ( a == b ) ||
    ( a == c ) )
{
    ...
}
Relationships extend all the way up - relationships between functions, between components and between modules. They can be of varying types but they all need to be indicated clearly.
The indicating of relationships is very important as it is part of connecting the dots - part of showing how the code hangs together. When modules are being constructed it is very important to design them in such a way that their structure, in relationship terms, can be easily understood by starting at a single obvious starting point and following the code from that point.
Spreading Code Out and Lining it Up
This is very important for readability. Basically the principle is to:
- separate each component part by white space.
- align everything in a meaningful way.
As such one can easily scan up and down the code and see the patterns. This is very important not only for understanding the code, but also for looking for anomalies and as a tool for rationalising and consolidating the code.
Code that has a lot of 'noise' (unnecessary variation and untidiness) is code that one can waste a lot of time working on. Well-written and well-formatted code is easy and quick to work with. It is code that allows one to easily 'see the wood for the trees'.
Scrunched Up Code
for(i=0;i<s;i++){a[i]=b[i];}
Separated Out Code
for( index = 0 ; index < size ; index++ )
{
    arrayA[ index ] = arrayB[ index ] ;
}
Clean formatting and meaningful names make the code more readable and easier to understand.
Meaningfulness and Consistency
Meaningfulness is coding in a manner that conveys meaning. If a for loop uses just the letter 'i' for the index, this is not very meaningful. If, however, the word 'index' is used, this is much more meaningful.
Consistency is using the same name wherever the same type of situation occurs. For example, if different words are used for 'index' (e.g. 'i', 'index', 'inx', 'indx'), this is not consistent; it is one of the many areas where unnecessary 'noise' is introduced into the code. If, however, 'index' is used consistently, the code is much easier and quicker to read and comprehend.
It's very important to minimise the number of unknowns and maximise the number of knowns.
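A small sketch of what this might look like in practice: two unrelated Python functions (both invented for this example) that use the same names, index and value, for the same roles, so that a reader who has understood one already knows the vocabulary of the other.

```python
def index_of_first_negative(values):
    """Return the index of the first negative value, or -1 if there is none."""
    for index, value in enumerate(values):
        if value < 0:
            return index
    return -1


def index_of_maximum(values):
    """Return the index of the largest value in a non-empty sequence."""
    if not values:
        raise ValueError("values must not be empty")
    maximum_index = 0
    for index, value in enumerate(values):
        if value > values[maximum_index]:
            maximum_index = index
    return maximum_index


print(index_of_first_negative([4, 7, -2, 9]))  # 2
print(index_of_maximum([4, 7, -2, 9]))         # 3
```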
Hard and Soft Coding
Hard coding, often in the form of 'magic numbers', produces code that is difficult to maintain. The code loses the understanding of what the numbers represent and where they occur. The hard-coded values should be replaced by enumerated values or by #defines (soft coding). In C and C++, enumerated values are known to the compiler and obey scope rules, whereas macros (#defines) are substituted textually by the preprocessor, so enumerations are generally the safer choice. A soft-coded project can be updated very quickly and safely. Updating a hard-coded project can be very time-consuming and very dangerous.
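The sketch below shows the same idea in Python, which has neither #define nor C-style enumerations; a module-level constant and enum.IntEnum stand in for them here. All names and values (MAX_ATTEMPTS, LogLevel, and so on) are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


# Hard-coded: the reader must guess what 3 means and hunt for every other 3.
def may_retry_hard_coded(attempts_so_far):
    return attempts_so_far < 3


# Soft-coded: the value is named once and can be updated in one place.
MAX_ATTEMPTS = 3  # hypothetical limit, chosen only for this example


class LogLevel(IntEnum):
    """Hypothetical enumeration replacing the bare numbers 10, 20 and 30."""
    DEBUG = 10
    WARNING = 20
    ERROR = 30


def may_retry(attempts_so_far):
    return attempts_so_far < MAX_ATTEMPTS


def should_alert(level):
    return level >= LogLevel.ERROR


print(may_retry(2), may_retry(3))                                    # True False
print(should_alert(LogLevel.WARNING), should_alert(LogLevel.ERROR))  # False True
```

Changing the retry limit now means editing one line, and a search for MAX_ATTEMPTS finds every place that depends on it.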
Lists
Where items in a list are placed on separate lines (a vertical list), it is sometimes considered good practice to add the item-separator after the final item, as well as between each item – most often this is a comma, so it is also known as using trailing commas. For example in C:
const char *array[] = {
    "item1",
    "item2",
    "item3",  // still has the comma after it
};
This ensures that each line is a separate item, regardless of order or whether another item follows it. This eliminates the need to add a comma to the line which was previously last in the list, or remove a comma from the new last item, when the list items are reordered or items are added to or removed from the end. Beyond reducing tedium and preventing syntax errors, it has two subtler benefits. Firstly, when using revision control, line differences between two file versions will show only insertions and deletions (and reorderings) of items, without an extra line for adding or removing a trailing comma.[3] Secondly, in some languages, such as Python, adjacent string literals are concatenated, so a missing comma in the middle of a list, rather than causing a syntax error, will instead cause two adjacent items to be concatenated, which can be a subtle bug to catch.
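The concatenation pitfall is easy to reproduce. The following Python sketch (variable names are illustrative) builds one list with a trailing comma on every line and one where a comma has been forgotten in the middle:

```python
with_commas = [
    "item1",
    "item2",
    "item3",  # trailing comma: adding "item4" later touches only one line
]

missing_comma = [
    "item1",
    "item2"   # comma forgotten here, so this literal joins the next one
    "item3",
]

print(len(with_commas))  # 3
print(missing_comma)     # ['item1', 'item2item3']
```

No error is reported for the second list; it is simply one element shorter than intended.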
This is supported in some constructs in some languages, such as arrays (lists) in C, Java, and Python. In other cases trailing commas are a syntax error, or extend the list with a null entry at the end. Even in languages that do support trailing commas, not every list-like syntactic construct need support them. A notable example of a language changing is ECMAScript: version 3 did not allow trailing commas – though this was only enforced in Internet Explorer – while version 5 allows an optional trailing comma. There are occasional subtleties. For example, in some FORTRAN dialects, a trailing comma is interpreted as an additional null argument, and then an empty argument list FOO() is considered as a single null argument; an empty argument list therefore cannot be passed. Many systems handle this dialect difference by allowing a single excess argument to be passed.[4] In Python, trailing commas are allowed in tuples (among other types), and a 1-tuple is written with a trailing comma: the expression 1, is a 1-tuple. Parentheses are allowed, yielding (1,) rather than the simple parenthesized expression (1), but it is the commas (here trailing) which define the tuple, not the parentheses. The exception is the empty tuple, which is written as empty parentheses ().[5][6]
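The Python tuple rules described above can be checked directly; the variable names in this short sketch are illustrative only.

```python
one_tuple = 1,           # the trailing comma alone makes this a 1-tuple
also_one_tuple = (1,)    # the same value, with optional parentheses
just_an_int = (1)        # parentheses without a comma: an ordinary integer
empty_tuple = ()         # the one case where parentheses define the tuple

print(type(one_tuple).__name__)    # tuple
print(one_tuple == also_one_tuple) # True
print(type(just_an_int).__name__)  # int
print(len(empty_tuple))            # 0
```

A stray trailing comma at the end of an assignment therefore silently turns a single value into a 1-tuple, which is worth keeping in mind when adopting the trailing-comma style.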
Use of trailing commas in lists can be compared to the use of semicolon as a statement terminator in many languages, where statements are generally written on separate lines and a trailing semicolon is allowed or required. This contrasts with ALGOL and early forms of Pascal, where semicolons are strictly statement separators, and a trailing semicolon is illegal; see Pascal: Semicolons as statement separators.
Trailing commas can also be used in horizontal lists, such as f(x, y,), which has the same benefit of making lists easier to modify as in vertical lists, but this is generally considered unsightly and is less common.
Trailing commas are generally handled in the parser, as part of the phrase grammar. A similar phenomenon is semicolon insertion, which is generally done during lexical analysis. However, in some cases these are combined – ECMAScript features semicolon insertion (in the lexer), but also optional trailing semicolons in some phrase production rules.
Commenting and Documentation
Commenting the code is very important. People can't read other people's minds. Commenting serves two purposes:
- to ensure that the code is written as desired. This form of comment is usually put in the code before the code is written. It serves as a means of ensuring that the code does what it is supposed to do.
- to ensure that the code is described in sufficient detail such that anyone else looking at the code can easily understand the design and the purpose of the code.
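Both purposes can be seen in a small Python sketch. The function average is invented for the example: the numbered step comments stand for the outline written before the code, and the docstring describes the purpose and behaviour for a later reader.

```python
def average(values):
    """Return the arithmetic mean of `values`.

    Raises ValueError for an empty sequence, because the mean is undefined.
    """
    # Step 1: reject input for which the result is undefined.
    if not values:
        raise ValueError("average() requires at least one value")

    # Step 2: add up the values and divide by how many there are.
    return sum(values) / len(values)


print(average([1, 2, 3]))  # 2.0
```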
Documentation is very important. This is separate from but associated with the code. It consists of documents such as specifications, design descriptions, tests and test results.
See also
Bibliography
- GNU coding standards
- Linux kernel coding style
- FreeBSD coding style
- google-styleguide: Style guides for Google-originated open-source projects
- The Elements of Programming Style, Brian W. Kernighan and P. J. Plauger (1974, Second Edition 1978, ISBN 0-07-034207-5).
- The Practice of Programming, by Brian W. Kernighan and Rob Pike, Addison-Wesley, Inc., 1999, ISBN 0-201-61586-X.
References
[1] Sklar, David (2003). PHP Cookbook. O'Reilly. Recipe 5.1 "Avoiding == Versus = Confusion", p. 118.
[2] "C Programming FAQs: Frequently Asked Questions". Addison-Wesley, 1995. Nov. 2010.
[3] "Why have trailing commas in resources?", Puppet Cookbook, Dean Wilson.
[4] 9.9.4 Ugly Null Arguments
[5] 5.3. Tuples and Sequences
[6] TupleSyntax