Coding Style
This page describes the preferred coding style for LEMMA. To quote Linus Torvalds in the Linux Kernel Documentation:
"Coding style is very personal, and I won't force my views on anybody, but this is what goes for anything that I have to be able to maintain, and I'd prefer it for most other things too. Please at least consider the points made here."
To rephrase: We greatly welcome all kinds of contributions to LEMMA and its various modules, and encourage all contributors to employ the coding style outlined below, even if it contradicts their personal style of programming. We do so because we consider a consistent source code appearance crucial to a codebase's readability and comprehensibility. Thank you!
Scope
This style guide concerns all programming languages used in LEMMA modules besides
- Python, for which we adhere to PEP 8; and
- Bash scripts (at least partially and as described below).
Indentation
Except for ATL modules (files with the `.atl` extension), we use spaces instead of tabs to indent LEMMA's source code. An indentation level is introduced by 4 spaces. In ATL modules, we use tabs (instead of spaces) with column width 4 for indentation, which is mainly due to historical reasons.
Except for package/module definitions and import statements, a line of code should not exceed 100 columns. Moreover, the maximum count of indentation levels should be 3. If you require more indentation levels, please refactor your code by introducing, e.g., additional data structures or helper methods. For instance, Xtend code like
should be refactored into something like
`switch` statements are an exception to the rule of the maximum indentation level 3. For `switch` statements we accept a maximum indentation level of 4. Thus, it is perfectly fine to refactor the first Xtend listing above into something like
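The refactoring idea behind the indentation limit can be sketched in Java as follows. The class and method names are hypothetical and not part of LEMMA. The first method nests two loops and a condition, whereas the second variant moves the inner loop and the condition into a helper method so that no statement is nested deeper than one block inside a method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class, not part of LEMMA
final class IndentationSketch {
    // Deeply nested variant: the innermost statement sits three blocks deep inside the method
    static List<String> longNamesNested(List<List<String>> groups) {
        List<String> result = new ArrayList<>();
        for (List<String> group : groups) {
            for (String name : group) {
                if (name.length() > 3) {
                    result.add(name);
                }
            }
        }
        return result;
    }

    // Refactored variant: inner loop and condition are delegated to a helper method
    static List<String> longNames(List<List<String>> groups) {
        List<String> result = new ArrayList<>();
        for (List<String> group : groups) {
            result.addAll(longNamesOf(group));
        }
        return result;
    }

    private static List<String> longNamesOf(List<String> group) {
        return group.stream()
            .filter(name -> name.length() > 3)
            .collect(Collectors.toList());
    }
}
```

Both variants return the same result, but the refactored one keeps each method shallow and gives the inner logic a name.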
Each line of code should contain exactly one statement, e.g.,
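A minimal Java sketch of the one-statement-per-line rule, using hypothetical names:

```java
// Hypothetical example class, not part of LEMMA
final class OneStatementPerLineSketch {
    // Discouraged would be a single line like: int width = 3; int height = 4;
    static int area() {
        int width = 3;
        int height = 4;
        return width * height;
    }
}
```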
Braces
As in K&R style, opening curly braces should be put last on the same line as the statement to which they belong. Separate the statement and the opening curly brace by a single space. Conversely, closing curly braces should be put on their own line:
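As a minimal illustration of the brace placement, consider the following Java sketch with hypothetical names. It only covers simple blocks and does not make any statement about constructs that continue after a closing brace, such as `else`.

```java
// Hypothetical example class, not part of LEMMA
final class BracesSketch {
    // Opening braces end the line of their statement, closing braces get their own line
    static int sumOfPositives(int[] values) {
        int sum = 0;
        for (int value : values) {
            sum += Math.max(value, 0);
        }
        return sum;
    }

    static String sign(int value) {
        if (value < 0) {
            return "negative";
        }
        return "not negative";
    }
}
```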
Spaces and Newlines
Spaces
There should be one space
- after keywords like `if`, `case`, `do`, `for`, and `while`;
- on each side of binary and ternary operators like `=`, `+`, `-`, `<`, `>`, `*`, `/`, `%`, `|`, `&`, `^`, `<=`, `>=`, `==`, `!=`, `?:`;
- after statements that are followed by a block (cf. the Braces example listing).
There should be no space
- after keywords like `catch` and `switch`;
- before parameter lists of methods/functions;
- after unary operators like `!`;
- before postfix increment and decrement unary operators, i.e., `++` and `--`;
- after prefix increment and decrement unary operators, i.e., `++` and `--`;
- around operators that access members of classes/data structures, e.g., `.` or `->`.
Remove spaces that do not comply with the above rules whenever possible, e.g.,
- in empty lines,
- at the end of a line,
- at the end of a file.
We never require horizontal alignment as the resulting maintainability effort does not justify the possible (and usually small) gain in readability:
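The spacing rules can be illustrated with a short Java sketch. All names are hypothetical. Note the space after `if` and `for`, the missing space after `switch`, the spaces around binary and ternary operators, and the two consecutive declarations whose `=` signs are deliberately not aligned.

```java
// Hypothetical example class, not part of LEMMA
final class SpacingSketch {
    static String describe(int count, boolean verbose) {
        String label = count == 1 ? "item" : "items";
        int doubled = count * 2;
        if (!verbose) {
            return count + " " + label;
        }
        return count + " " + label + ", doubled: " + doubled;
    }

    static int sumUpTo(int limit) {
        int sum = 0;
        for (int i = 1; i <= limit; i++) {
            sum += i;
        }
        return sum;
    }

    static int weight(char grade) {
        switch(grade) {
            case 'a':
                return 3;
            case 'b':
                return 2;
            default:
                return 0;
        }
    }
}
```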
Newlines
Add one empty line between the definition of semantically or syntactically coherent code blocks so that they are recognizable as such. For example, there should be one empty line between a block of constants and the following block of attributes in a class (semantic cohesion). Similarly, there should be one empty line between the end of a method body and the start of the next method's signature (syntactic cohesion).
We do not place a newline at the end of a file.
Optional Syntax Constructs
Omit optional constructs as permitted by the respective programming language as much as possible to decrease the number of characters one has to read to understand your code. For example, leave out round braces for functions in Xtend's standard library and semicolons when they are not necessary:
An exception to the rule of omitting unnecessary constructs are code blocks that are empty by intent, e.g., empty `catch`-blocks or constructor bodies. To communicate the intent of leaving such blocks empty, you should add (i) an explicit comment describing why the respective code block is empty (for `catch`-blocks and non-trivial constructors); or (ii) an explicit "NOOP" line comment (for trivial constructors):
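A minimal Java sketch of both cases, with hypothetical names. Placing `catch` on the line of the preceding closing brace is an assumption of this sketch.

```java
// Hypothetical example class, not part of LEMMA
final class EmptyBlockSketch {
    private EmptyBlockSketch() {
        // NOOP
    }

    static int parseOrDefault(String text, int defaultValue) {
        int result = defaultValue;
        try {
            result = Integer.parseInt(text);
        } catch(NumberFormatException ex) {
            // Intentionally empty: malformed input falls back to the default value
        }
        return result;
    }
}
```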
Breaking Long Lines
Basic Formatting
Lines wider than 100 columns should be broken into sensible chunks, unless exceeding 100 columns significantly increases readability and does not hide information. Descendant lines should follow LEMMA's coding style w.r.t. indentation:
Functions With Long Parameter Lists
Functions/methods whose parameter lists exceed 100 columns and
- that have more than two parameters; and
- for which breaking the parameters into descendant lines results in at least one descendant line comprising only one parameter that is not the last one

should be formatted so that each parameter definition is on its own line. In addition, the closing round brace of the parameter list and the opening curly brace of the function/method body should go on their own line on the same indentation level as the start of the function/method definition:
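The resulting layout might look like the following Java sketch. The method and its parameter names are hypothetical and chosen to be long on purpose; the sketch assumes that the conditions above hold for this signature.

```java
// Hypothetical example class, not part of LEMMA
final class ParameterListSketch {
    static String buildServiceEndpointAddress(
        String communicationProtocolName,
        String fullyQualifiedHostNameOrAddress,
        int portNumberOfTheServiceEndpoint,
        String pathRelativeToTheServiceRoot
    ) {
        return communicationProtocolName + "://" + fullyQualifiedHostNameOrAddress + ":" +
            portNumberOfTheServiceEndpoint + "/" + pathRelativeToTheServiceRoot;
    }
}
```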
Binary Operators
When breaking long statements, binary operators should remain on the previous line:
In case there are more than two binary operators in an expression that exceeds 100 columns, each operand should go in its own line, possibly followed by the binding operator:
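Both rules for binary operators can be sketched in Java as follows. The names are hypothetical, and the expressions are kept short for brevity; in real code, the breaks only apply to statements that exceed 100 columns.

```java
// Hypothetical example class, not part of LEMMA
final class BinaryOperatorSketch {
    // A single break: the operator stays on the previous line
    static int totalColumns(int indentationColumns, int contentColumns) {
        return indentationColumns +
            contentColumns;
    }

    // More than two operators: each operand on its own line, followed by its operator
    static boolean isMergeable(boolean compiles, boolean builds, boolean passes, boolean rebased) {
        return compiles &&
            builds &&
            passes &&
            rebased;
    }
}
```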
Assignments and Member Accesses
For assignments and member accesses, the respective operators should go into the same line as the assigned value or the accessed member, respectively:
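A minimal Java sketch of this rule, with hypothetical names. The statement is short for brevity, so the breaks only demonstrate where the `=` and `.` operators would go if the line exceeded 100 columns.

```java
import java.util.Locale;

// Hypothetical example class, not part of LEMMA
final class AssignmentBreakSketch {
    static String normalizedModuleName(String rawModuleName) {
        String normalizedName
            = rawModuleName
            .trim()
            .replace(' ', '_')
            .toLowerCase(Locale.ROOT);
        return normalizedName;
    }
}
```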
Naming
Except for Python scripts, where we adhere to the style suggested by PEP 8, and Bash scripts, where we follow the Linux Kernel style of Naming, the following naming rules shall be applied to all LEMMA module implementations.
Basic Formatting
Names should be kept readable but also as short as possible. Suppose a method that allows the concatenation of an arbitrary number of strings into a single file path string. A proper name for this method would be `joinPathSegments` rather than `join` (too generic), `jps` (incomprehensible), or `joinPathSegmentsFromAnArbitraryNumberOfStrings` (too long without a reason, i.e., the `FromAnArbitraryNumberOfStrings` suffix does not add any significant information to the `joinPathSegments` name prefix).
Modules/Packages
Names of modules/packages shall follow the `all_lowercase_with_underscore` style. For example, a valid package name is `de.fhdo.lemma.model_processing.code_generation.container_base`, and not `de.fhdo.lemma.model_processing.code_generation.containerBase` or (even worse) `de.fhdo.lemma.model_processing.code_generation.ContainerBase`.
Classes
Names of classes shall follow the `UpperCamelCase` style. For the conversion of English phrases into upper camel-case, we follow the conversion scheme defined in Google's Java Style Guide:
- Convert the phrase to plain ASCII and remove any apostrophes. For example, "Müller's algorithm" becomes "Muellers algorithm".
- Divide this result into words, splitting on spaces and any remaining punctuation (typically hyphens). If any word already has a conventional camel-case appearance in common usage, split this into its constituent parts (e.g., "AdWords" becomes "ad words"). Note that a word such as "iOS" is not really in camel-case per se.
- Now lowercase everything (including acronyms), then uppercase only the first character of each word, to yield `UpperCamelCase`.
- Finally, join all the words into a single identifier.
Examples:
Prose form | Right | Wrong |
---|---|---|
"XML HTTP request" | `XmlHttpRequest` | `XMLHTTPRequest` |
"YouTube importer" | `YouTubeImporter` | `YoutubeImporter` |
Variables and Functions/Methods
Names of variables and functions/methods shall follow the `lowerCamelCase` style. For the conversion of English phrases into lower camel-case, we follow the conversion scheme defined in Google's Java Style Guide:
- Convert the phrase to plain ASCII and remove any apostrophes. For example, "Müller's algorithm" becomes "Muellers algorithm".
- Divide this result into words, splitting on spaces and any remaining punctuation (typically hyphens). If any word already has a conventional camel-case appearance in common usage, split this into its constituent parts (e.g., "AdWords" becomes "ad words"). Note that a word such as "iOS" is not really in camel-case per se.
- Now lowercase everything (including acronyms), then uppercase only the first character of each word except the first, to yield `lowerCamelCase`.
- Finally, join all the words into a single identifier.
Examples:
Prose form | Right | Wrong |
---|---|---|
"new customer ID" | `newCustomerId` | `newCustomerID` |
"inner stopwatch" | `innerStopwatch` | `innerStopWatch` |
"supports IPv6 on iOS?" | `supportsIpv6OnIos` | `supportsIPv6OnIOS` |
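The mechanical parts of the conversion scheme can be sketched in Java as follows. This is only an approximation under stated assumptions: the class and method names are hypothetical, the input is expected to be plain ASCII already (no transliteration such as "ü" to "ue" is performed), and words with a conventional camel-case appearance such as "YouTube" must be split by hand before calling the methods.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical example class, not part of LEMMA
final class CamelCaseSketch {
    // Remove apostrophes, split on non-alphanumeric characters, lowercase each word,
    // uppercase its first character, and join all words
    static String toUpperCamelCase(String phrase) {
        String withoutApostrophes = phrase.replace("'", "");
        return Arrays.stream(withoutApostrophes.split("[^A-Za-z0-9]+"))
            .filter(word -> !word.isEmpty())
            .map(word -> word.toLowerCase(Locale.ROOT))
            .map(word -> Character.toUpperCase(word.charAt(0)) + word.substring(1))
            .collect(Collectors.joining());
    }

    // Same as above, but the first character of the first word stays lowercase
    static String toLowerCamelCase(String phrase) {
        String upperCamelCase = toUpperCamelCase(phrase);
        if (upperCamelCase.isEmpty()) {
            return upperCamelCase;
        }
        return Character.toLowerCase(upperCamelCase.charAt(0)) + upperCamelCase.substring(1);
    }
}
```

For instance, `toUpperCamelCase("XML HTTP request")` yields `XmlHttpRequest`, and `toLowerCamelCase("supports IPv6 on iOS?")` yields `supportsIpv6OnIos`.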
As opposed to Hungarian notation, we do not encode the type of a variable in its name:
Constants
Constants are an exception to the `lowerCamelCase` rule for variables. Their names shall follow the `ALL_UPPERCASE_WITH_UNDERSCORE` style:
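The naming rules for variables and constants can be illustrated with the following Java sketch. All names are hypothetical.

```java
// Hypothetical example class, not part of LEMMA
final class NamingSketch {
    static final int MAX_LINE_WIDTH = 100;
    static final String DEFAULT_ENCODING = "UTF-8";

    // Discouraged (Hungarian notation) would be names like strLine or iLineWidth
    static boolean fitsIntoLine(String line) {
        int lineWidth = line.length();
        return lineWidth <= MAX_LINE_WIDTH;
    }
}
```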
Avoidance of Offensive Terms
Avoid introducing the terms `master` and `slave` as well as `blacklist` and `whitelist`. We consider them offensive. Instead, you can use these alternatives:
- for `master`/`slave`:
    - `primary`, `main` / `secondary`, `replica`, `subordinate`
    - `initiator`, `requester` / `target`, `responder`
    - `controller`, `host` / `device`, `worker`, `proxy`
    - `leader` / `follower`
    - `director` / `performer`
- for `blacklist`/`whitelist`:
    - `denylist` / `allowlist`
    - `blocklist` / `passlist`
Source File Language, Encoding, and Ordering
Language
The language for all source file contents is American English. An exception to this rule might be localized status messages or texts in user dialogs. However, American English is also the first language to consider for messages or dialog texts.
Encoding
Source file encoding is `UTF-8`.
Ordering
The ordering of source file contents should be logical (and, in particular, not chronological or based on visibility). For instance, variables should be defined close to their first usage and the order of methods should reflect their call chains. Consider the following example of a badly ordered Xtend class:
The contents of this class should instead be ordered as follows:
Imports are ordered as follows:
- All non-static imports in a single block.
- All static imports in a single block.
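A logically ordered source file might look like the following Java sketch. The class is hypothetical. The non-static import block precedes the static one, the methods appear in the order of their call chain, and each local variable is defined right before its first usage.

```java
import java.util.Locale;

import static java.util.Objects.requireNonNull;

// Hypothetical example class, not part of LEMMA
final class OrderingSketch {
    static String greet(String name) {
        String cleanedName = clean(requireNonNull(name));
        return decorate(cleanedName);
    }

    private static String clean(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    private static String decorate(String name) {
        String prefix = "Hello, ";
        return prefix + name + "!";
    }
}
```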
Comments
Basics
Try not to over-comment your code and aim to write your code in a way that it is understandable without comments. In any case, do not use comments to explain HOW your code works but WHAT it does:
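The difference between a HOW comment and a WHAT comment can be sketched in Java as follows. The class and method are hypothetical, and the plain line comments are only used for illustration.

```java
// Hypothetical example class, not part of LEMMA
final class CommentSketch {
    // Discouraged (HOW): "Stream over all characters, keep the ones equal to ' ', count them"
    // Preferred (WHAT): the comment directly before the method signature below

    // Count the blanks in the given text
    static long countBlanks(String text) {
        return text.chars()
            .filter(character -> character == ' ')
            .count();
    }
}
```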
Proper names in comments, e.g., "Eclipse", "IFile", "LEMMA", and "Spring Boot", are capitalized as intended by the inventors/providers of the mentioned frameworks/products/technologies/entities.
In addition, we never (ever) hyphenate within comments.
JVM Specifics
Try also to avoid comments in method or function bodies to the maximum extent possible. However, we almost always place comments before the signature of a function/method and its defining class. Exceptions to this rule are trivial functions/methods like getters or setters, and trivial classes like POJOs. For functions/methods and classes in JVM languages use the following comment style:
In case you developed a JVM function/method that consists of several logical steps, whose decomposition into other methods does not make sense, you may use multi-line comments to separate coherent steps and single-line comments within a compound of coherent steps. However, this form of commenting (and organizing functions/methods) should be avoided at all sane costs.
Python Specifics
For Python scripts, we use the comment style suggested by PEP 8.
ATL Specifics
For ATL modules, use `---` to comment global variables, rules, or helpers, and `--` to comment everything else.
Programming Practices
Always Use @Override
While languages like Kotlin and Xtend require you to use the `override` keyword when overriding inherited methods, Java does not enforce the usage of the `@Override` annotation. However, you should add `@Override` to Java methods whenever they override an inherited method.
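A minimal Java sketch with a hypothetical class that overrides `toString` from `java.lang.Object`:

```java
// Hypothetical example class, not part of LEMMA
final class OverrideSketch {
    private final String moduleName;

    OverrideSketch(String moduleName) {
        this.moduleName = moduleName;
    }

    @Override
    public String toString() {
        return "Module " + moduleName;
    }
}
```

With the annotation in place, the Java compiler rejects the method if it does not actually override an inherited one, e.g., because of a typo in its name.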
Access Static Members By Their Defining Classes
When accessing static members in a qualified manner, use the defining class's name and not an instance of the defining class:
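A minimal Java sketch of qualified static member access, using hypothetical classes:

```java
// Hypothetical example classes, not part of LEMMA
final class Counter {
    static final int START = 1;
}

final class StaticAccessSketch {
    // Discouraged would be an access via an instance, e.g., new Counter().START
    static int firstValue() {
        return Counter.START;
    }
}
```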
Do Not Reinvent Utility Functions
LEMMA comes with an extensive set of utility functions, e.g., in the `LemmaUtils` class or Xcore metamodel specifications. Please use such functions rather than explicitly coding some variant of them yourself.
Lowest Applicable Visibility
In languages that support a notion of visibility for module members, we always employ the lowest level of visibility applicable to a member concerning its intended usage. For Java, for example, apply visibility in the following order (i.e., make classes package-private whenever possible and make their members `private` whenever possible):
- `private` (class members only)
- no modifier, i.e., package-private visibility
- `protected` (class members only)
- `public`
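The visibility order can be illustrated with the following Java sketch. The class is hypothetical. It is package-private itself, keeps its state and helper `private`, and exposes only the constructor and one method to its package.

```java
import java.util.Arrays;

// Hypothetical example class, not part of LEMMA
final class VisibilitySketch {
    private final int[] samples;

    VisibilitySketch(int... samples) {
        this.samples = samples.clone();
    }

    int maximum() {
        return Arrays.stream(samples).max().orElse(defaultValue());
    }

    private static int defaultValue() {
        return 0;
    }
}
```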
Commits
Commit Cohesion
Your commits should always reflect coherent changes to LEMMA's codebase. For example, you should not add two different major functionalities in the same commit. In addition, a commit must always result in a functioning (compilable and buildable) `main` branch and not break any tests. In your own feature branches, your commits might introduce compile and/or build failures, which is totally fine as long as the commits are isolated from `main`. However, before you open a pull request for `main`, you must ensure that your commits will result in a functioning `main` branch.
Merges of commits into `main` are expected to be fast-forward (in fact, they are done using git's `--ff-only` option for the `merge` sub-command). Consequently, you should rebase your feature branch onto `main` before creating a pull request on GitHub.
In case your commit results in generated code (as is the case, e.g., with Xtend), only commit generated code that actually results from the changes within source code files relevant to the commit. For instance, in case Eclipse performs a workspace re-build and generates Java code from Xtend files all over the place, your commit should only comprise those generated Java files that immediately result from changes to Xtend files in the LEMMA module that is actually affected by your commit.
In addition, a commit should be revertible without breaking existing code whenever possible. Sometimes, however, this might not be achievable in a sane manner, and we accept such cases. Still, you should try to create revertible commits as much as possible.
Commit Messages
Commit messages in LEMMA consist of three parts:
- The name of the LEMMA module that is affected by the commit in American English.
- A colon that separates the module name from the commit description.
- The commit description in American English.
LEMMA Module Names
Currently, we do not have a standardized collection of LEMMA module names. However, you might execute the following Bash command on LEMMA's `main` branch in a cloned copy of LEMMA's repository to get a glimpse of common module names:
This command will result in a list comprising entries like
which are all good candidates for module names. Please capitalize nouns in module names, e.g., "Intermediate Metamodels" is preferred over "Intermediate metamodels".
In case your commit concerns more than one LEMMA module, you might separate the module names with commas and an "and", e.g.,
If your commit does not concern a LEMMA module or all LEMMA modules, omit module names entirely, e.g.,
Colon Separator
In case your commit affects one or more LEMMA modules, place a colon immediately after the module names. A space after the colon shall then separate the colon from the commit description:
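The composition of module names, colon separator, and commit description can be sketched with a small Java helper. The helper is hypothetical and not part of LEMMA's tooling. In particular, joining three or more module names without a comma before the "and" is an assumption of this sketch.

```java
import java.util.List;

// Hypothetical example class, not part of LEMMA
final class CommitMessageSketch {
    // No module names: the message consists of the description only
    static String format(List<String> moduleNames, String description) {
        if (moduleNames.isEmpty()) {
            return description;
        }
        return joinModuleNames(moduleNames) + ": " + description;
    }

    // Assumption: names are joined as "A, B and C", i.e., without a comma before the "and"
    private static String joinModuleNames(List<String> moduleNames) {
        int lastIndex = moduleNames.size() - 1;
        if (lastIndex == 0) {
            return moduleNames.get(0);
        }
        String leadingNames = String.join(", ", moduleNames.subList(0, lastIndex));
        return leadingNames + " and " + moduleNames.get(lastIndex);
    }
}
```

For example, `format(List.of("Intermediate Metamodels"), "Fix a typo.")` yields `Intermediate Metamodels: Fix a typo.`, whereas an empty list of module names yields the description only.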
Commit Description
Commit descriptions shall start with a verb in active form for the first person (I/we). Moreover, the description shall consist of at least one sentence in American English. The verb at the beginning of a commit message starts with an uppercase letter. All other words of the commit message follow regular capitalization in American English. In particular, proper names are capitalized the same way as within comments:
While the latter message is basically correct, we nowadays prefer "long git commit messages" for commit descriptions that consist of more than one sentence. The template for a "long git commit message" is as follows:
The `[LONG_MESSAGE]` template variable is a block of text with a maximum width of 100 columns, in which each line is indented by 4 spaces. Consequently, the latter example commit message above should be reformatted as follows:
A real-world example of a "long git commit message" can be found in LEMMA commit 9237b941:
To gain further insights on short LEMMA commit descriptions, you may execute the following Bash command on LEMMA's `main` branch in a cloned copy of LEMMA's repository: