software design and architecture stack
what is clean code
- the term “clean code” refers to code that’s easy to read, understand,
and maintain
- popularized by Robert C. Martin
- guidelines on how to write readable, understandable, and maintainable code
- although not every “rule” applies to every language / situation
why it matters
misconception: program code is mostly written; in reality it is mainly read
well-written code is easy to read, understand, debug, maintain, extend, etc.
Any fool can write code that a computer can understand. Good programmers write code that humans can understand.
– Martin Fowler (Fowler, 2018)
- where the human can be your future self, or a current or future colleague
- like having nice handwriting
clear is better than clever
In any kind of programming, clarity should always be the primary goal.
Writing code that is straightforward and understandable is more valuable than trying to craft overly clever or intricate solutions.
– Go Proverbs by Rob Pike
source: Rob Pike’s Go Proverbs
prime check – clever vs. clear
// clever: compares i*i with n, so no square root is needed
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for i := 2; i*i <= n; i++ {
		if n%i == 0 {
			return false
		}
	}
	return true
}

// clear: tries every candidate divisor up to n/2
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for i := 2; i <= n/2; i++ {
		if n%i == 0 {
			return false
		}
	}
	return true
}
less efficient, but it may be easier for a reader to understand
source: Rob Pike’s Go Proverbs
// requires: import "math"
func isPrime(n int) bool {
	if n < 2 {
		return false
	}
	for i := 2; i <= int(math.Sqrt(float64(n))); i++ {
		if n%i == 0 {
			return false
		}
	}
	return true
}
- it iterates only up to the square root of n: if n has a divisor greater than its square root, it must also have a corresponding divisor smaller than the square root
- it is easier to understand than the “clever” one, but less wasteful than the “clear” one
- why is the clever one better?
  - because an integer multiplication is cheaper for the CPU than calculating a square root
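The three variants can be compared side by side. The following is a minimal Python sketch (the slides use Go; the function names are illustrative assumptions). The square-root variant uses `math.isqrt`, which computes an integer square root once and so avoids both the repeated call and floating-point rounding.

```python
import math


def is_prime_clever(n: int) -> bool:
    """Stop as soon as i*i exceeds n: no square root needed."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def is_prime_clear(n: int) -> bool:
    """Try every candidate divisor up to n // 2."""
    if n < 2:
        return False
    for i in range(2, n // 2 + 1):
        if n % i == 0:
            return False
    return True


def is_prime_sqrt(n: int) -> bool:
    """Use the integer square root, computed once, as the upper bound."""
    if n < 2:
        return False
    for i in range(2, math.isqrt(n) + 1):
        if n % i == 0:
            return False
    return True


# All three variants should classify the same numbers as prime.
primes = [n for n in range(50) if is_prime_clever(n)]
assert primes == [n for n in range(50) if is_prime_clear(n)]
assert primes == [n for n in range(50) if is_prime_sqrt(n)]
print(primes)
```

All three return the same answers; they differ only in how much work they do and how much the reader has to know to trust the loop bound.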
hierarchy in style guides
- language level:
- Python: PEP 8 or pep8.org
- Ruby: Ruby Style Guide
- Go: Effective Go
- Rust: The Rust Style Guide
- etc.
- organization level:
not just style guides, also best practices
write idiomatic code
- a prog. language implements a prog. paradigm
- a paradigm defines a certain “way” of writing code
- using different abstractions / building blocks
- promoting a given concept
- some languages implement multiple paradigms
- and languages have their own way of doing things
- languages have pros and cons for a given problem
just as in the case of natural languages, you ought to use a language properly
write idiomatic code
for (let i = 0; i < 10; i++) {
    console.log(i);
}

[...Array(10).keys()].forEach(i => {
    console.log(i);
});

i = 0
while i < 10:
    print(i)
    i += 1

for i in range(10):
    print(i)

for i in 0..9 do
  puts i
end

(0..9).each do |i|
  puts i
end

(0..9).each {|i| puts i}
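The same contrast shows up beyond simple counting loops. As a small Python sketch (the list of names is made up for illustration), manual index bookkeeping works, but `enumerate` expresses the intent directly:

```python
names = ["Ada", "Grace", "Linus"]

# Non-idiomatic: manual index bookkeeping, as one would write it in C.
numbered_manual = []
for i in range(len(names)):
    numbered_manual.append(f"{i + 1}. {names[i]}")

# Idiomatic: enumerate yields the counter and the item together.
numbered_idiomatic = [f"{i}. {name}" for i, name in enumerate(names, start=1)]

assert numbered_manual == numbered_idiomatic
print(numbered_idiomatic)
```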
clean code
Clean Code: A Handbook of Agile Software Craftsmanship
by Robert C. Martin (Martin, 2009)
meaningful names
this section is based on the book Clean Code (chapter 2) by Robert C. Martin (Martin, 2009)
with own examples
use intention-revealing names
int d; // elapsed time in days
the definition is only available at the declaration
int elapsedTimeInDays;
the definition is available at every usage
multi-word names
camelCase
int elapsedTimeInDays;
- C (local variable)
- Java (variable, method)
UpperCamelCase (PascalCase)
public class DataCollector {}
- Java (class)
- Rust (Type, Enum)
snake_case
elapsed_time_in_days = 17
- Python
- Rust (variable, function)
camelCase is arguably faster to type, but an eye-tracking study found that snake_case identifiers are recognized more quickly (Sharif & Maletic, 2010)
read the style guide
avoid disinformation
Do not refer to a grouping of accounts as an accountList unless it’s actually a List (Martin, 2009).
better to use accounts: the name then does not depend on the collection type
inconsistent spelling is also disinformation
disinformative names would be the use of lower-case L or uppercase O (Martin, 2009)
- they can look almost like the one and zero, respectively – use the right font
- PEP 8 (Python style guide) says never to use l, O, or I as single-character variable names
make meaningful distinctions
It is not sufficient to add number series or noise words, even though the compiler is satisfied. If names must be different, then they should also mean something different (Martin, 2009).
def calculate_distance(data: pd.DataFrame) -> pd.Series:
    ...  # do something

def calculate_distance2(data: pd.DataFrame) -> pd.Series:
    ...  # do something else

def calculate_euclidean_distance(data: pd.DataFrame) -> pd.Series:
    ...

def calculate_levenshtein_distance(data: pd.DataFrame) -> pd.Series:
    ...
make meaningful distinctions / noise words
Noise words are another meaningless distinction. Imagine that you have a Product class. If you have another called ProductInfo or ProductData, you have made the names different without making them mean anything different (Martin, 2009).
use pronounceable names
If you can’t pronounce it, you can’t discuss it without sounding like an idiot (Martin, 2009).
- Should etid be an integer?
- Should elapsed_time_in_days be an integer?
this can be especially important for non-native speakers, as some words are more difficult to pronounce
use searchable names
Single-letter names can ONLY be used as local variables inside short methods. The length of a name should correspond to the size of its scope (Martin, 2009).
it’s OK to do this:
for i in range(10):
    print(i)
it’s NOT OK in a large scope:
int d; // elapsed time in days
names for classes, functions
- a class is a model / blueprint of something
  - the name should be a noun
  - e.g., User, Activity
- an object is an instance of a class
  - still a noun
  - e.g., user = User()
- a function does something
  - the name should contain a verb
  - in imperative
  - e.g., aggregate_activity instead of activity_aggregation
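Put together, these naming rules might look like the following Python sketch. The fields of `User` and `Activity` and the body of `aggregate_activity` are assumptions made up for illustration; only the names come from the list above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class User:  # class: a noun
    name: str


@dataclass
class Activity:  # class: a noun
    user: User
    day_of_week: int


def aggregate_activity(activities: list[Activity]) -> Counter:
    """Function: an imperative verb phrase. Counts activities per day of week."""
    return Counter(activity.day_of_week for activity in activities)


user = User("alice")  # object: still a noun
activities = [Activity(user, 2), Activity(user, 2), Activity(user, 5)]
activity_counts = aggregate_activity(activities)
print(activity_counts)
```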
avoid encodings
with modern IDEs it is pointless to put type or role markers into names
Hungarian notation
- invented by Charles Simonyi at Microsoft
- adding a prefix to a name that gives information about type, length, or scope
def fnFactorial(iNum):
    if iNum == 1:
        return iNum
    else:
        return iNum * fnFactorial(iNum - 1)
source: (Bhargav, 2024)
interface IShapeArea // I is also a prefix
{
    void area();
}

interface ShapeArea
{
    void area();
}
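In Python, type hints can carry the information that the `fn` and `i` prefixes tried to encode. A minimal sketch of the factorial example without prefixes follows; note that, unlike the quoted example, it also handles 0 and rejects negative input, which is an addition made here.

```python
def factorial(number: int) -> int:
    """Return number! for a non-negative integer."""
    if number < 0:
        raise ValueError("factorial is undefined for negative numbers")
    if number <= 1:
        return 1
    return number * factorial(number - 1)


print(factorial(5))
```

The type checker and the editor now provide what the prefix used to, and the name itself stays readable.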
avoid mental mapping
Readers shouldn’t have to mentally translate your names into other names they already know (Martin, 2009).
don’t pun or use humor
- no inside jokes
- no colloquialisms or slang
- be objective and professional
Say what you mean. Mean what you say (Martin, 2009).
pick one word per concept
it’s confusing to have fetch, retrieve, and get as equivalent methods of different classes (Martin, 2009)
a consistent vocabulary also makes the term easier to search for
add meaningful context
Imagine that you have variables named firstName, lastName, street, houseNumber, city, state, and zipcode. Taken together it’s pretty clear that they form an address. But what if you just saw the state variable being used alone in a method? (Martin, 2009)
- adding a prefix?
  - e.g., addrCity, addrStreet, addrState
- as notations are discouraged, use an Address class instead to add context
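A minimal sketch of the `Address` idea in Python, assuming a dataclass with snake_case versions of the field names from the quote; `format_state_line` and the sample values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    house_number: str
    city: str
    state: str
    zipcode: str


def format_state_line(address: Address) -> str:
    """Here `state` is unambiguous: it is reached through an Address."""
    return f"{address.city}, {address.state} {address.zipcode}"


address = Address("Main Street", "12", "Springfield", "IL", "62701")
print(format_state_line(address))
```

Seen alone, `state` could mean anything; `address.state` can only mean one thing.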
functions
this section is based on the book Clean Code (chapter 3) by Robert C. Martin (Martin, 2009)
with own examples
functions should be as small as possible
Functions should hardly ever be 20 lines long (Martin, 2009)
- shorter functions are easier to understand
do one thing (single responsibility principle)
import sqlite3
import pandas as pd
con = sqlite3.connect("data.db")
data = pd.read_sql(activity_query, con)
records = []
for woy in range(36, 40):
    for dow in range(1, 8):
        records.append([woy, dow, 0])
empty = pd.DataFrame.from_records(
    records, columns=["week_of_year", "day_of_week", "count"]
)
data = (
    pd.concat([data, empty])
    .drop_duplicates(subset=["week_of_year", "day_of_week"], keep="first")
    .sort_values(["week_of_year", "day_of_week"])
    .reset_index(drop=True)
)
activity = pd.pivot(
    data, index=["week_of_year"], columns=["day_of_week"], values=["count"]
).values
res = con.execute(progress_query)
progress = res.fetchone()[0]
SELECT
CAST(
strftime('%W', timestamp)
AS INTEGER
) AS week_of_year,
CAST(
strftime('%u', timestamp)
AS INTEGER
) AS day_of_week,
count(*) AS count
FROM activity
WHERE
user_id = 42 AND
week_of_year > 35 AND
week_of_year < 40
GROUP BY
week_of_year,
day_of_week;
SELECT
lesson / 50.0 AS progress
FROM activity
WHERE
user_id = 42 AND
result = 'success'
ORDER BY lesson DESC
LIMIT 1;
debug tables
| week_of_year | day_of_week | count |
|---|---|---|
| 36 | 2 | 1 |
| 38 | 5 | 1 |
| 39 | 6 | 2 |
queried user activity
| week_of_year / day_of_week | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| 36 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 37 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 38 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 39 | 0 | 0 | 0 | 0 | 0 | 2 | 0 |
pivoted user activity table
| week_of_year | day_of_week | count |
|---|---|---|
| 36 | 1 | 0 |
| 36 | 2 | 0 |
| … | … | … |
| 36 | 7 | 0 |
| 37 | 1 | 0 |
| … | … | … |
| 37 | 7 | 0 |
| 38 | 1 | 0 |
| … | … | … |
| 38 | 5 | 0 |
| … | … | … |
| 39 | 6 | 0 |
| 39 | 7 | 0 |
empty activity table
the inverse scope law of function names
The longer the scope of a function, the shorter its name should be. Functions that are called locally from a few nearby places should have long descriptive names, and the longest function names should be given to those functions that are called from just one place.
“longer scope”: a more general part of the code, called from many places
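As a small Python sketch of this rule (both function names and the parsing task are made up for illustration): the widely used entry point gets a short name, while the helper that is called from exactly one place gets a long, descriptive one.

```python
def _drop_blank_lines_and_strip_whitespace(lines: list[str]) -> list[str]:
    """Called from exactly one place, so a long, precise name costs little."""
    return [line.strip() for line in lines if line.strip()]


def parse(text: str) -> list[str]:
    """Called from all over a code base, so the name stays short."""
    return _drop_blank_lines_and_strip_whitespace(text.splitlines())


print(parse("  first \n\n second\n"))
```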
function arguments
- do not use more than three (Martin, 2009)
  - what if you’d need more?
  - wrap them into an object
- do not use flags
  - “Flag arguments are ugly […] loudly proclaiming that this function does more than one thing” (Martin, 2009).
def build_empty_dataframe(start, end, cols):
    records = []
    for woy in range(start, end + 1):
        for dow in range(1, 8):
            records.append([woy, dow, 0])
    return pd.DataFrame.from_records(
        records, columns=cols
    )

def query_progress(as_percentage: bool):
    res = con.execute(progress_query)
    progress = res.fetchone()[0]
    if as_percentage:
        return progress * 100
    else:
        return progress
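One possible way to apply both pieces of advice is sketched below, using only the standard library so that it runs on its own. `WeekRange`, `build_empty_records`, and `query_progress_percentage` are hypothetical names, and `SELECT 0.25` is a placeholder standing in for the real progress query.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class WeekRange:
    """Groups arguments that always travel together."""
    start: int
    end: int  # inclusive


def build_empty_records(weeks: WeekRange) -> list[tuple[int, int, int]]:
    """Return (week_of_year, day_of_week, count) rows with a zero count."""
    return [
        (woy, dow, 0)
        for woy in range(weeks.start, weeks.end + 1)
        for dow in range(1, 8)
    ]


def query_progress(con: sqlite3.Connection) -> float:
    """Return progress as a fraction between 0 and 1."""
    row = con.execute("SELECT 0.25").fetchone()  # placeholder query
    return row[0]


def query_progress_percentage(con: sqlite3.Connection) -> float:
    """Return progress as a percentage; replaces the as_percentage flag."""
    return query_progress(con) * 100


con = sqlite3.connect(":memory:")
records = build_empty_records(WeekRange(36, 39))
print(len(records), query_progress(con), query_progress_percentage(con))
```

Each function now does one thing, and the call site states which variant it wants instead of passing `True` or `False`.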
function as interface
DataFrame.to_csv(
path_or_buf=None, *,
sep=',',
na_rep='',
float_format=None,
columns=None,
header=True,
index=True,
index_label=None,
mode='w',
encoding=None,
compression='infer',
quoting=None,
quotechar='"',
lineterminator=None,
chunksize=None,
date_format=None,
doublequote=True,
escapechar=None,
decimal='.',
errors='strict',
storage_options=None
)
no side effects
Side effects are lies. Your function promises to do one thing, but it also does other hidden things (Martin, 2009).
– Robert C. Martin
an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation (Wikipedia contributors, 2022).
side effect example
class Something:
    foo = 0

    def increase(self, by):
        self.foo += by  # side effect: modifies the object's state

    def decrease(self, by):
        self.foo -= by

something = Something()
print(something.foo)  # 0
something.increase(2)
print(something.foo)  # 2

smth = {"foo": 0}

def increase(what, by):
    return what + by  # no side effect: only returns a new value

def decrease(what, by):
    return what - by

print(smth["foo"])  # 0
increase(smth["foo"], 2)  # 2
print(smth["foo"])  # 0
smth["foo"] = increase(smth["foo"], 2)
print(smth["foo"])  # 2
prefer exceptions to returning error codes
- in Unix-like systems, processes still return 0 if the execution was successful
- but returning error codes from functions is discouraged
- FileNotFoundException is better than ERRCODE_26375
  - meaningful name
  - no mental mapping
- exception handling is syntactically more readable
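The difference can be sketched in Python, where the built-in counterpart of `FileNotFoundException` is `FileNotFoundError`. The function names and the file name below are hypothetical, and the sketch assumes that no file with that name exists in the working directory.

```python
from pathlib import Path

ERRCODE_26375 = 26375  # hypothetical error code: what does it mean?
MISSING_FILE = "definitely_missing_26375.cfg"  # assumed not to exist


def read_config_with_code(path: str) -> tuple[int, str]:
    """Error-code style: the caller must remember to check the first item."""
    if not Path(path).is_file():
        return ERRCODE_26375, ""
    return 0, Path(path).read_text()


def read_config(path: str) -> str:
    """Exception style: read_text() raises FileNotFoundError by itself."""
    return Path(path).read_text()


code, _ = read_config_with_code(MISSING_FILE)
print(code)

try:
    read_config(MISSING_FILE)
except FileNotFoundError as error:
    print(f"config not found: {error.filename}")
```

With the error code, nothing stops the caller from ignoring the first tuple item; with the exception, the failure has a meaningful name and cannot be silently skipped.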
comments
this section is based on the book Clean Code (chapter 4) by Robert C. Martin (Martin, 2009)
with own examples
separating comments
# connect to the database
con = sqlite3.connect("data.db")
# query activity data
data = pd.read_sql(activity_query, con)
# create empty dataframe
records = []
for woy in range(36, 40):
    for dow in range(1, 8):
        records.append([woy, dow, 0])
empty = pd.DataFrame.from_records(records, columns=["week_of_year", "day_of_week", "count"])
# combine empty and sparse dataframe
data = (
    pd.concat([data, empty])
    .drop_duplicates(subset=["week_of_year", "day_of_week"], keep="first")
    .sort_values(["week_of_year", "day_of_week"])
    .reset_index(drop=True)
)
# pivot dataframe
activity = pd.pivot(
    data, index=["week_of_year"], columns=["day_of_week"], values=["count"]
).values
separated functions
def create_empty_dataframe(start_week, end_week):
    records = []
    for woy in range(start_week, end_week + 1):
        for dow in range(1, 8):
            records.append([woy, dow, 0])
    return pd.DataFrame.from_records(
        records, columns=["week_of_year", "day_of_week", "count"]
    )

def fill_empty_with_activities(empty, activities):
    return (
        pd.concat([activities, empty])
        .drop_duplicates(subset=["week_of_year", "day_of_week"], keep="first")
        .sort_values(["week_of_year", "day_of_week"])
        .reset_index(drop=True)
    )

def pivot_dataframe(data):
    return pd.pivot(
        data, index=["week_of_year"], columns=["day_of_week"], values=["count"]
    ).values
these functions do one thing
separated functions - usage
con = sqlite3.connect("data.db")
activities = pd.read_sql(activity_query, con)
empty = create_empty_dataframe(36, 39)
data = fill_empty_with_activities(empty, activities)
activities_matrix = pivot_dataframe(data)
the comments turned into function names, which can be read as prose
more bad comments
journal comment
# 2024-10-17 -- Add idiomatic coding examples
# 2024-10-18 -- Add meaningful names section
the version control system keeps a better journal
noise comments
# creates an empty dataframe
def create_empty_dataframe(start_week, end_week):
    ...
don’t write something that is already in the code
closing brace comments
for (let i = 0; i < 10; i++) {
    console.log(i);
} // for
modern editors can find (and display) the block endings
Apollo 11 - Colossus 2A
P21VSAVE DLOAD # SAVE CURRENT BASE VECTOR
TAT
STOVL P21TIME # ..TIME
RATT1
STOVL P21BASER # ..POS B-29 OR B-27
VATT1
STORE P21BASEV # ..VEL B-7 OR B-5
ABVAL SL*
0,2
STOVL P21VEL # /VEL/ FOR N73 DSP
RATT
UNIT DOT
VATT # U(R).(V)
DDV ASIN # U(R).U(V)
P21VEL
STORE P21GAM # SIN-1 U(R).U(V), -90 TO +90
SXA,2 SET
P21ORIG # 0 = EARTH 2 = MOON
P21FLAG
source, GitHub repository, more about the Apollo Guidance Computer: (Slavin, 2015)
good comments
legal comments
some open-source licenses require a notice at the beginning of each file
informative comments
import re
timestamp = "2024-10-22 09:30:42"
# matches timestamps in the format YYYY-MM-DD HH:MM:SS
re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", timestamp)
TODOs – good or bad?
# TODO: this allows invalid month, day, hour, minute and second values
re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", timestamp)
editors can collect TODO (and FIXME)
annotations and warn about them
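A TODO is most useful when it eventually gets resolved. One way the TODO above might be addressed is sketched below: the regular expression keeps checking the exact shape, and `datetime.strptime` rejects out-of-range values. The function name `is_valid_timestamp` is an assumption made for illustration.

```python
import re
from datetime import datetime

# Exact shape: YYYY-MM-DD HH:MM:SS, with zero-padded fields.
TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def is_valid_timestamp(timestamp: str) -> bool:
    """Check the format with the regex and the value ranges with strptime."""
    if TIMESTAMP_PATTERN.fullmatch(timestamp) is None:
        return False
    try:
        datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True


print(is_valid_timestamp("2024-10-22 09:30:42"))
print(is_valid_timestamp("2024-13-45 25:61:61"))
```

The regex is kept because `strptime` on its own also accepts fields that are not zero-padded.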
documentation
def fizzbuzz(i: int) -> str:
    """Fizzbuzz is a game for children to teach them about division.

    It is also a common coding exercise.

    Parameters
    ----------
    i : int
        Input number tested against division by 3, 5 and 15.

    Returns
    -------
    str
        `Fizz` if input divisible by 3, `Buzz` if divisible by 5, `FizzBuzz` if both, otherwise the number itself as a string.
    """
    result = ""
    if i % 15 == 0:
        result += "FizzBuzz"
    elif i % 3 == 0:
        result += "Fizz"
    elif i % 5 == 0:
        result += "Buzz"
    else:
        result = str(i)
    return result
doctest
def fizzbuzz(i: int) -> str:
    """
    >>> fizzbuzz(3)
    'Fizz'
    >>> fizzbuzz(5)
    'Buzz'
    >>> fizzbuzz(12)
    'Fizz'
    >>> fizzbuzz(15)
    'FizzBuzz'
    >>> fizzbuzz(17)
    '17'
    """
    result = ""
    if i % 15 == 0:
        result += "FizzBuzz"
    elif i % 3 == 0:
        result += "Fizz"
    elif i % 5 == 0:
        result += "Buzz"
    else:
        result = str(i)
    return result
python -m doctest -v fizzbuzz_doctest.py
Trying:
fizzbuzz(3)
Expecting:
'Fizz'
ok
Trying:
fizzbuzz(5)
Expecting:
'Buzz'
ok
Trying:
fizzbuzz(15)
Expecting:
'FizzBuzz'
ok
Trying:
fizzbuzz(17)
Expecting:
'17'
ok
4 passed.
Test passed.
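Doctests can also be run from Python code instead of the command line. The following is a minimal sketch using `doctest.DocTestFinder` and `doctest.DocTestRunner` on a shortened version of the function; in a regular script, a single call to `doctest.testmod()` is the more common choice.

```python
import doctest


def fizzbuzz(i: int) -> str:
    """
    >>> fizzbuzz(3)
    'Fizz'
    >>> fizzbuzz(15)
    'FizzBuzz'
    >>> fizzbuzz(17)
    '17'
    """
    if i % 15 == 0:
        return "FizzBuzz"
    if i % 3 == 0:
        return "Fizz"
    if i % 5 == 0:
        return "Buzz"
    return str(i)


# Collect the examples from the docstring and run them one by one.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(fizzbuzz, name="fizzbuzz", globs={"fizzbuzz": fizzbuzz}):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```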
summary
- bear in mind that the code is more often read than written
- make your intentions clear
- use the language properly, as it’s intended
- write idiomatic code
- follow the (style) guides, and best practices
- hierarchy
clean code / meaningful names (Martin, 2009)
- use intention-revealing names
- pick one word per concept
- avoid disinformation
- make meaningful distinctions
  - don’t use names like doSomething() and doSomething2()
- use pronounceable names
- use searchable names
- “The longer the scope of a function, the shorter its name should be.” – Robert C. Martin
- avoid encodings
intNumberOfDays = 0
- don’t pun or use humor, be professional
names for classes, functions
- a class is a model / blueprint of something
  - the name should be a noun
  - e.g., User, Activity
- an object is an instance of a class
  - still a noun
  - e.g., user = User()
- a function does something
  - the name should contain a verb
  - in imperative
  - e.g., aggregate_activity instead of activity_aggregation
clean code / functions (Martin, 2009)
- “Functions should hardly ever be 20 lines long” (Martin, 2009)
- shorter functions are easier to understand
- do one thing (single responsibility principle)
- “The longer the scope of a function, the shorter its name should be.” – Robert C. Martin
- avoid using more than three arguments
- avoid using flags
- no side effects
- prefer exceptions to returning error codes
clean code / comments (Martin, 2009)
avoid
- journal comments
- noise comments
- writing something that is already in the code
- closing brace comments
- separating comments
however, comments can be used if they help to understand the code
- informative comments, that explain what is happening
- math, physics, domain-specific things
- (API) documentation with examples
references
Bhargav, N. (2024). Hungarian notation. https://www.baeldung.com/cs/hungarian-notation
Fowler, M. (2018). Refactoring: Improving the design of existing code. Addison-Wesley Professional.
Martin, R. C. (2009). Clean code: A handbook of agile software craftsmanship. Pearson Education.
Sharif, B., & Maletic, J. I. (2010). An eye tracking study on camelcase and under_score identifier styles. 2010 IEEE 18th International Conference on Program Comprehension, 196–205.
Slavin, T. (2015). Coding the Apollo Guidance Computer (AGC). https://kidscodecs.com/coding-the-apollo-guidance-computer-agc/
Stemmler, K. (2019). How to learn software design and architecture. https://khalilstemmler.com/articles/software-design-architecture/full-stack-software-design
Wikipedia contributors. (2022). Side effect (computer science) — Wikipedia, the free encyclopedia. https://en.wikipedia.org/w/index.php?title=Side_effect_(computer_science)&oldid=1063806709.