clean code

Gergő Pintér, PhD

gergo.pinter@uni-corvinus.hu

software design and architecture stack

based on Khalil Stemmler’s figure [1]

hierarchy in style guides

not just style guides, also best practices

write idiomatic code

  • a prog. language implements a prog. paradigm
  • a paradigm defines a certain “way” of writing code
    • using different abstractions / building blocks
    • promoting a given concept
  • some languages implement multiple paradigms
  • and languages have their own way of doing things
    • languages have pros and cons for a given problem

just as in the case of natural languages, you ought to use a language properly

write idiomatic code

for (i = 0; i < 10; i++) {
    console.log(i);
}
[...Array(10).keys()].forEach(i => {
    console.log(i);
});
i = 0
while i < 10:
    print(i)
    i += 1
for i in range(10):
    print(i)
for i in 0..9 do
   puts i
end
(0..9).each do |i|
    puts i
end
(0..9).each {|i| puts i}
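
the same contrast shows up when a loop needs both the index and the item; a minimal Python sketch, where the fruits list is a made-up example:

```python
fruits = ["apple", "banana", "cherry"]  # hypothetical example data

# works, but reads like C translated to Python
labels_indexed = []
for i in range(len(fruits)):
    labels_indexed.append(f"{i}: {fruits[i]}")

# idiomatic: enumerate yields the index and the item together
labels_idiomatic = [f"{i}: {fruit}" for i, fruit in enumerate(fruits)]

print(labels_idiomatic)
```

both variants build the same list, the second one just uses the building blocks the language offers for this job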

clean code

Clean Code: A Handbook of Agile Software Craftsmanship

by Robert C. Martin (2009) [2]

meaningful names

this section is based on the book Clean Code (chapter 2) by Robert C. Martin [2]

with own examples

use intention-revealing names

int d; // elapsed time in days

the definition is only available at the declaration

int elapsedTimeInDays;

the definition is available at every usage

multi-word names

camelCase

int elapsedTimeInDays;
  • C (local variable)
  • Java (variable, method)

UpperCamelCase (PascalCase)

public class DataCollector {}
  • Java (class)
  • Rust (Type, Enum)

snake_case

elapsed_time_in_days = 17
  • Python
  • Rust (variable, function)

according to a study, camelCase is faster to type but snake_case is faster to read [3]

read the style guide

avoid disinformation

Do not refer to a grouping of accounts as an accountList unless it’s actually a List [2].

better to use accounts, as the name does not depend on the collection type
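
a small Python sketch of the point, with made-up account ids and balances; here the collection is a dict, so a name ending in list would mislead the reader:

```python
# hypothetical data: account id -> balance
account_list = {"A-1": 120.0, "B-7": 75.5}  # misleading: this is a dict, not a list
accounts = {"A-1": 120.0, "B-7": 75.5}      # says what it holds, not how it is stored


def total_balance(accounts):
    """Sum the balances of a mapping from account id to balance."""
    return sum(accounts.values())


print(total_balance(accounts))  # 195.5
```

if the dict is later replaced by another collection, the name accounts stays valid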

inconsistent spelling is also disinformation

using lowercase l or uppercase O as a variable name is also disinformative [2]

  • they can look almost like the digits one and zero, respectively – use the right font
  • PEP 8 (the Python style guide) says never to use them as single-character variable names

make meaningful distinctions

It is not sufficient to add number series or noise words, even though the compiler is satisfied. If names must be different, then they should also mean something different [2].

import pandas as pd

# the names differ only by a number
def calculate_distance(data: pd.DataFrame) -> pd.Series:
    ...  # do something
def calculate_distance2(data: pd.DataFrame) -> pd.Series:
    ...  # do something else
# the names say how the functions differ
def calculate_euclidean_distance(data: pd.DataFrame) -> pd.Series:
    ...
def calculate_levenshtein_distance(data: pd.DataFrame) -> pd.Series:
    ...

make meaningful distinctions / noise words

Noise words are another meaningless distinction. Imagine that you have a Product class. If you have another called ProductInfo or ProductData, you have made the names different without making them mean anything different [2].

use pronounceable names

If you can’t pronounce it, you can’t discuss it without sounding like an idiot [2].

  • Should etid be an integer?
  • Should elapsed_time_in_days be an integer?

could be especially important for non-native speakers as some words are more difficult to pronounce

use searchable names

Single-letter names can ONLY be used as local variables inside short methods. The length of a name should correspond to the size of its scope [2].

it’s OK to do this:

for i in range(10):
    print(i)

it’s NOT OK in a large scope:

int d; // elapsed time in days
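
the same applies to bare numbers; a minimal Python sketch, where the function names and the constant are hypothetical:

```python
# hard to search for: which of the many 7s in a code base is this one?
def weeks_to_days_unclear(w):
    return w * 7


DAYS_PER_WEEK = 7  # a named constant can be found with a plain text search


def weeks_to_days(week_count: int) -> int:
    """Convert a number of weeks to days."""
    return week_count * DAYS_PER_WEEK


print(weeks_to_days(4))  # 28
```

searching for DAYS_PER_WEEK finds every usage, searching for 7 or w finds mostly noise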

names for classes, functions

  • a class is a model / blueprint of something
  • the name should be a noun
    • e.g., User, Activity
  • an object is an instance of a class
    • still a noun
    • e.g., user = User()
  • a function does something
  • the name should contain a verb
    • in imperative
    • e.g., aggregate_activity
    • not a noun phrase, e.g., activity_aggregation
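
the naming rules above could look like this in Python; a sketch only, the Activity fields are assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass
class Activity:  # class: a noun
    user_id: int  # hypothetical field
    lesson: int   # hypothetical field


def aggregate_activity(activities: list[Activity]) -> dict[int, int]:
    """Count the activities per user (function name: imperative verb + object)."""
    counts: dict[int, int] = {}
    for activity in activities:  # object: still a noun
        counts[activity.user_id] = counts.get(activity.user_id, 0) + 1
    return counts


activities = [Activity(42, 1), Activity(42, 2), Activity(7, 1)]
print(aggregate_activity(activities))  # {42: 2, 7: 1}
```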

avoid encodings

with modern IDEs it is pointless to put type or role markers into names

Hungarian notation

  • invented by Charles Simonyi at Microsoft
  • adding a prefix to a name that gives information about type, length, or scope
def fnFactorial(iNum):
    if iNum == 1:
        return iNum
    else:
        return iNum * fnFactorial(iNum - 1)

source: [4]

interface IShapeArea // I is also a prefix
{
  void area(); 
}
interface ShapeArea 
{
  void area(); 
}
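
in Python, type hints can carry the information that the fn and i prefixes tried to encode; a sketch of the factorial example without prefixes (the handling of 0 and of negative input is an addition, not part of the cited example):

```python
def factorial(number: int) -> int:
    """Return number! for a non-negative integer."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number <= 1:
        return 1
    return number * factorial(number - 1)


print(factorial(5))  # 120
```

the editor and the type checker read the annotations, so the names stay free for meaning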

avoid mental mapping

Readers shouldn’t have to mentally translate your names into other names they already know [2].

don’t pun or use humor

  • no inside jokes
  • no colloquialisms or slang
  • be objective and professional

Say what you mean. Mean what you say [2].

pick one word per concept

it’s confusing to have fetch, retrieve, and get as equivalent methods of different classes [2]

it also helps to search for the term

add meaningful context

Imagine that you have variables named firstName, lastName, street, houseNumber, city, state, and zipcode. Taken together it’s pretty clear that they form an address. But what if you just saw the state variable being used alone in a method? [2]

  • adding a prefix?
    • e.g., addrCity, addrStreet, addrState
  • as encodings such as prefixes are discouraged, use an Address class instead to add context
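
a minimal sketch of such an Address class as a Python dataclass; the field names follow the quoted example, the values are made up:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    house_number: str
    city: str
    state: str
    zipcode: str


# hypothetical values, only to show the usage
address = Address("Main Street", "12", "Springfield", "Illinois", "62701")
print(address.state)  # Illinois
```

address.state tells the reader that the state is part of an address, even when it is used alone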

functions

this section is based on the book Clean Code (chapter 3) by Robert C. Martin [2]

with own examples

functions should be as small as possible

Functions should hardly ever be 20 lines long [2]

  • shorter functions are easier to understand

do one thing (single responsibility principle)

import sqlite3
import pandas as pd

con = sqlite3.connect("data.db")
data = pd.read_sql(activity_query, con)

records = []
for woy in range(36, 40):
    for dow in range(1, 8):
        records.append([woy, dow, 0])
empty = pd.DataFrame.from_records(
    records, columns=["week_of_year", "day_of_week", "count"]
)
data = (
    pd.concat([data, empty])
    .drop_duplicates(subset=["week_of_year", "day_of_week"], keep="first")
    .sort_values(["week_of_year", "day_of_week"])
    .reset_index(drop=True)
)
activity = pd.pivot(
    data, index=["week_of_year"], columns=["day_of_week"], values=["count"]
).values
res = con.execute(progress_query)
progress = res.fetchone()[0]
SELECT
    CAST(
        strftime('%W', timestamp) 
        AS INTEGER
    ) AS week_of_year,
    CAST(
        strftime('%u', timestamp)
        AS INTEGER
    ) AS day_of_week,
    count(*) AS count
FROM activity
WHERE
    user_id = 42 AND
    week_of_year > 35 AND
    week_of_year < 40
GROUP BY
    week_of_year,
    day_of_week;
SELECT
    lesson / 50.0 AS progress
FROM activity
WHERE
    user_id = 42 AND
    result = 'success'
ORDER BY lesson DESC
LIMIT 1;

debug tables

queried user activity

week_of_year  day_of_week  count
          36            2      1
          38            5      1
          39            6      2

pivoted user activity table

week_of_year \ day_of_week  1  2  3  4  5  6  7
36                          0  1  0  0  0  0  0
37                          0  0  0  0  0  0  0
38                          0  0  0  0  1  0  0
39                          0  0  0  0  0  2  0

empty activity table (excerpt)

week_of_year  day_of_week  count
          36            1      0
          36            2      0
          36            7      0
          37            1      0
          37            7      0
          38            1      0
          38            5      0
          39            6      0
          39            7      0

the inverse scope law of function names

The longer the scope of a function, the shorter its name should be. Functions that are called locally from a few nearby places should have long descriptive names, and the longest function names should be given to those functions that are called from just one place.

Robert C. Martin

“longer scope”: a more general part of the code

function arguments

  • do not use more than three [2]
  • what if you’d need more?
    • wrap it into an object
  • do not use flags
    • “Flag arguments are ugly […] loudly proclaiming that this function does more than one thing [2].”
def build_empty_dataframe(start, end, cols):
    records = []
    for woy in range(start, end + 1):
        for dow in range(1, 8):
            records.append([woy, dow, 0])
    return pd.DataFrame.from_records(
        records, columns=cols
    )
def query_progress(as_percentage: bool):
    res = con.execute(progress_query)
    progress = res.fetchone()[0]

    if as_percentage:
        return progress * 100
    else:
        return progress
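
one way to get rid of the flag is to split the function in two; a sketch under assumptions: PROGRESS_QUERY is a stand-in for the real progress query, query_progress_percentage is a hypothetical name, and the connection is passed in as a parameter instead of being a global:

```python
import sqlite3

# hypothetical stand-in for the progress query shown earlier
PROGRESS_QUERY = "SELECT 0.5 AS progress"


def query_progress(con: sqlite3.Connection) -> float:
    """Return the progress as a ratio between 0 and 1."""
    return con.execute(PROGRESS_QUERY).fetchone()[0]


def query_progress_percentage(con: sqlite3.Connection) -> float:
    """Return the progress as a percentage between 0 and 100."""
    return query_progress(con) * 100


con = sqlite3.connect(":memory:")
print(query_progress(con))             # 0.5
print(query_progress_percentage(con))  # 50.0
```

each function does one thing, and the call site says which one without a True or False to decode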

function as interface

DataFrame.to_csv(
    path_or_buf=None, *,
    sep=',',
    na_rep='',
    float_format=None,
    columns=None,
    header=True,
    index=True,
    index_label=None,
    mode='w',
    encoding=None,
    compression='infer',
    quoting=None,
    quotechar='"',
    lineterminator=None,
    chunksize=None,
    date_format=None,
    doublequote=True,
    escapechar=None,
    decimal='.',
    errors='strict',
    storage_options=None
)
LibreOffice Calc CSV settings dialog

no side effects

Side effects are lies. Your function promises to do one thing, but it also does other hidden things [2].

– Robert C. Martin

an operation, function or expression is said to have a side effect if it modifies some state variable value(s) outside its local environment, that is to say has an observable effect besides returning a value (the main effect) to the invoker of the operation [5].

side effect example

class Something:
    foo = 0
    
    def increase(self, by):
        self.foo += by
    
    def decrease(self, by):
        self.foo -= by
    
something = Something()
print(something.foo)  # 0
something.increase(2)
print(something.foo)  # 2
smth = {"foo": 0}

def increase(what, by):
    return what + by

def decrease(what, by):
    return what - by

print(smth["foo"])  # 0
increase(smth["foo"], 2)  # 2
print(smth["foo"])  # 0
smth["foo"] = increase(smth["foo"], 2)
print(smth["foo"])  # 2
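
a hidden side effect is harder to spot than the explicit ones above; a sketch with hypothetical function names, where one function silently changes the list of its caller:

```python
def add_total_row_in_place(rows):
    """Promises a total, but also silently modifies the caller's list."""
    total = sum(rows)
    rows.append(total)  # hidden side effect
    return total


def total_of(rows):
    """Return the total and leave the argument untouched."""
    return sum(rows)


counts = [1, 1, 2]
total_of(counts)
print(counts)  # [1, 1, 2]
add_total_row_in_place(counts)
print(counts)  # [1, 1, 2, 4]
```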

prefer exceptions to returning error codes

  • in Unix-like systems processes still return 0 if the execution was successful
  • but returning error codes from functions is discouraged
  • FileNotFoundException is better than ERRCODE_26375
    • meaningful name
    • no mental mapping
    • exception handling is syntactically more readable
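
a Python sketch of the two styles; the function names, the file name and the error code are hypothetical, and Python's built-in exception for a missing file is called FileNotFoundError:

```python
from pathlib import Path

ERRCODE_NOT_FOUND = -1  # hypothetical error code


def read_config_with_code(path):
    """Error-code style: the caller must remember to check the return value."""
    if not Path(path).exists():
        return ERRCODE_NOT_FOUND
    return Path(path).read_text()


def read_config(path):
    """Exception style: a missing file raises the built-in FileNotFoundError."""
    return Path(path).read_text()


try:
    read_config("no-such-file-42.cfg")
except FileNotFoundError as error:
    print(f"configuration missing: {error.filename}")
```

with the error code, the return value is either the content or a number that has to be looked up; with the exception, the happy path and the error handling are separated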

comments

this section is based on the book Clean Code (chapter 4) by Robert C. Martin [2]

with own examples

separating comments

# connect to the database
con = sqlite3.connect("data.db")
# query activity data
data = pd.read_sql(activity_query, con)
# create empty dataframe
records = []
for woy in range(36, 40):
    for dow in range(1, 8):
        records.append([woy, dow, 0])
empty = pd.DataFrame.from_records(records, columns=["week_of_year", "day_of_week", "count"])
# combine empty and sparse dataframe
data = (
    pd.concat([data, empty])
    .drop_duplicates(subset=["week_of_year", "day_of_week"], keep="first")
    .sort_values(["week_of_year", "day_of_week"])
    .reset_index(drop=True)
)
# pivot dataframe
activity = pd.pivot(
    data, index=["week_of_year"], columns=["day_of_week"], values=["count"]
).values

separated functions

def create_empty_dataframe(start_week, end_week):
    records = []
    for woy in range(start_week, end_week+1):
        for dow in range(1, 8):
            records.append([woy, dow, 0])
    return pd.DataFrame.from_records(
        records, columns=["week_of_year", "day_of_week", "count"]
    )

def fill_empty_with_activities(empty, activities):
    return (
        pd.concat([activities, empty])
        .drop_duplicates(subset=["week_of_year", "day_of_week"], keep="first")
        .sort_values(["week_of_year", "day_of_week"])
        .reset_index(drop=True)
    )

def pivot_dataframe(data):
    return pd.pivot(
        data, index=["week_of_year"], columns=["day_of_week"], values=["count"]
    ).values

these functions do one thing

separated functions - usage

con = sqlite3.connect("data.db")

activities = pd.read_sql(activity_query, con)

empty = create_empty_dataframe(36, 39)

data = fill_empty_with_activities(empty, activities)

activities_matrix = pivot_dataframe(data)

the former comments became function names, so the code can be read as prose

more bad comments

journal comment

# 2024-10-17 -- Add idiomatic coding examples 
# 2024-10-18 -- Add meaningful names section 

the version control system keeps a better journal

noise comments

# creates an empty dataframe
def create_empty_dataframe(start_week, end_week):
    # ...

don’t write something that is already in the code

closing brace comments

for (i = 0; i < 10; i++) {
    console.log(i);
} // for

modern editors can find (and display) the block endings

by Oliver Widder (Geek and Poke) CC BY 3.0

Apollo 11 - Colossus 2A

P21VSAVE    DLOAD           # SAVE CURRENT BASE VECTOR
            TAT
        STOVL   P21TIME     # ..TIME
            RATT1
        STOVL   P21BASER    # ..POS B-29 OR B-27
            VATT1
        STORE   P21BASEV    # ..VEL B-7  OR B-5
        ABVAL   SL*
            0,2
        STOVL   P21VEL      # /VEL/ FOR N73 DSP
            RATT
        UNIT    DOT
            VATT        # U(R).(V)
        DDV ASIN        # U(R).U(V)
            P21VEL
        STORE   P21GAM      # SIN-1 U(R).U(V), -90 TO +90
        SXA,2   SET
            P21ORIG     # 0 = EARTH  2 = MOON
            P21FLAG

source, GitHub repository, more about the Apollo Guidance Computer: [6]

good comments

legal comments

some open source licenses should be included at the beginning of the files

informative comments

import re

timestamp = "2024-10-22 09:30:42"
# matches timestamps in the format of: YYYY-MM-DD HH:MM:SS
re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", timestamp)

TODOs – good or bad?

# TODO: this allows invalid month, day, hour, minute and second values
re.match(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", timestamp)

editors can collect TODO (and FIXME) annotations and warn about them
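
one possible way to resolve this TODO, as a sketch: keep the pattern for the shape and let datetime.strptime check the value ranges; is_valid_timestamp and TIMESTAMP_PATTERN are hypothetical names:

```python
import re
from datetime import datetime

TIMESTAMP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")


def is_valid_timestamp(text: str) -> bool:
    """Check both the YYYY-MM-DD HH:MM:SS shape and the value ranges."""
    # the pattern checks the shape (e.g., zero padding)
    if TIMESTAMP_PATTERN.fullmatch(text) is None:
        return False
    # strptime rejects values such as month 13 or hour 25
    try:
        datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True


print(is_valid_timestamp("2024-10-22 09:30:42"))  # True
print(is_valid_timestamp("2024-13-45 25:61:61"))  # False
```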

documentation

def fizzbuzz(i: int) -> str:
    """Fizzbuzz is a game for children to teach them about division.
    It is also a common coding exercise.
    
    Parameters
    ----------
    i : int
        Input number tested for divisibility by 3, 5 and 15.
    
    Returns
    -------
    str
        `Fizz` if the input is divisible by 3, `Buzz` if divisible by 5, `FizzBuzz` if divisible by both, otherwise the number itself as a string.
    """
    result = ""
    if i % 15 == 0:
        result += "FizzBuzz"
    elif i % 3 == 0:
        result += "Fizz"
    elif i % 5 == 0:
        result += "Buzz"
    else:
        result = str(i)
    return result

doctest

def fizzbuzz(i: int) -> str:
    """
    >>> fizzbuzz(3)
    'Fizz'
    >>> fizzbuzz(5)
    'Buzz'
    >>> fizzbuzz(12)
    'Fizz'
    >>> fizzbuzz(15)
    'FizzBuzz'
    >>> fizzbuzz(17)
    '17'
    """
    result = ""
    if i % 15 == 0:
        result += "FizzBuzz"
    elif i % 3 == 0:
        result += "Fizz"
    elif i % 5 == 0:
        result += "Buzz"
    else:
        result = str(i)
    return result
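
the examples in the docstring are executable; a sketch of how they could be run with the doctest module of the standard library (the function body is shortened here, the behavior is the same):

```python
import doctest


def fizzbuzz(i: int) -> str:
    """
    >>> fizzbuzz(3)
    'Fizz'
    >>> fizzbuzz(15)
    'FizzBuzz'
    >>> fizzbuzz(17)
    '17'
    """
    if i % 15 == 0:
        return "FizzBuzz"
    if i % 3 == 0:
        return "Fizz"
    if i % 5 == 0:
        return "Buzz"
    return str(i)


if __name__ == "__main__":
    # run every example found in the docstrings of this module
    print(doctest.testmod())
```

the same check works from the command line with python -m doctest followed by the file name, so the documentation cannot silently drift away from the code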

references

[1] K. Stemmler, “How to learn software design and architecture.” https://khalilstemmler.com/articles/software-design-architecture/full-stack-software-design, 28-Sep-2019.
[2] R. C. Martin, Clean code: A handbook of agile software craftsmanship. Pearson Education, 2009.
[3] B. Sharif and J. I. Maletic, “An eye tracking study on camelcase and under_score identifier styles,” in 2010 IEEE 18th international conference on program comprehension, 2010, pp. 196–205.
[4] N. Bhargav, “Hungarian notation.” https://www.baeldung.com/cs/hungarian-notation, 18-Mar-2024.
[5] Wikipedia contributors, “Side effect (computer science) — Wikipedia, the free encyclopedia.” https://en.wikipedia.org/w/index.php?title=Side_effect_(computer_science)&oldid=1063806709, 2022.
[6] T. Slavin, “Coding the apollo guidance computer (AGC).” https://kidscodecs.com/coding-the-apollo-guidance-computer-agc/, 03-Aug-2015.