automatization

Gergő Pintér, PhD

gergo.pinter@uni-corvinus.hu

what to automatize?

everything

more precisely, repetitive tasks | scripting: writing relatively short and simple code to automatize an otherwise manual process

in a software development context

  • style guide compliance
  • code smell finding
  • code quality measurement
  • review
  • building
  • testing
  • deployment

linting

  • a linter in modern editors behaves like a spell checker in a word processor
    • gives immediate feedback on syntax errors, styling issues or bad practices
  • can detect some code smells
  • traditionally linters were developed for languages, then linter plugins for editors
    • so support for a given language in a given editor was not guaranteed
    • the Language Server Protocol (LSP) was developed (originally at Microsoft), providing a common interface between editors and language tooling such as linters
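To make "immediate feedback" concrete, here is a minimal sketch of what a linter does with an editor buffer: it checks for syntax errors and a few styling rules and reports each finding with a line number. The rule names and the 79-character limit are illustrative assumptions; real linters (and the language servers editors talk to via LSP) implement many more rules.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Finding:
    line: int
    rule: str      # hypothetical rule identifier
    message: str


def lint(source: str, max_length: int = 79) -> list[Finding]:
    """Check a source buffer and report findings with line numbers."""
    findings: list[Finding] = []
    # syntax errors: let Python's own parser decide
    try:
        compile(source, "<buffer>", "exec")
    except SyntaxError as error:
        findings.append(Finding(error.lineno or 0, "syntax-error", error.msg))
    # styling issues: simple per-line rules
    for number, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_length:
            findings.append(Finding(number, "line-too-long", f"{len(line)} > {max_length} characters"))
        if line != line.rstrip():
            findings.append(Finding(number, "trailing-whitespace", "trailing whitespace"))
        indentation = line[: len(line) - len(line.lstrip())]
        if "\t" in indentation:
            findings.append(Finding(number, "tab-indent", "indentation contains a tab"))
    return findings


for finding in lint("def area(r):\n\treturn 3.14 * r * r \n"):
    print(finding)
```

An editor integration would simply rerun such checks on every change or save and underline the reported lines.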

auto formatting

  • there are automatic code formatters for more and more languages
    • that can reformat the source code to align with the style guide
  • usually triggered by saving the file
  • usually configurable to align with organization / project specific rules
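  • some examples: Black (Python), Prettier (JavaScript, TypeScript, CSS and more), gofmt (Go), rustfmt (Rust), clang-format (C, C++)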
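Real formatters parse the code and apply far richer, configurable rules. As a toy illustration of the idea, the following line-based sketch normalizes indentation tabs, trailing whitespace and blank lines; the 4-space indent and the limit of two consecutive blank lines are assumptions standing in for a project's style guide. A useful property of any formatter is idempotence: formatting already formatted code changes nothing.

```python
from __future__ import annotations


def format_source(source: str, indent: str = "    ", max_blank_lines: int = 2) -> str:
    """Toy formatter: fixes indentation tabs, trailing whitespace and blank lines."""
    lines: list[str] = []
    blank_run = 0
    for line in source.splitlines():
        line = line.rstrip()                             # drop trailing whitespace
        body = line.lstrip("\t")
        line = indent * (len(line) - len(body)) + body   # leading tabs -> spaces
        if line == "":
            blank_run += 1
            if blank_run > max_blank_lines:              # collapse long runs of blank lines
                continue
        else:
            blank_run = 0
        lines.append(line)
    while lines and lines[-1] == "":                     # no blank lines at the end of the file
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""      # exactly one final newline


messy = "def f():\n\treturn 1   \n\n\n\n\nx = 2\n\n"
print(format_source(messy))
```

Triggering this on save is what keeps the style guide enforced without anyone having to think about it.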

well configured editor

  • can help keep the feedback loop fast
    • auto formatter
    • linter
    • running tests
  • decreases cost
    • immediate feedback
    • less refactoring (later), fewer review findings
    • developers can focus on non-automatable tasks

importance of personal preferences!

version control systems

  • scrum development team
  • multiple tasks on the sprint backlog
  • developers start to work on different tasks
    • the time to complete a task varies
  • work items need to be merged to the common code base / repository
  • version control systems are used to solve this issue

file sharing issue

figure: (1) Arthur and Ford read the same file, (2) both make changes, (3) Arthur writes first, (4) Ford overwrites it with his version

the figures are based on Figure 2.2 of the TortoiseSVN documentation

lock-modify-unlock solution

figure: (1) Arthur locks the file before reading it, (2) Ford cannot read it while it is locked, (3) Arthur writes and unlocks it, (4) Ford locks and reads it

only one developer can edit a file at a time, which is not very effective

the figures are based on Figure 2.3 of the TortoiseSVN documentation

copy-modify-merge solution

figure: (1) Arthur and Ford read the same file, (2) both make changes, (3) Arthur writes first, (4) Ford cannot write, because the file has changed since he read it

the figures are based on Figure 2.4 of the TortoiseSVN documentation
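The rejected write can be sketched as an optimistic concurrency check: every copy remembers which revision it was based on, and the repository refuses a write based on an outdated revision. The `Repository` class is hypothetical, for illustration only; it is not how Subversion or git are implemented internally.

```python
from __future__ import annotations


class OutOfDateError(Exception):
    """Raised when a write is based on a revision that is no longer the latest."""


class Repository:
    def __init__(self, content: str) -> None:
        self.content = content
        self.revision = 1

    def checkout(self) -> tuple[str, int]:
        """Copy: hand out the content together with the revision it belongs to."""
        return self.content, self.revision

    def commit(self, new_content: str, base_revision: int) -> int:
        """Accept a write only if nobody has written since `base_revision`."""
        if base_revision != self.revision:
            raise OutOfDateError(
                f"your copy is based on r{base_revision}, the repository is at r{self.revision}"
            )
        self.content = new_content
        self.revision += 1
        return self.revision


repo = Repository("hello\n")
arthur_copy, arthur_base = repo.checkout()
ford_copy, ford_base = repo.checkout()
repo.commit(arthur_copy + "Arthur was here\n", arthur_base)  # Arthur writes first
try:
    repo.commit(ford_copy + "Ford was here\n", ford_base)    # Ford's copy is stale
except OutOfDateError as error:
    print("rejected:", error)
```

Unlike locking, nobody waits: Ford only has extra work if there really was a concurrent change.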

copy-modify-merge solution #2

figure: (1) Ford compares the latest version with his own, (2) merges the changes, (3) publishes the merged version, (4) the changes are shared

example: subversion, git, mercurial

the figures are based on Figure 2.5 of the TortoiseSVN documentation
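The merge step relies on the common ancestor (the version both developers started from): a line changed on only one side takes that side's version, and a line changed differently on both sides is a conflict. A minimal three-way merge sketch, assuming that edits only modify existing lines; real tools such as git first align the files with a diff algorithm, so insertions and deletions are handled too. The conflict markers imitate git's style.

```python
from __future__ import annotations


def merge3(base: list[str], ours: list[str], theirs: list[str]) -> tuple[list[str], int]:
    """Line-by-line three-way merge; returns the merged lines and the number of conflicts."""
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only merges edits that keep the line count")
    merged: list[str] = []
    conflicts = 0
    for b, o, t in zip(base, ours, theirs):
        if o == t or t == b:       # same change on both sides, or only we changed it
            merged.append(o)
        elif o == b:               # only they changed it
            merged.append(t)
        else:                      # both changed the same line differently
            conflicts += 1
            merged.extend(["<<<<<<< ours", o, "=======", t, ">>>>>>> theirs"])
    return merged, conflicts


base = ["title", "body", "footer"]     # the version both started from
arthur = ["Title", "body", "footer"]   # latest version in the repository
ford = ["title", "body", "Footer"]     # Ford's local copy
merged, conflicts = merge3(base, ford, arthur)
print(merged, conflicts)
```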

centralized vs. distributed version control system

centralized

example: subversion

distributed

example: git, mercurial

the figures are based on Version control concepts and best practices - by Michael Ernst [1]

feature branching

  • copy-modify-merge version tracking gave a viable solution for parallel development
    • but separating the “workspace” is still beneficial
  • each developed feature has its own branch, which is merged into the mainline after completion

when to make a commit?

  1. when you completed a unit of work
  2. when you have changes you may want to undo

branching strategies

  • branching is more than just separating the workspace and work-in-progress code from released code
    • also for managing stable (released) versions
    • and bugfixing through multiple versions

continuous integration (CI)

Continuous Integration is a software development practice where each member of a team merges their changes into a codebase together with their colleagues changes at least daily.

– Martin Fowler [6]

  • emerged from extreme programming
  • considered an agile approach
  • gives immediate feedback
    • the integration (merging) fails if two branches are not compatible
    • the build of the integrated software fails if the combined code is broken
  • also gives an opportunity to test the built software…

continuous integration

continuous integration environment

build script

  • traditionally called build script
  • responsible not only for building the software
  • but also for running tests, generating reports
    • code coverage
  • and even for packaging the software
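The stage structure of a build script can be sketched as a runner that executes stages in order, stops at the first failure and reports the remaining stages as skipped. The stage names and the Python callables are hypothetical stand-ins; a real build script would invoke the compiler, test runner, coverage tool and packager, usually as separate commands.

```python
from __future__ import annotations

from typing import Callable


def run_build(stages: list[tuple[str, Callable[[], bool]]]) -> dict[str, str]:
    """Run the stages in order and stop at the first failure, like a CI job."""
    report: dict[str, str] = {}
    for name, stage in stages:
        try:
            succeeded = stage()
        except Exception:
            succeeded = False  # a crashing stage counts as a failed stage
        report[name] = "passed" if succeeded else "failed"
        if not succeeded:
            break
    for name, _ in stages:
        report.setdefault(name, "skipped")  # stages after a failure never run
    return report


# hypothetical stages; a real script would call the actual tools
stages = [
    ("build", lambda: True),
    ("test", lambda: False),          # e.g. one unit test fails
    ("coverage report", lambda: True),
    ("package", lambda: True),
]
print(run_build(stages))
```

Failing fast keeps the feedback quick: there is no point packaging software whose tests fail. A deployment stage (see continuous deployment) would simply be one more entry at the end of the list.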

example: build script of the course website

scheduled build

nightly build

  • scheduled build during the night
  • typically includes a smoke test
  • builds the latest version of the software on a daily basis
  • originally scheduled for night time because a full build (with all tests) of large software could take hours
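The scheduler part boils down to computing the next start time. A minimal sketch, assuming a hypothetical start time of 02:00 and naive local datetimes (time zones and daylight saving are ignored):

```python
from datetime import datetime, time, timedelta

NIGHTLY_AT = time(hour=2)  # hypothetical: the build starts at 02:00


def next_nightly_run(now: datetime, at: time = NIGHTLY_AT) -> datetime:
    """Return the next start time of the scheduled build after `now`."""
    candidate = datetime.combine(now.date(), at)
    if candidate <= now:            # today's run has already started
        candidate += timedelta(days=1)
    return candidate


print(next_nightly_run(datetime(2024, 6, 18, 10, 30)))  # 2024-06-19 02:00:00
```

In practice CI services usually express the same schedule as a cron expression, such as `0 2 * * *` (minute 0, hour 2, every day); check the service's documentation for the time zone it uses.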

continuous deployment

  • continuous integration ensures everyone integrates their code to the mainline [6]
  • “Continuous Deployment means the product is automatically released to production whenever it passes all the automated tests in the deployment pipeline.” – Martin Fowler [6]

continuous deployment environment

  • extension of a continuous integration environment
  • deployment is another stage in the build script
  • same triggers as in a CI environment (not just the scheduler)

blue–green deployment [7]

  • two servers are maintained (“blue” and “green”)
    • expensive
  • at any given time, only one server handles public requests
  • the other can be accessed only from a private network
  • changes are applied to the non-live server and verified
  • once verified, the non-live server is swapped with the live server
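The swap can be sketched as a router that only changes which server is live after the idle one has been verified. The `BlueGreenRouter` class and the `verify` callback are hypothetical; in a real setup the swap happens in a load balancer or DNS, not in application code.

```python
from __future__ import annotations

from typing import Callable


class BlueGreenRouter:
    def __init__(self, version: str) -> None:
        self.versions = {"blue": version, "green": version}
        self.live = "blue"  # the server handling public requests

    @property
    def idle(self) -> str:
        """The non-live server, reachable only from the private network."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, verify: Callable[[str], bool]) -> bool:
        self.versions[self.idle] = version  # apply changes to the non-live server
        if not verify(self.idle):           # e.g. smoke tests on the private network
            return False                    # public traffic was never affected
        self.live = self.idle               # swap: the verified server goes live
        return True

    def serve(self) -> str:
        return self.versions[self.live]


router = BlueGreenRouter("1.0.0")
router.deploy("1.1.0", verify=lambda server: True)
print(router.live, router.serve())  # green 1.1.0
```

A side benefit of keeping two servers: until the next deployment, the previous version still runs on the other server, so a rollback is just another swap.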

shadow deployment

  • two servers are maintained (“live” and “shadow”)
  • used to test whether the new release meets performance and stability requirements
    • on success, the release can be deployed to the live server as well
  • specialized strategy, complex and (relatively) expensive to set up
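In a shadow deployment, production requests are also sent (mirrored) to the shadow server, but only the live server's response reaches the user, so the new release can be observed under real load. A synchronous sketch with hypothetical names; in practice the mirroring is usually done asynchronously by a proxy or load balancer, so the shadow cannot slow users down.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShadowStats:
    mismatches: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    shadow_seconds: list[float] = field(default_factory=list)


def handle(request: str, live: Callable[[str], str], shadow: Callable[[str], str],
           stats: ShadowStats) -> str:
    """Serve from the live server; mirror the request to the shadow and record how it did."""
    response = live(request)              # the user only ever sees this response
    started = time.perf_counter()
    try:
        shadow_response = shadow(request)
    except Exception:
        stats.errors.append(request)      # a shadow crash never reaches the user
    else:
        if shadow_response != response:
            stats.mismatches.append(request)
    finally:
        stats.shadow_seconds.append(time.perf_counter() - started)
    return response


def new_release(request: str) -> str:     # hypothetical release under test
    if request == "crash":
        raise RuntimeError("bug in the new release")
    return request.upper() if request != "b" else "?"


stats = ShadowStats()
answers = [handle(r, str.upper, new_release, stats) for r in ["a", "b", "crash"]]
print(answers, stats.mismatches, stats.errors)
```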

canary deployment

  • deployment in an incremental fashion
  • starts with a small number of users
  • and continues until 100% is reached
  • allows testing updates in the live environment
    • on small groups of users
    • before deploying to many users
    • may involve telemetry

A/B testing is more of a testing approach than a deployment technique, but it works similarly to canary deployment. It involves comparing two versions of an update on a small set of users to identify which version performs better. [8]
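Both canary releases and A/B tests need a stable way to decide which users get the new version. One common approach, sketched here with hypothetical stage percentages, hashes the user id into 100 buckets: the same user always lands in the same bucket, and raising the rollout percentage only adds users, it never removes anyone who already has the new version. Comparing the bucket against 50 would split users into two A/B groups the same way.

```python
import hashlib


def bucket(user_id: str) -> int:
    """Map a user id to a stable bucket between 0 and 99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def gets_new_version(user_id: str, rollout_percent: int) -> bool:
    """True if the user falls into the canary group at the given rollout stage."""
    return bucket(user_id) < rollout_percent


ROLLOUT_STAGES = [1, 5, 25, 50, 100]  # hypothetical stages, in percent

users = [f"user-{i}" for i in range(1000)]
for percent in ROLLOUT_STAGES:
    share = sum(gets_new_version(u, percent) for u in users)
    print(f"{percent:>3}% stage: {share} of {len(users)} users get the new version")
```

Moving to the next stage would normally depend on telemetry, e.g. error rates of the canary group staying within limits.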

devops

  • software development + IT operations
    • collaboration
  • agile mindset, set of principles [9]
    • automation of the SDLC
    • collaboration and communication
    • continuous improvement
    • focus on user needs with short feedback loops
  • relies on automatization, CI and CD
  • to build, test and release better software
    • frequently, reliably, rapidly

further reading: 11 DevOps Principles and Practices to Master: Pro Advice - by Fernando Doglio

what tools to use?

  • CI and CD became a fundamental part of software development
    • got integrated into services like GitHub, GitLab, Bitbucket, JetBrains Space
  • some solutions:
    • Jenkins
      • open source, self-hosted
    • GitHub Actions
    • GitLab Pipelines
      • integrated into code hosting, free options
    • Travis CI
      • free for open source projects
    • CircleCI
      • free options

automatized review

  • using a CI environment
  • do static code analysis
    • analyzing the code without execution
    • searching for syntax errors, styling issues, bad practices or code smells
  • run test suite
  • generate review report from the findings

should not replace human review

it just decreases the workload by automatizing trivial tasks
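A minimal sketch of such an automated review step, using Python's standard `ast` module: the code is parsed but never executed, a couple of illustrative rules are checked, and the findings are collected into a report. The two rules are examples chosen for this sketch, not a recommended rule set.

```python
from __future__ import annotations

import ast


def review(path: str, source: str) -> list[str]:
    """Static analysis: inspect the syntax tree without executing the code."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source, filename=path)):
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare 'except:' silently catches every error"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and ast.get_docstring(node) is None:
            findings.append((node.lineno, f"function '{node.name}' has no docstring"))
    return [f"{path}:{line}: {message}" for line, message in sorted(findings)]


def report(findings: list[str]) -> str:
    """Summarize the findings, e.g. as a comment on a pull request."""
    if not findings:
        return "automated review: no findings"
    return "\n".join([f"automated review: {len(findings)} finding(s)"] + [f"- {f}" for f in findings])


sample = (
    "def load(path):\n"
    "    try:\n"
    "        return open(path).read()\n"
    "    except:\n"
    "        return ''\n"
)
print(report(review("app.py", sample)))
```

A human reviewer still has to judge whether the design makes sense; the report only removes the trivial findings from their plate.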

automatized review – example

  • CI services integrated into the code hosting / developer platforms
  • code changes can be annotated with automatized review findings
    • usually at the pull request level
  • somewhat slower feedback than running static analysis or tests locally
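On GitHub Actions, one way to get such annotations is to print findings in GitHub's "workflow command" format, which the platform turns into annotations on the changed lines. The sketch below follows that documented syntax as we understand it (including the percent-encoding of special characters); other platforms, such as GitLab, use their own report formats instead.

```python
def escape_data(value: str) -> str:
    """Percent-encode characters that would break the command's message part."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value: str) -> str:
    """Property values additionally must not contain ':' or ','."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def annotation(level: str, file: str, line: int, message: str) -> str:
    """Format one finding as a GitHub Actions workflow command."""
    if level not in {"notice", "warning", "error"}:
        raise ValueError(f"unsupported level: {level!r}")
    return f"::{level} file={escape_property(file)},line={line}::{escape_data(message)}"


print(annotation("warning", "app.py", 4, "bare 'except:' silently catches every error"))
```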

vulnerability alerts

  • Common Vulnerabilities and Exposures (CVE)
    • a dictionary of common names (i.e., CVE Identifiers) for publicly known information security vulnerabilities [10]
    • Apple’s “goto fail” issue is officially called CVE-2014-1266
  • GitHub Dependabot
    • uses the package manager’s dependency files
    • e.g., cargo (Rust), npm (JS), nuget (C#), maven (Java), poetry (Python)
    • checks dependencies against known vulnerabilities
[tool.poetry.dependencies]
python = "^3.12"
numpy = "^1.26.3"
pandas = "^2.2"
geopandas = "^1.0"
networkx = "^3.2.1"
osmnx = "^1.6.0"
matplotlib = "^3.8.2"
seaborn = "^0.13.0"
contextily = "^1.3.0"
opencv-python = "^4.9.0"
pyaml = "^23.9.7"
pyogrio = "^0.7"
pyarrow = "^15.0.0"
scipy = "^1.12.0"
haversine = "^2.8.1"
mapclassify = "^2.6.1"
openpyxl = "^3.1.2"
ecomplexity = "^0.5.2"
structlog = "^24.1.0"
h3 = "^3.7.7"
pandarallel = "^1.6.5"
jinja2 = "^3.1.4"
tabulate = "^0.9.0"

Python dependencies managed by poetry
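Most constraints in the dependency list above use Poetry's caret (`^`) syntax, which builds on semantic versioning: an update is allowed as long as it does not change the left-most non-zero component. A minimal sketch of how such a constraint expands into a version range, assuming purely numeric versions (no pre-release labels):

```python
from __future__ import annotations


def _pad(parts: list[int]) -> tuple[int, ...]:
    """Fill missing components with zeros: [1, 2] -> (1, 2, 0)."""
    return tuple(parts + [0] * (3 - len(parts)))


def caret_range(constraint: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return the inclusive lower and exclusive upper bound of a ^ requirement."""
    parts = [int(p) for p in constraint.lstrip("^").split(".")]
    # the left-most non-zero component must not change; if every given
    # component is zero, the last given one must not change
    index = next((i for i, p in enumerate(parts) if p != 0), len(parts) - 1)
    upper = parts[:index] + [parts[index] + 1]
    return _pad(parts), _pad(upper)


def allows(constraint: str, version: str) -> bool:
    lower, upper = caret_range(constraint)
    return lower <= _pad([int(p) for p in version.split(".")]) < upper


for constraint in ["^3.12", "^2.2", "^0.13.0", "^0.7"]:
    print(constraint, caret_range(constraint))
```

Note the effect on 0.x packages such as `seaborn = "^0.13.0"`: only 0.13.x is allowed, because below 1.0.0 a minor version bump may be breaking.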

dependabot example

remote: Resolving deltas: 100% (5/5), completed with 4 local objects.
remote: 
remote: GitHub found 1 vulnerability on pintergreg/software-engineering's default branch (1 high). To find out more, visit:
remote:      https://github.com/pintergreg/software-engineering/security/dependabot/1

command line warning after git push
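The core of such a check can be sketched as comparing the locked version of each dependency with the affected version ranges of published advisories. The package names, advisory identifiers and version ranges are made up for illustration; the real service relies on the GitHub Advisory Database and each ecosystem's own version rules, while this sketch assumes plain numeric versions.

```python
from __future__ import annotations

from dataclasses import dataclass


def version_tuple(version: str) -> tuple[int, ...]:
    parts = [int(p) for p in version.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


@dataclass(frozen=True)
class Advisory:
    package: str
    identifier: str   # in practice e.g. a CVE identifier
    introduced: str   # first affected version
    fixed: str        # first version containing the fix


def audit(locked: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Report every locked dependency whose version falls in an affected range."""
    alerts = []
    for advisory in advisories:
        version = locked.get(advisory.package)
        if version is None:
            continue  # we do not depend on this package
        if version_tuple(advisory.introduced) <= version_tuple(version) < version_tuple(advisory.fixed):
            alerts.append(
                f"{advisory.package} {version} is affected by {advisory.identifier}; "
                f"upgrade to {advisory.fixed} or later"
            )
    return alerts


# hypothetical lock file content and advisory database
locked = {"examplelib": "1.4.2", "otherlib": "2.0.1"}
advisories = [
    Advisory("examplelib", "EXAMPLE-2024-0001", introduced="1.0.0", fixed="1.4.3"),
    Advisory("otherlib", "EXAMPLE-2024-0002", introduced="1.0.0", fixed="2.0.0"),
    Advisory("unusedlib", "EXAMPLE-2024-0003", introduced="0.1.0", fixed="0.2.0"),
]
for alert in audit(locked, advisories):
    print(alert)
```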

dependencies

  • carefully choose the software packages / components your software will depend on
  • use well-maintained software modules
  • unmaintained modules have potential vulnerabilities
  • aim for loose coupling with your dependencies
    • makes it easier to replace if needed
Dependency by Randall Munroe | CC BY-NC 2.5

daily work

  • select a task from backlog
  • read and understand it
  • create a feature branch

  • write code, possibly using TDD
  • local unit testing, checks
  • open a pull request (initiate merge)

  • pull requests are used to manage review
  • and as a trigger for CI to run automated tests, static code analysis, generate reports, etc.
  • while your work is being reviewed, start working on another task / review others’ work

  • if everything is fine, the task is done
  • eventually the change reaches production and the end users
  • as a part of a release
    • a release can group together multiple changes
    • then Scrum starts a new iteration, or Kanban continues as usual
  • using some kind of deployment strategy

release versioning

  • a software release is identified by a version number
  • often seen as an arbitrary number

pre-releases

  • alpha: incomplete feature-wise, external release is uncommon for proprietary software
    • whitebox testing
  • beta: the software is feature-complete but contains several known or unknown bugs
    • blackbox testing
  • rc: release candidate, final touches
    • highest level testing

  • odd number for development (4.1), even for stable (4.2)
  • Chromium: 131.0.6778.69
  • after GNOME 3.38, the “3.” was dropped and GNOME 40 was released
    • Java 1.6, 1.7, 1.8, 8, 9, 10…
  • Linux 5.19, 6.0
    • “So, as is hopefully clear to everybody, the major version number change is more about me running out of fingers and toes than it is about any big fundamental changes.” – Linus Torvalds
  • since version 3, TeX has used an idiosyncratic version numbering system [11]
    • where updates have been indicated by adding an extra digit at the end of the decimal, so that the version number asymptotically approaches π
    • the latest version is 3.141592653 (released in 2021)

semantic versioning

  1. major version when you make incompatible API changes
    • a way of communicating changes
  2. minor version when you add functionality in a backward compatible manner
  3. patch version when you make backward compatible bug fixes

additional labels for pre-release and build metadata are available as extensions to the major.minor.patch format | from semver.org
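A minimal sketch of both rules: bumping a release version according to the kind of change, and ordering versions by semver precedence, where pre-releases (like the alpha, beta and rc labels above) rank below the release and build metadata is ignored. The regular expression is a simplification (it does not reject leading zeros, for example).

```python
import re

# MAJOR.MINOR.PATCH, optional -pre.release identifiers and +build metadata
SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+([0-9A-Za-z.-]+))?$")


def parse(version: str):
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    return major, minor, patch, match.group(4)


def bump(version: str, part: str) -> str:
    """Return the next release version; this sketch only accepts release versions."""
    major, minor, patch, pre = parse(version)
    if pre is not None:
        raise ValueError("bumping pre-releases is out of scope for this sketch")
    if part == "major":   # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":   # backward compatible new functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":   # backward compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


def precedence_key(version: str):
    """Sort key following the semver precedence rules; build metadata is ignored."""
    major, minor, patch, pre = parse(version)
    if pre is None:
        return (major, minor, patch, (1,))  # a release ranks above its pre-releases
    identifiers = tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)  # numeric < alphanumeric
        for part in pre.split(".")
    )
    return (major, minor, patch, (0, identifiers))


print(bump("1.4.2", "minor"))  # 1.5.0
versions = ["1.0.0", "1.0.0-rc.1", "1.0.0-beta.11", "1.0.0-alpha", "1.0.0-beta.2", "1.0.0-alpha.1"]
print(sorted(versions, key=precedence_key))
# ['1.0.0-alpha', '1.0.0-alpha.1', '1.0.0-beta.2', '1.0.0-beta.11', '1.0.0-rc.1', '1.0.0']
```

Note that numeric identifiers compare as numbers, so beta.11 comes after beta.2, unlike in a plain string sort.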

calendar versioning

format examples:

  • YYYY.MINOR.PATCH
    • micro is used instead of patch
  • YYYY.MM.MINOR.PATCH

“CalVer is a versioning convention based on your project’s release calendar, instead of arbitrary numbers.” | calver.org
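A minimal sketch of the YYYY.MINOR.MICRO format: the year comes from the release date, the other two numbers work as counters. The rule that the counters restart from 0 in a new calendar year is an assumption of this sketch; projects define this themselves.

```python
from __future__ import annotations

from datetime import date


def next_calver(previous: str | None, today: date, kind: str = "minor") -> str:
    """Next version in the YYYY.MINOR.MICRO scheme.

    Assumption: MINOR and MICRO restart from 0 in a new calendar year.
    """
    if previous is None:
        return f"{today.year}.0.0"
    year, minor, micro = (int(part) for part in previous.split("."))
    if today.year != year:
        return f"{today.year}.0.0"
    if kind == "minor":
        return f"{year}.{minor + 1}.0"
    if kind == "micro":
        return f"{year}.{minor}.{micro + 1}"
    raise ValueError(f"unknown kind: {kind!r}")


print(next_calver("2024.3.1", date(2024, 11, 5), kind="micro"))  # 2024.3.2
print(next_calver("2024.3.1", date(2025, 1, 10)))                 # 2025.0.0
```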

ZeroVer: 0-based versioning

“Your software’s major version should never exceed the first and most important number in computing: zero.” | 0ver.org

  • e.g., 0.4.1
  • popular among open source software projects
    • some may reach 1.0.0 eventually

semver: “If your software is being used in production, it should probably already be 1.0.0.”

Fibonacci releases

KDE / Plasma 6.1 series release schedule
version  type            release date    delta (weeks)
6.1.0    Release         Tue 2024-06-18  0
6.1.1    Bugfix Release  Tue 2024-06-25  1
6.1.2    Bugfix Release  Tue 2024-07-02  1
6.1.3    Bugfix Release  Tue 2024-07-16  2
6.1.4    Bugfix Release  Tue 2024-08-06  3
6.1.5    Bugfix Release  Tue 2024-09-10  5
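The gaps between the KDE Plasma 6.1 bugfix releases follow the Fibonacci sequence (1, 1, 2, 3, 5 weeks). Assuming that rule, the schedule can be reproduced from the date of the .0 release:

```python
from __future__ import annotations

from datetime import date, timedelta


def fibonacci_schedule(first_release: date, count: int) -> list[date]:
    """Dates of a .0 release followed by bugfix releases 1, 1, 2, 3, 5, ... weeks apart."""
    if count < 1:
        return []
    dates = [first_release]
    gap, next_gap = 1, 1
    for _ in range(count - 1):
        dates.append(dates[-1] + timedelta(weeks=gap))
        gap, next_gap = next_gap, gap + next_gap
    return dates


for patch, day in enumerate(fibonacci_schedule(date(2024, 6, 18), 6)):
    print(f"6.1.{patch}  {day:%a %Y-%m-%d}")
```

The idea behind the growing gaps: most bugs surface right after a release, so early bugfix releases come quickly, and later ones less often.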

interruption

  • the greatest “enemy” of a developer is interruption
  • the code is one thing, the logic behind it is another
    • takes time to understand
  • context switching is costly
    • switching between tasks
  • that is why it is advisable to define small tasks during sprint planning
    • 1–4 hours, but ideally closer to 1
    • so that a task can be completed without interruption
    • a programmer probably gets only one uninterrupted 2-hour session in a day [12]

the cost of interruption

  • according to a study, the average lost time per major interruption is 23 minutes [13]
    • for developers, it could be worse
    • according to another study it is at least 15 minutes [12]
  • “getting back to the exact state of mind you were at right before an interruption is nearly impossible” [14]
  • interruptions can be planned and unplanned
© Ash Lamb used with the author’s permission

source: The Cost of Interruption for Software Developers – by Steven To [14]

planned and unplanned interruptions

unplanned

  • someone asks about something or to do something
    • usually a small task
      • informal review, advice, etc.
  • mitigation
    • wear headphones (in open offices)
    • notify in advance

planned

  • meetings, including standup
  • standup is usually the first thing in a workday, so that it does not split the working time before lunch
  • a wrongly placed meeting can be even worse than an unplanned interruption
    • you have to keep in mind that you have a meeting, so you cannot start anything serious
  • mitigation
    • schedule small, easy tasks before meetings

source: The Cost of Interruption for Software Developers – by Steven To [14]

techniques to minimize context switching

  • time blocking
    • divide workday into blocks
  • time batching
    • do similar tasks in a batch
  • prioritize tasks
  • tackle the biggest task first in the morning
  • turn off notifications
  • adopt asynchronous communication
    • e-mail, documentation, ADR
ideal, very bad, much better schedule
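Using two numbers from the slides, at least 15 minutes to get back into a task after an interruption and the single 2-hour uninterrupted session [12], a small sketch can count the 2-hour focus blocks a schedule leaves. The schedules are hypothetical, not the ones in the figure, and treating lunch as an interruption with the same resume cost is an assumption.

```python
from __future__ import annotations


def focus_blocks(day: tuple[int, int], interruptions: list[tuple[int, int]],
                 resume_cost: int = 15, minimum: int = 120) -> list[tuple[int, int]]:
    """Uninterrupted periods (minutes since midnight) of at least `minimum` minutes.

    After every interruption, `resume_cost` minutes are lost getting back into the task.
    """
    day_start, day_end = day
    blocks = []
    cursor = day_start
    for start, end in sorted(interruptions):
        if start - cursor >= minimum:
            blocks.append((cursor, start))
        cursor = max(cursor, end + resume_cost)
    if day_end - cursor >= minimum:
        blocks.append((cursor, day_end))
    return blocks


def hhmm(minutes: int) -> str:
    return f"{minutes // 60:02d}:{minutes % 60:02d}"


# hypothetical 9:00-17:00 day: standup at 9:00, lunch 12:00-13:00, two 30-minute meetings
day = (9 * 60, 17 * 60)
scattered = [(540, 555), (630, 660), (720, 780), (870, 900)]
clustered = [(540, 555), (690, 720), (720, 780), (990, 1020)]
for name, plan in [("scattered", scattered), ("clustered", clustered)]:
    blocks = [f"{hhmm(s)}-{hhmm(e)}" for s, e in focus_blocks(day, plan)]
    print(name, blocks or "no 2-hour block")
```

The same two meetings leave no 2-hour block when scattered, but two blocks when placed next to lunch and the end of the day.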

references

[1] M. Ernst, “Version control concepts and best practices.” https://homes.cs.washington.edu/~mernst/advice/version-control.html, Sep-2012.
[2] J. McCreary, “When to make a git commit.” https://dev.to/gonedark/when-to-make-a-git-commit, 11-Jan-2017.
[3] V. Driessen, “A successful git branching model.” https://nvie.com/posts/a-successful-git-branching-model, 05-Jan-2010.
[4] J. Judin, “A succesful git branching model considered harmful.” https://barro.github.io/2016/02/a-succesful-git-branching-model-considered-harmful, 07-Feb-2016.
[5] S. Shipp, “War of the git flows.” https://dev.to/scottshipp/war-of-the-git-flows-3ec2, 10-Sep-2019.
[6] M. Fowler, “Continuous integration.” https://martinfowler.com/articles/continuousIntegration.html, 18-Jan-2024.
[7] Wikipedia contributors, “Blue–green deployment — Wikipedia, the free encyclopedia.” https://en.wikipedia.org/w/index.php?title=Blue%E2%80%93green_deployment&oldid=1249842339, 2024.
[8] W. Kazim, “What is software deployment? Process and best practices.” https://learn.g2.com/software-deployment, 31-May-2023.
[9] GitLab, “4 must-know DevOps principles.” https://about.gitlab.com/blog/2022/02/11/4-must-know-devops-principles, 11-Feb-2022.
[10] Wikipedia contributors, “Common vulnerabilities and exposures — Wikipedia, the free encyclopedia.” https://en.wikipedia.org/w/index.php?title=Common_Vulnerabilities_and_Exposures&oldid=1256072917, 2024.
[11] Wikipedia contributors, “TeX — Wikipedia, the free encyclopedia.” https://en.wikipedia.org/w/index.php?title=TeX&oldid=1253226188, 2024.
[12] C. Parnin, “Programmer, interrupted,” in 2013 IEEE symposium on visual languages and human centric computing, 2013, pp. 171–172.
[13] G. Mark, D. Gudith, and U. Klocke, “The cost of interrupted work: More speed and stress,” in Proceedings of the SIGCHI conference on human factors in computing systems, 2008, pp. 107–110.
[14] S. To, “The cost of interruption for software developers.” https://www.brightdevelopers.com/the-cost-of-interruption-for-software-developers, 03-May-2018.
[15] N. Pande, “The high price of context switching for developers & ways to avoid it.” https://pacohq.com/blog/guide/the-high-price-of-context-switching-for-developers/, 23-Apr-2021.