Abstract

These lecture notes introduce the notion of algorithm and pseudocode, so as to provide the initial tools for instructing a computer in executing a particular task. The historic hero introduced in these notes is Ada Lovelace, considered the first computer programmer. Her work in translating and commenting a scholarly paper describing Babbage's Analytical Engine has been one of the most important milestones of the Computer Science discipline.

Copyright notice. This work is licensed under a Creative Commons Attribution 4.0 International License. You are free to share (i.e. copy and redistribute the material in any medium or format) and adapt (e.g. remix, transform, and build upon the material) for any purpose, even commercially, under the following terms: attribution, i.e. you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. The licensor cannot revoke these freedoms as long as you follow the license terms.

Our first algorithm

The pseudocode we use for describing all the algorithms of this course is called ThyMopani, meaning your tree: thy is the archaic version of your, while mopani is a particular African tree with butterfly-shaped leaves. This language is enough expressive for allowing one to describe any kind of algorithm that can be implemented in any programming language. In this lecture notes, we see only the basic constructs, while we will address more specific and complex features in the following lectures.

The goal of today's lecture is to develop our first algorithm. It can be described informally by the following natural language text: taking in input three different strings, i.e. two words and a bibliographic entry of a published paper, return 2 if both the words are contained in the bibliographic entry, 1 if at least one of the words is contained in the bibliographic entry, and 0 otherwise.

An incomplete version

In ThyMopani, each algorithm is defined by using the keyword def (which stands for define) followed by the name of the algorithm and a comma-separated list of input parameters between round brackets, e.g. def contains_word(first_word, second_word, bibliographic_entry). This definition is then followed by : and all the instructions of the algorithm must be specified in the following lines, as an indented block (preferably using 4 spaces). This is illustrated in Listing 1. It is worth mentioning that the name of the algorithm, as well as all the parameters, cannot contain space characters and must always start with a letter – e.g. this_is_my_parameter is correct, while 1_parameter is not.

def contains_word(first_word, second_word, bibliographic_entry):
    ...
    ...
    ...

Listing 1. The definition of an algorithm, with its input parameter, and some comments identifying where to put the various instruction of such algorithm – one per line, indented of 4 space characters.

In this first version of the algorithm, we would like to introduce only some basic constructs of ThyMopani. To this end, we provide only a partial solution in this subsection, which will be finalised in the following subsections. In particular, we want just to say that if the first input word is contained in the bibliographic entry, then the number 1 is returned, otherwise 0 is returned. This partial version of the algorithm is introduced in Listing 2.

def contains_word(first_word, second_word, bibliographic_entry):
    if first_word in bibliographic_entry: 
        return 1
    else:
        return 0

Listing 2. An incomplete version of the algorithm, that is used to introduce some basic constructs of ThyMopani.

In this partial version, there are already specified some important constructs of ThyMopani. The first one is the if-else conditional block. This kind of block allows one to execute a particular instruction if a condition is true (the if statement), while an alternative set of instructions is executed instead if the condition specified is false (the else statement). The instructions to execute in one case or the other are specified in indented sub-blocks (again 4 additional spaces). As already introduced in Listing 2, every time we have to introduce a new block of instructions, we need to use : after the statement of interest, as shown in Listing 3.

if <condition>:
    ...
    ...
else:
    ...
    ...

Listing 3. The generic structure of an if-else conditional block.

The condition specified in the if statement shown in Listing 2 allows one to check if a certain string is contained in another one by means of the command in. In particular, <string1> in <string2> would be true if the value <string1> is contained in <string2>, where a string is a particular type of value that records a sequence of characters, and it is usually defined by using the quotes. For instance, "Peroni", "Osborne", and "Peroni, S., Osborne, F., Di Iorio, A., Nuzzolese, A. G., Poggi, F., Vitali, F., Motta, E. (2017). Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3: e132. e2513. DOI: https://doi.org/10.7717/peerj-cs.132" are all strings. Note that <string1> and <string2> are just placeholders for strings: we can use directly strings, e.g. "Peroni" in "Peroni beer", or variables referring to strings, as shown in Listing 2 – where a variable is a symbolic name that contains some information referred to as a value (e.g. first_word). For instance, any input value is, in fact, a particular kind of variable. As defined previously, all the input parameters of the algorithm are expected to refer to strings.

The last construct of the partial algorithm introduced in this sub-section is the return statements. It is defined by specifying the token return followed by the value (or the variable containing a value) that must be returned. The execution of a return statement finishes the whole execution of an algorithm – thus, all the instructions that follow that statement are not processed anymore. In the example in Listing 2, two different numbers are returned, depending on which branch of the if-else block is actually executed. In particular, the algorithm returns 1 if the condition of the if statement is true, while it returns 0 otherwise. In ThyMopani, any number is defined by writing it down as it is – e.g. 42 and -42 for positive/negative integers, 1.625 and -1.625 for positive/negative decimals. Note that strings and numbers are distinct kinds of objects – e.g. the string "42" and the number 42 (without the quotes) are not defining the same value at all.

Complex boolean statements

The original text of the algorithm, introduced in Section 4, explicitly mentions the simultaneous truth of two conditions for returning 2: return 2 if both the words are contained in the bibliographic entry. Of course, this can be handled by means of a hierarchy of if-else blocks, as shown in Listing 4.

if first_word in bibliographic_entry:
    if second_word in bibliographic_entry:
        return 2
    else:
        return 1
else:
    if second_word in bibliographic_entry:
        if first_word in bibliographic_entry:
            return 2
        else:
            return 1
    else:
        return 0

Listing 4. A hierarchy of if-else blocks for describing the three possible return values of the algorithm.

However, the readability of the previous example is rather difficult, since it repeats several times the same conditions, even if they have been specified in a different order. Thus, ThyMopani makes available some operations for assessing compositions of multiple boolean values, and for deriving boolean values from number and string comparisons. A boolean type (or, simply, boolean, named after George Boole, who was a great logician of the 19^th century) can be assigned to one out of two distinct and disjoint values, True and False. For instance, the condition first_word in bibliographic_entry returns a particular boolean: True if the word is indeed contained in the bibliographic entry, False otherwise. In algorithms (and in any programming language), boolean values are used for organising the execution flow of conditional blocks.

Sometimes it is useful to combine somehow two distinct boolean values in order to simplify the organisation of the conditional blocks. This can be done by using specific operators that apply to one (<operator> <B1>) or two boolean values (<B1> <operator> <B2>), and return a new boolean value. These operators are called logical not (not in ThyMopani, which applies to one boolean value only), logical and (and, between two boolean values), and logical or (or, between two boolean values). They are logical operators since all of them derive from the logic Boole proposed in his works on Boolean algebra. Their use is summarised in Table 1, where the truth table of the application of such operators is shown. In particular, given two input boolean values, B1 and B2, the table shows the result of all their possible combinations according to the specific operator. Thus, for instance, in the example in Listing 4, we could return 2 if both the strings are contained in the bibliographic reference, expressing this constraint in one condition only, i.e. first_word in bibliographic_entry and second_word in bibliographic_entry.

B1	B2	not B1	B1 and B2	B1 or B2
True	True	False	True	True
True	False	False	False	True
False	True	True	False	True
False	False	True	False	False

Table 1. The truth table of all the boolean operations.

Round brackets can be used for grouping boolean operations, e.g. (True and False) or False applies the and operation first, and the result is used as the first value of the or operation – given False as result. In case no brackets are used, the application order is the following: first, all the not operation are applied, then all the or operations, and finally the remaining and operations – for instance, True and not False or False returns True since it is interpreted as True and ((not False) or False).

In addition to the aforementioned boolean operations, it is also possible to use string comparisons for obtaining boolean values. Table 2 shows all the comparisons that one can apply on two strings, i.e. <S1> <operator> <S2>. In this case, the operators are those typically used numerical comparison, i.e.:

<, less than;
<=, less than or equal to;
>, greater than;
>= greater than or equal to;
==, equal to;
!=, different from;
in, included in;
not in, not included in.

S1	S2	S1 < S2	S1 <= S2	S1 > S2	S1 >= S2	S1 == S2	S1 != S2	S1 in S2	S1 not in S2
"Alice"	"Bob"	True	True	False	False	False	True	False	True
"Alice"	"Alice"	False	True	False	True	True	False	True	False

Table 2. The truth table of all string comparisons.

In the case of strings, a string S1 is less than another string S2 if the former one precedes the latter one according to a pure alphabetic order. Of course, the alphabetic order is used also for assessing when a string is greater than another one.

Note that similar operators (excluding in) can be used also for comparing numbers, as shown in Table 3. In this cases, the common mathematical numeric comparisons hold.

N1	N2	N1 < N2	N1 <= N2	N1 > N2	N1 >= N2	N1 == N2	N1 != N2
3	4	True	True	False	False	False	True
4	4	False	True	False	True	True	False

Table 3. The truth table of all the arithmetic comparisons.

Thus, we can reuse these boolean operations in order to rewrite the if-else blocks shown in Listing 4 in a more readable way. The result is shown in Listing 5.

if first_word in bibligraphic_entry and second_word in bibliographic_entry:
    return 2
else:
    if first_word in bibliographic_entry or second_word in bibliographic_entry:
        return 1
    else:
        return 0

Listing 5. A hierarchy of if-else blocks shown in Listing 4 rewritten according to the boolean operations presented in this section.

Conditional statements with multiple branches

While in the previous subsections we have improved the readability of the if-else blocks, ThyMopani allows us to do even better. First of all, in the two if statements in Listing 5, we ask to evaluate the same sub-conditions (i.e. first_word in bibliographic_entry and second_word in bibliographic_entry) twice. This can be easily avoided by defining new variables. A variable are defined by specifying its name (without spaces), followed by an = and the value to associate to it, i.e. <variable_name> = <variable_value>. The value can be specified directly (e.g. a number) or indirectly by using other existing variables, or even complex operations.

In our example, we could create two variables, called contains_first_word and contained_second_word, assigned to the boolean returned by the aforementioned string comparisons, i.e. first_word in bibliographic_entry and second_word in bibliographic_entry respectively. In that way, we can reuse such variables in the two if statements, as shown in Listing 6.

if contains_first_word and contains_second_word:
    return 2
else:
    if contains_first_word or contains_second_word:
        return 1
    else:
        return 0

Listing 6. The if-else blocks introduced in Listing 5 where the conditions in the if statements are specified by means of two variables.

In addition to that, we can improve even further the readability of the code by collapsing occurrences of else statements when these contain an if statement as their first instruction. In this case, both the else-if pair can be safely replaced by an elif (i.e. else if) statement, which specifies the same condition used in the if statement. Thus, the code in Listing 6 can be rewritten as shown in Listing 7.

if contains_first_word and contains_second_word:
    return 2
elif contains_first_word or contains_second_word:
    return 1
else:
    return 0

Listing 7. The if-else blocks introduced in Listing 6 collapsed my means of an elif statement.

Final algorithm

In this lecture we have seen some initial constructs that ThyMopani makes available for developing an algorithm, in particular: algorithm definition with input parameters, variables, conditional statements (i.e. if, elif, and else), string, numeric, and boolean values, as well as boolean operations and string and numeric comparisons. All these constructs enabled us to define our algorithm, which is finally introduced in Listing 8.

def contains_word(first_word, second_word, bibliographic_entry):
    contains_first_word = first_word in bibliographic_entry 
    contains_second_word = second_word in bibliographic_entry 
    
    if contains_first_word and contains_second_word: 
        return 2
    elif contains_first_word or contains_second_word: 
        return 1
    else: 
        return 0

Listing 8. The final algorithm developed.

Algorithms

Abstract

Historic hero: Ada Lovelace

Algorithms and programmers

Pseudocode

Our first algorithm

An incomplete version

Complex boolean statements

Conditional statements with multiple branches

Final algorithm

Exercises

References