Loading [MathJax]/jax/output/SVG/config.js

Abstract

These lecture notes introduce the notion of algorithm and pseudocode, so as to provide the initial tools for instructing a computer in executing a particular task. The historic hero introduced in these notes is Ada Lovelace, considered the first computer programmer. Her work in translating and commenting a scholarly paper describing Babbage's Analytical Engine has been one of the most important milestones of the Computer Science discipline.

Copyright notice. This work is licensed under a Creative Commons Attribution 4.0 International License. You are free to share (i.e. copy and redistribute the material in any medium or format) and adapt (e.g. remix, transform, and build upon the material) for any purpose, even commercially, under the following terms: attribution, i.e. you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. The licensor cannot revoke these freedoms as long as you follow the license terms.

Historic hero: Ada Lovelace

Ada Lovelace (shown in Figure 1) was a daughter of the poet Lord Byron, and English mathematicians who became famous for her work on the Babbage's Analytical Engine. Despite her father's habits, her mother, Anne Isabella Milbanke, strongly promoted Ada's interest in logic and mathematics, even after the death of her father. One of the goals of her mother was to prevent her to incur in the same insanity that characterised her father's life. However, the creativity that was intrinsically tied up on Byron family manifested in a totally unpredictable way.

img/ada.jpg

Figure 1. Portrait of Ada Lovelace. Source: https://en.wikipedia.org/wiki/File:Ada_Lovelace_portrait.jpg.

In 1833, she attended a party organised by Charles Babbage for presenting its Difference Engine. She was so impressed by Babbage's invention that she started a corrispondence with him that spanned 27 years [2]. She was the English translator of the first article about the Analytical Engine, that was written in French by Luigi Federico Menabrea, and that she enriched with several annotations. Among these annotations, there was a description of how to use the Analytical Engine to calculate the Bernoulli numbers [1]. Technically speaking, this was the first computer program ever written (actually the first algorithm of the whole history) and it has been created by Ada without even having the real machine implemented – since the Analytical Engine was just a theoretical machine that was not physically built by Babbage.

However, her vision about the possible uses of the Analytical Engine went even further [2]:

The operating mechanism can even be thrown into action independently of any object to operate upon (although of course no result could then be developed). Again, it might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations, and which should be also susceptible of adaptations to the action of the operating notation and mechanism of the engine. Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.

That science of operations is a clear reference to a particular field that was clearly named and identified only after several years. In practice, Ada Lovelace was talking about Computer Science one hundred years before its formal introduction.For her work in the field, Ada Lovelace is recognised as the first computer programmer in history.

Algorithms and programmers

Before to introduce the main topic of this lecture note, it would be worth to focus on simple examples we usually face during our daily life. Figure 2 illustrates two examples of step-by-step procedures we have to follow for preparing canapé crackers and for assembling a particular lamp respectively. While the actual goal of the two examples is extremely different, since the first one is a recipe while the other one is a set of instructions for assembling an utensil, the are described in terms of a shared abstract notion: instructions for producing something starting from some initial material we have.

img/recipe-ikea.png

Figure 2. Two pictures depicting a recipe (left) and the instruction for assembling a lamp (right). Left picture by Phil! Gold, source: https://www.flickr.com/photos/phil_g/17282816/. Right picture by Richard Eriksson, source: https://www.flickr.com/photos/sillygwailo/3183183727/.

The word algorithm is a combination of the Latin word algorismus (that is the Latinization of the name Al-Khwarizmi, who was a great mathematician from Persia in the 8th century) and the Greek word arithmos, meaning number. Broadly speaking, we can define an algorithm as an abstraction of a step-by-step procedure that takes something as input and produces some desired output [3]. Each algorithm is written in a specific language which is functional to communicate its instruction to a computer (either human or machine) so as to obtain something by processing some input material.

A computer programmer usually is a person who creates algorithms and specifies them in a computer program according to a particular computer language – thus, the term ​computer is here used for talking about an electronic computer. However, for what concerns the scope of this course, we use the term ​computer programmer to refer to anyone that creates algorithms that can be interpreted by any computer, being it a human or a machine.

Pseudocode

There is no standard language for describing an algorithm in a way that it is immediately understandable by any computer. However, often Computer Scientists rely on pseudocode when they want to describe a particular algorith, Broadly speaking, a pseudocode is an informal language that could be interpreted easily by any computer, even if it is usually used for communicating the steps of a process to humans. While an algorithm described by means of a pseudocode is not runnable by an electronic computer, it constructs are closely tied to the ones that are typically defined, with a formal grammar, in programming languages.

In particular, any algorithm can be expressed in pseudocode and, in principle, that pseudocode can be translated into different programming languages quite easily. The real difference is that, usually, some passages in the pseudocode can be simplified by using even natural language text, while one has to specify clearly every passage if one uses a programming language.

In this lecture notes, as well as in the main part of the whole course, we use a particular pseudocode which is good for being understandable by both humans and machines at the same time. However, the goal of the algorithm we will develop during the first, theoretical, part of the course must be understood primarily by humans. Thus, a good way for checking if an algorithm one has developed (by means of the pseudocode) can be interpretable by a computer is to ask a colleague to execute it starting from a particular input – e.g. by writing down all the passages of the execution on a piece of paper. We discuss this task in detail in the following section.

Our first algorithm

The pseudocode we use for describing all the algorithms of this course is called ThyMopani, meaning your tree: thy is the archaic version of your, while mopani is a particular African tree with butterfly-shaped leaves. This language is enough expressive for allowing one to describe any kind of algorithm that can be implemented in any programming language. In this lecture notes, we see only the basic constructs, while we will address more specific and complex features in the following lectures.

The goal of today's lecture is to develop our first algorithm. It can be described informally by the following natural language text: taking in input three different strings, i.e. two words and a bibliographic entry of a published paper, return 2 if both the words are contained in the bibliographic entry, 1 if at least one of the words is contained in the bibliographic entry, and 0 otherwise.

An incomplete version

In ThyMopani, each algorithm is defined by using the keyword def (which stands for define) followed by the name of the algorithm and a comma-separated list of input parameters between round brackets, e.g. def contains_word(first_word, second_word, bibliographic_entry)​. This definition is then followed by : and all the instructions of the algorithm must be specified in the following lines, as an indented block (preferably using 4 spaces). This is illustrated in Listing 1. It is worth mentioning that the name of the algorithm, as well as all the parameters, cannot contain space characters and must always start with a letter – e.g. this_is_my_parameter is correct, while 1_parameter is not.

​def contains_word(first_word, second_word, bibliographic_entry):
    ...
    ...
    ...
Listing 1. The definition of an algorithm, with its input parameter, and some comments identifying where to put the various instruction of such algorithm – one per line, indented of 4 space characters.

In this first version of the algorithm, we would like to introduce only some basic constructs of ThyMopani. To this end, we provide only a partial solution in this subsection, which will be finalised in the following subsections. In particular, we want just to say that if the first input word is contained in the bibliographic entry, then the number 1 is returned, otherwise 0 is returned. This partial version of the algorithm is introduced in Listing 2.

​def contains_word(first_word, second_word, bibliographic_entry):
if first_word in bibliographic_entry:
return 1
    else:
        return 0
Listing 2. An incomplete version of the algorithm, that is used to introduce some basic constructs of ThyMopani.

In this partial version, there are already specified some important constructs of ThyMopani. The first one is the if-else conditional block. This kind of block allows one to execute a particular instruction if a condition is true (the if statement), while an alternative set of instructions is executed instead if the condition specified is false (the else statement). The instructions to execute in one case or the other are specified in indented sub-blocks (again 4 additional spaces). As already introduced in Listing 2, every time we have to introduce a new block of instructions, we need to use ​: after the statement of interest, as shown in Listing 3.

​if <condition>:
    ...
    ...
else:
    ...
    ...
Listing 3. The generic structure of an if-else conditional block.

The condition specified in the if statement shown in Listing 2 allows one to check if a certain string is contained in another one by means of the command in. In particular, <string1> in <string2> would be true if the value <string1> is contained in <string2>, where a string is a particular type of value that records a sequence of characters, and it is usually defined by using the quotes. For instance, "Peroni", "Osborne", and ​"Peroni, S., Osborne, F., Di Iorio, A., Nuzzolese, A. G., Poggi, F., Vitali, F., Motta, E. (2017). Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3: e132. e2513. DOI: https://doi.org/10.7717/peerj-cs.132" are all strings. Note that <string1> and <string2> are just placeholders for strings: we can use directly strings, e.g. "Peroni" in "Peroni beer", or variables referring to strings, as shown in Listing 2 – where a variable is a symbolic name that contains some information referred to as a value (e.g. first_word). For instance, any input value is, in fact, a particular kind of variable. As defined previously, all the input parameters of the algorithm are expected to refer to strings.

The last construct of the partial algorithm introduced in this sub-section is the return statements. It is defined by specifying the token return followed by the value (or the variable containing a value) that must be returned. The execution of a return statement finishes the whole execution of an algorithm – thus, all the instructions that follow that statement are not processed anymore. In the example in Listing 2, two different numbers are returned, depending on which branch of the if-else block is actually executed. In particular, the algorithm returns 1 if the condition of the if statement is true, while it returns 0 otherwise. In ThyMopani, any number is defined by writing it down as it is – e.g. ​42 and -42 for positive/negative integers, 1.625 and -1.625 for positive/negative decimals. Note that strings and numbers are distinct kinds of objects – e.g. the string "42" and the number 42 (without the quotes) are not defining the same value at all.

Complex boolean statements

The original text of the algorithm, introduced in Section 4, explicitly mentions the simultaneous truth of two conditions for returning 2: ​return 2 if both the words are contained in the bibliographic entry. Of course, this can be handled by means of a hierarchy of if-else blocks, as shown in Listing 4.

​if first_word in bibliographic_entry:
    if second_word in bibliographic_entry:
        return 2
    else:
        return 1
else:
    if second_word in bibliographic_entry:
if first_word in bibliographic_entry:
return 2
else:
return 1
else:
return 0
Listing 4. A hierarchy of if-else blocks for describing the three possible return values of the algorithm.

However, the readability of the previous example is rather difficult, since it repeats several times the same conditions, even if they have been specified in a different order. Thus, ThyMopani makes available some operations for assessing compositions of multiple boolean values, and for deriving boolean values from number and string comparisons. A boolean type (or, simply, boolean, named after George Boole, who was a great logician of the 19th century) can be assigned to one out of two distinct and disjoint values, True and False. For instance, the condition first_word in bibliographic_entry returns a particular boolean: True if the word is indeed contained in the bibliographic entry, False otherwise. In algorithms (and in any programming language), boolean values are used for organising the execution flow of conditional blocks.

Sometimes it is useful to combine somehow two distinct boolean values in order to simplify the organisation of the conditional blocks. This can be done by using specific operators that apply to one (<operator> <B1>) or two boolean values (<B1> <operator> <B2>), and return a new boolean value. These operators are called logical not (​not in ThyMopani, which applies to one boolean value only), logical and (​and, between two boolean values), and logical or (​or, between two boolean values). They are logical operators since all of them derive from the logic Boole proposed in his works on Boolean algebra. Their use is summarised in Table 1, where the truth table of the application of such operators is shown. In particular, given two input boolean values, B1 and B2, the table shows the result of all their possible combinations according to the specific operator. Thus, for instance, in the example in Listing 4, we could return 2 if both the strings are contained in the bibliographic reference, expressing this constraint in one condition only, i.e. first_word in bibliographic_entry and second_word in bibliographic_entry.

B1 B2 not B1 B1 and B2 B1 or B2

True

True

False

True

True

True

False

False

False

True

False

True

True

False

True

False

False

True

False

False

Table 1. The truth table of all the boolean operations.

Round brackets can be used for grouping boolean operations, e.g. (True and False) or False applies the and operation first, and the result is used as the first value of the or operation – given False as result. In case no brackets are used, the application order is the following: first, all the not operation are applied, then all the or operations, and finally the remaining and operations – for instance, ​True and not False or False returns True since it is interpreted as True and ((not False) or False).

In addition to the aforementioned boolean operations, it is also possible to use string comparisons for obtaining boolean values. Table 2 shows all the comparisons that one can apply on two strings, i.e. <S1> <operator> <S2>. In this case, the operators are those typically used numerical comparison, i.e.:

S1 S2 S1 < S2 S1 <= S2 S1 > S2 S1 >= S2 S1 == S2 S1 != S2 S1 in S2 S1 not in S2

"Alice"

"Bob"

True

True

False

False

False

True

False

True

"Alice"

"Alice"

False

True

False

True

True

False

True

False

Table 2. The truth table of all string comparisons.

In the case of strings, a string S1 is less than another string S2 if the former one precedes the latter one according to a pure alphabetic order. Of course, the alphabetic order is used also for assessing when a string is greater than another one.

Note that similar operators (excluding in) can be used also for comparing numbers, as shown in Table 3. In this cases, the common mathematical numeric comparisons hold. 

N1 N2 N1 < N2 N1 <= N2 N1 > N2 N1 >= N2 N1 == N2 N1 != N2

3

4

True

True

False

False

False

True

4

4

False

True

False

True

True

False
Table 3. The truth table of all the arithmetic comparisons.

Thus, we can reuse these boolean operations in order to rewrite the if-else blocks shown in Listing 4 in a more readable way. The result is shown in Listing 5.

​if first_word in bibligraphic_entry and second_word in bibliographic_entry:
    return 2
else:
    if first_word in bibliographic_entry or second_word in bibliographic_entry:
        return 1
    else:
        return 0
Listing 5. A hierarchy of if-else blocks shown in Listing 4 rewritten according to the boolean operations presented in this section.

Conditional statements with multiple branches

While in the previous subsections we have improved the readability of the if-else blocks, ThyMopani allows us to do even better. First of all, in the two if statements in Listing 5, we ask to evaluate the same sub-conditions (i.e. first_word in bibliographic_entry and second_word in bibliographic_entry) twice. This can be easily avoided by defining new variables. A variable are defined by specifying its name (without spaces), followed by an = and the value to associate to it, i.e. <variable_name> = <variable_value>. The value can be specified directly (e.g. a number) or indirectly by using other existing variables, or even complex operations.

In our example, we could create two variables, called contains_first_word and contained_second_word, assigned to the boolean returned by the aforementioned string comparisons, i.e. first_word in bibliographic_entry and second_word in bibliographic_entry respectively. In that way, we can reuse such variables in the two if statements, as shown in Listing 6.

​if contains_first_word and contains_second_word:
    return 2
else:
    if contains_first_word or contains_second_word:
return 1
else:
return 0
Listing 6. The if-else blocks introduced in Listing 5 where the conditions in the if statements are specified by means of two variables.

In addition to that, we can improve even further the readability of the code by collapsing occurrences of else statements when these contain an if statement as their first instruction. In this case, both the else-if pair can be safely replaced by an elif (i.e. else if) statement, which specifies the same condition used in the if statement. Thus, the code in Listing 6 can be rewritten as shown in Listing 7.

​if contains_first_word and contains_second_word:
    return 2
elif contains_first_word or contains_second_word:
    return 1
else:
    return 0
Listing 7. The if-else blocks introduced in Listing 6 collapsed my means of an elif statement.

Final algorithm

In this lecture we have seen some initial constructs that ThyMopani makes available for developing an algorithm, in particular: algorithm definition with input parameters, variables, conditional statements (i.e. ifelif, and else), string, numeric, and boolean values, as well as boolean operations and string and numeric comparisons. All these constructs enabled us to define our algorithm, which is finally introduced in Listing 8.

def contains_word(first_word, second_word, bibliographic_entry):
contains_first_word = first_word in bibliographic_entry
contains_second_word = second_word in bibliographic_entry

if contains_first_word and contains_second_word:
return 2
elif contains_first_word or contains_second_word:
return 1
else:
return 0
Listing 8. The final algorithm developed.

Exercises

  1. What is the boolean value of not (not True or False and True) or False?

  2. What is the boolean value of "spam" not in "spa span sparql" and not ("egg" > "span")?

  3. What is the result of the execution of the algorithm in Listing 8 using "Peroni""HTML", and "Peroni, S., Osborne, F., Di Iorio, A., Nuzzolese, A. G., Poggi, F., Vitali, F., Motta, E. (2017). Research Articles in Simplified HTML: a Web-first format for HTML-based scholarly articles. PeerJ Computer Science 3: e132. e2513. DOI: https://doi.org/10.7717/peerj-cs.132" as input values?

References