silvio.peroni@unibo.it
Keywords
tree
Gabriel Garcia Marquez
Buendia family
These lecture notes introduce a new data structure for defining hierarchical relations: the tree. The historic hero introduced in these notes is Gabriel García Márquez, who was one of the most notable writers in Spanish of the 20th century.
Copyright notice. This work is licensed under a Creative Commons Attribution 4.0 International License. You are free to share (i.e. copy and redistribute the material in any medium or format) and adapt (e.g. remix, transform, and build upon the material) for any purpose, even commercially, under the following terms: attribution, i.e. you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. The licensor cannot revoke these freedoms as long as you follow the license terms.
Gabriel García Márquez (shown in Figure 1) was a Colombian novelist, and he was one the most notable writers in Spanish of the 20th century. He won the Nobel Prize for Literature in 1982. While, as a journalist, he wrote several non-fictional works, he is mainly known for his novels, such as Cien años de soledad (One Hundred Years of Solitude in English) and El amor en los tiempos del cólera (Love in the Time of Cholera in English).
Several of his works mention the fictional town of Macondo, which is the main setting of one of his book, i.e. One Hundred Years of Solitude, which narrates the story of the Buendia family. In particular, the story introduces the life of seven different generation of people of the same family, and follows its adventures and, often, misfortunes.
In the Italian edition of this novel, published by Mondadori [1], at the very beginning of the book is depicted a family tree of the various generations. While it provides a few spoilers about some future events, it is actually very useful to follow the story of the family, since several Buendia people often share the same name, and it is not so easy to follow the story if it is not clear to whom the narrator is actually referring to.
Gracia Marquez is the second Latin American writer we have used in this course in order to introduce specific topics related to the Computational Thinking and Computer Science domains. In fact, trees (such as family trees) are particular kinds of structures that allow one to define a hierarchy of values that can be used for further tasks or computations. Actually, it is not the first time we have adopted it, even if they were not mentioned explicitly. In particular, in the lecture of the dynamic programming algorithms, we have already used a tree for showing the execution of the recursive calls to the Fibonacci algorithm implemented by means of the divide and conquer approach.
In addition, as Digital Humanists, you will have to deal (or are dealing) with trees, in particular when we consider the Markup Language domains. Languages such as the one provided by the Text Encoding Initiative (TEI) and the Hypertext Markup Language (HTML) are peculiar examplars of XML-like languages which allows one to construct hierarchies of markup elements for structurally and semantically annotating a text.
A tree is a data structure that simulates a hierarchical tree, composed by a set of nodes related to each other by a particular hierarchical parent-child relation. As shown in Figure 2, this data structure usually follows a top-down presentation order, contrarily to the actual real organisation of the plant. In fact, the originating (or starting) node is called root node and is usually placed at the very top of the tree, while the terminating nodes, called leaf nodes, are actually placed at the very bottom. Taking into consideration a central node of a tree, e.g. the yellow one highlighted in Figure 2, we can name precisely all the other nodes surrounding it, in particular:
the parent node of a given one is a node directly connected to another one when moving close to the root node;
a child node of a given one is a node directly connected to another one when moving away from the root node;
a sibling node of a given one is a node that shares the same parent node;
an ancestor node of a given one is a node that is reachable by following the path from child to parent repeatedly;
a descendant node of a given one is a node that is reachable by following the path from parent to a child repeatedly.
In ThyMopani (well, Python...) a tree is basically a set of nodes linked together according to parent-child relationships. While there is no built-in implementation of the tree data structure in Python, there are some external packages that are very well suited for the task. Among those, one of the most famous one is anytree.
This package allows one to create a node of a tree by means of the constructor def Node(name, parent=None)
. Thus, each node must specify a name (as a string) and a parent – if the parent is not specified then it will assume None as value and it is implicitly defined as the root node of a tree. In anytree, the constructor is the main mechanism for defining a tree by simply stating the parent relations during the definition of new nodes.
from anytree import Node root = Node("html") head = Node("head", root) title = Node("title", head) body = Node("body", root) paragraph_1 = Node("p", body)
paragraph_2 = Node("p", body)
It is worth mentioning that the siblings of a certain parent are actually ordered among them. The order is defined by the order of insertion as a child of that particular parent. For instance, in the example in Listing 1, paragraph_1
comes before paragraph_2
in the siblings of the node body
.
The advantages of using anytree are that it makes available a lot of facilities for accessing various information associated to a node. In particular:
<node>.children
returns a tuple listing all the children of a node;
<node>.parent
returns the parent of a node;
<node>.descendants
returns a tuple listing all the descendants of a node (including its children);
<node>.ancestors
returns a tuple listing all the ancestors of a node (including its parent);
<node>.siblings
returns a tuple listing all the siblings of a node;
<node>.root
returns the root node of the tree where a node is contained.
It is worth mentioning that the variable defining the children and the parent of a node can be updated by assigning to them a particular collection (e.g. a list or a tuple) of nodes. For instance, we can invert the ordering of the two paragraphs defined as children of the node body
in Listing 1 as follows:
body.children = (paragraph_2, paragraph_1)
Define your family tree by using the Python anytree package.
Look at the documentation of the anytree package online, and see if there is some mechanism to visualize the tree in a readable form. If so, apply such visualization to your family tree defined in the previous point.
Garcia Marquez, G. (1967). Cent'anni di solitudine. Mondadori, edizione 2017. ISBN: 978-8804675983