Randentropy: a software to measure inequality in random systems

The software Randentropy is designed to estimate inequality in a random system where several individuals interact, moving among many communities and producing dependent random quantities of an attribute. The overall inequality is assessed by computing the Random Theil's Entropy. First, the software estimates a piecewise homogeneous Markov chain by identifying the change-points and the corresponding transition probability matrices. Second, it estimates the multivariate distribution function of the attribute using a copula function approach. Finally, through a Monte Carlo algorithm, it evaluates the expected value of the Random Theil's Entropy. Possible applications are discussed in relation to the fields of finance and human mobility.


Introduction
The issue of measuring inequality in a system has received extensive treatment in the literature. One interesting approach is based on entropic measures. Starting from the pioneering work by Shannon (1948) on the mathematical theory of communication, the concept of entropy has seen rapid development and diffusion in many scientific communities. Notable examples are statistics (see, e.g., Kullback and Leibler (1951)), statistical mechanics (see, e.g., Jaynes (1957)), economics (see, e.g., Theil (1967)) and ecology (see, e.g., Phillips, Anderson, and Schapire (2006)), just to name a few. Recent efforts have been dedicated mainly to introducing new entropies, such as the cumulative residual entropy (see Rao, Chen, Vemuri, and Wang (2004)) or the cumulative past entropy (see Di Crescenzo and Longobardi (2009)). In the meantime, and mainly motivated by economic problems, the notion of random entropy has emerged in terms of a normalization of a random process. The random entropy shares the same functional form as the classical entropy but is related to a random process (D'Amico and Di Biase (2010)). This more general entropy was called Dynamic Theil Entropy by the authors. Nevertheless, we refer to it as Random Entropy, to avoid any possible confusion with other dynamic entropies which are expressed as deterministic functions, as in Di Crescenzo and Longobardi (2002), Asadi and Zohrevand (2007) and Calì, Longobardi, and Navarro (2020).
The Random Entropy makes it possible to quantify uncertainty in a random system evolving in time and encompasses recent approaches and measures introduced in Curiel and Bishop (2016). In this paper, we adopt the general model introduced in a previous work, D'Amico, Petroni, Regnault, Scocchera, and Storchi (2019), and we present a software that permits the calculation of the inequality in a general system composed of a number of interacting individuals. Each individual moves among several communities in time and, according to its membership and depending on that of the other individuals, produces an attribute. The dynamics of individuals among the communities is described by a piecewise homogeneous Markov chain, which requires the identification of an unknown number of change-points (i.e., times at which the Markov chain changes its dynamics). Conditional on the occupancy of the communities, the individuals produce an attribute in quantities expressed by a multivariate probability distribution where the dependence structure is managed by a copula function. Finally, using a Monte Carlo algorithm, we show how to compute the moments of the Random Entropy.
Author contacts: g.damico@unich.it (G. D'Amico); loriano@storchi.org (L. Storchi); https://www.storchi.org/ (L. Storchi).
The main innovation of this research is the Randentropy software itself. It covers different aspects that were only partially considered in other research papers. Indeed, several studies deal with software and packages related to multi-state models of Markovian type. For example, in Ferguson, Datta, Brock et al. (2012) the authors consider a package for computing marginal and conditional occupation probabilities for Markov and non-Markov multi-state models, including the censoring problem and the use of covariates. In Jackson et al. (2011), multi-state models for panel data observed continuously and generally based on the Markov assumption are considered instead. There, a time-varying model can be obtained using piecewise-constant time-dependent covariates. In contrast to these studies, our software gives different transition probability matrices according to the change-point detection methodology presented in Polansky (2007), which is based only on observations of the Markov process and not on additional covariates. Moreover, once the piecewise homogeneous Markov chain is identified, the software provides sequences of dependent random vectors denoting the ownership of an attribute by the individuals of the system. Thus, the system becomes a multivariate Markov reward process on which the Random Entropy is evaluated. To our knowledge, our software is the only one that computes the Random Entropy, and it does so in a very general framework that encompasses recent contributions presenting diversity measurement based on (deterministic) entropy where the migration of individuals among the communities is not allowed, see Marcon and Hérault (2015a). Of potential interest is also the application of Randentropy to problems approached with the traditional concept of entropy, see, e.g., Behrendt, Dimpfl, Peter, and Zimmermann (2019) and Saad and Ruai (2019). The subsequent sections of this paper present the general mathematical model, relevant scenarios of application and the main characteristics of the software; both the CLI (Command Line Interface) and the GUI (Graphical User Interface) are described.

Theory
The main function driving the development of the software presented here (i.e., Randentropy) is the computation of a measure of inequality on the distribution of a given attribute among a set of individuals. The quantity of this attribute depends on a discriminatory criterion, according to which each individual belongs to a given group. Following the nomenclature mainly derived within the ecology community, which remains valid in other domains, we denote the set of individuals as a meta-community that is partitioned into several interacting groups called communities. This description is the same as that adopted in Marcon and Hérault (2015b).
Let us denote the meta-community by $\mathcal{C}$ and the number of its members by $N$. Each individual $c \in \mathcal{C}$ belongs, at any time $t \in \mathbb{N}$, to one of $D$ different communities that form the meta-community $\mathcal{C}$. The variable $x^c(t)$, with values in $E = \{1, 2, \ldots, D\}$, denotes the community to which individual $c$ belongs at time $t$. Every time the individual is a member of a given community, it owns a quantity of the personal attribute denoted by $s^c(t)$. The considered system is stochastic, in the sense that each individual passes through different communities randomly in the course of time and, as a consequence, the personal attributes evolve randomly over time. In this way, the proposed approach is more general than that proposed by Marcon and Hérault (2015b), where members are not permitted to migrate from one community to another.
The sequence of communities visited by any individual $c \in \mathcal{C}$, that is $\{x^c(t)\}_{t \in \mathbb{N}}$, is assumed to be a realization of a stochastic process $X^c := (x^c(t))_{t \in \mathbb{N}}$. Thus, the sequence of each individual's attribute, that is $\{s^c(t)\}_{t \in \mathbb{N}}$, evolves randomly too. We will denote, from now on, the stochastic process describing the evolution of the attribute of individual $c$ as $S^c := (s^c(t))_{t \in \mathbb{N}}$. The processes $X^c$ and $S^c$ evolve jointly, meaning that the evolution of the process $S^c$ is driven by the stochastic process $X^c$, which controls it. A precise description of this mechanism follows. First, we assume independence between the dynamics of the individuals. Thus, the community process for every individual will be denoted simply by $x = x(t)$, and the reference to the specific individual $c \in \mathcal{C}$ is dropped. Moreover, we assume that $x = x(t)$ is distributed according to a piecewise homogeneous Markov chain (PHMC). The process $x$ is a PHMC taking values in the finite set $E$ if there exist a number of change-points $K$, a sequence $0 = \tau_0 < \tau_1 < \cdots < \tau_{K+1} = \infty$ of increasing times and a sequence ${}^{(0)}P, \ldots, {}^{(K)}P$ of stochastic matrices such that, for any $k \in \mathbb{N}$ with $k \le K$, any $t \in \{\tau_k, \ldots, \tau_{k+1} - 1\}$ and any $i, j \in E$, the following Markov property holds:
$$\mathbb{P}\big(x(t+1) = j \mid x(t) = i,\ x(0:(t-1)) = i_{0:(t-1)}\big) = \mathbb{P}\big(x(t+1) = j \mid x(t) = i\big) = {}^{(k)}p_{ij}.$$
Here $i_{0:(t-1)} = (i_0, \ldots, i_{t-1}) \in E^{t}$, $x(0:(t-1)) = (x(0), \ldots, x(t-1))$, and $\{\tau_k, \ldots, \tau_{k+1} - 1\}$ represents the time interval, enclosed between the $k$-th and the $(k+1)$-th change-point, where the dynamics at community level are fixed and described by the transition probability matrix ${}^{(k)}P = \{{}^{(k)}p_{ij}\}_{i,j \in E}$. Intuitively, the term piecewise refers to the existence of some points in time where the dynamics change substantially. These times are called change-points. They break up the timeline into several sub-periods within which the Markov process is homogeneous.
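The PHMC definition above can be illustrated with a short simulation. The following is a minimal sketch, not the Randentropy implementation: the function name `simulate_phmc`, the 0-based coding of the communities and the argument layout are assumptions made only for this example. The regime in force at time t is found by counting the change-points that are less than or equal to t, so that the k-th matrix drives the transitions made at times between the k-th and the (k+1)-th change-point.

```python
import numpy as np

def simulate_phmc(x0, change_points, matrices, horizon, rng):
    """Simulate x(0), ..., x(horizon) for a piecewise homogeneous Markov chain.

    change_points: increasing times tau_1 < ... < tau_K (tau_0 = 0 is implicit).
    matrices: K + 1 row-stochastic matrices; matrices[k] governs the
              transitions made at times t with tau_k <= t < tau_{k+1}.
    Communities are coded 0, ..., D - 1 (an assumption of this sketch).
    """
    path = [x0]
    for t in range(horizon):
        # Regime index = number of change-points that are <= t.
        k = int(np.searchsorted(change_points, t, side="right"))
        row = matrices[k][path[-1]]
        # Draw the next community from the current row of the active matrix.
        path.append(int(rng.choice(len(row), p=row)))
    return path

# Example: the chain never moves before tau_1 = 3 and always switches afterwards.
P_stay = np.eye(2)
P_swap = np.array([[0.0, 1.0], [1.0, 0.0]])
example_path = simulate_phmc(0, [3], [P_stay, P_swap], 6, np.random.default_rng(0))
```

With these two deterministic matrices the change of regime is visible directly in the path: the community is constant up to time 3 and alternates afterwards.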
The next step concerns the specification of the processes describing the personal attributes, i.e., $S^c$. We consider a meta-community where the personal attributes of the individuals can be dependent on each other. The dependence is introduced through a copula function. This strategy is pursued assuming that the attributes of the individuals allocated in the same community share a common marginal probability distribution. Formally, let
$$F_i(\cdot) := \mathcal{L}\big(s^c(t) \mid x^c(t) = i\big), \qquad i \in E,$$
denote the conditional distribution of the attribute $s^c(t)$ knowing the community $x^c(t) = i$, where, for a given random variable $Y$, the symbol $\mathcal{L}(Y)$ denotes its probability distribution. We can now state the second main assumption: the conditional joint distribution of $(s^1(t), \ldots, s^N(t))$ given $(x^1(t), \ldots, x^N(t)) = (i_1, \ldots, i_N)$ is
$$C_\theta\big(F_{i_1}(s_1), \ldots, F_{i_N}(s_N)\big),$$
where $C_\theta$ is the copula, with dependence parameter $\theta$. Depending on the copula function considered, $\theta$ may also be a vector of parameters.
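To make the copula assumption concrete, the sketch below draws one vector of dependent attributes. It is only an illustration under stated assumptions: the text does not fix a copula family, so a Gaussian copula with a single common correlation `rho` is used here, and the exponential marginals built by the hypothetical helper `exp_quantile` stand in for the community-specific distributions. The names `sample_attributes` and `inv_cdfs` are likewise introduced only for this example.

```python
import numpy as np
from statistics import NormalDist

def exp_quantile(mean):
    """Quantile function of an exponential law (hypothetical marginal)."""
    return lambda u: -mean * np.log1p(-u)

def sample_attributes(communities, rho, inv_cdfs, rng):
    """Draw one vector of dependent attributes.

    communities[c] is the community of individual c and inv_cdfs[i] is the
    quantile function of the marginal law attached to community i.
    Dependence comes from a Gaussian copula with common correlation rho
    (an assumption of this sketch, not a choice stated in the text).
    """
    n = len(communities)
    corr = np.full((n, n), rho) + (1.0 - rho) * np.eye(n)  # unit diagonal
    z = rng.multivariate_normal(np.zeros(n), corr)
    # The standard normal cdf maps each coordinate to a uniform marginal
    # while keeping the Gaussian dependence structure.
    u = [NormalDist().cdf(v) for v in z]
    # Community-specific quantile functions give the required marginals.
    return np.array([inv_cdfs[i](p) for i, p in zip(communities, u)])

rng = np.random.default_rng(0)
marginals = {0: exp_quantile(1.0), 1: exp_quantile(5.0)}
draws = np.array([sample_attributes([0, 1, 0], 0.8, marginals, rng)
                  for _ in range(2000)])
```

In `draws`, the first and third individuals share the same marginal law because they belong to the same community, while the copula makes all three attributes positively dependent.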
Since we are interested in measuring the inequality of the distribution of attributes in the meta-community, we need to introduce a measure of inequality. In particular, the measure we consider is suited to dealing with stochastic processes. It is based on the Theil entropy (see Theil (1967)), closely related to the Shannon entropy (see Shannon (1948)). Given a probability distribution $p = (p_1, \ldots, p_N)$, the Theil index $T(p)$ of $p$ is defined as the Kullback-Leibler (KL) divergence $K(p|u)$ between $p$ and the uniform distribution $u$ or, equivalently, as the difference between $\log(N)$ and the Shannon entropy $H(p)$. Precisely,
$$T(p) := K(p|u) = \sum_{c=1}^{N} p_c \log(N p_c) = \log(N) - H(p),$$
where $H(p) = -\sum_{c=1}^{N} p_c \log p_c$, with the convention $0 \log 0 = 0$. The definition of the Theil index was extended to stochastic processes by D'Amico and Di Biase (2010) and subsequently applied and further investigated in D'Amico, Di Biase, and Manca (2012) and in D'Amico, Di Biase, and Manca (2014) for an additive decomposition of this index. The random extension of the Theil index is introduced next.
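The two equivalent forms of the Theil index can be checked numerically. The snippet below is a small sketch with function names (`theil_index`, `shannon_entropy`) chosen for this example; it uses natural logarithms and the usual convention that terms with zero probability contribute nothing.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy -sum p_c log p_c, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def theil_index(p):
    """Theil index as the KL divergence between p and the uniform law."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p.size * p[nz])))

p = [0.5, 0.25, 0.25]
kl_form = theil_index(p)                      # sum p_c log(N p_c)
entropy_form = np.log(3) - shannon_entropy(p)  # log N - H(p)
```

The index is zero for the uniform distribution (perfect equality) and reaches log N when a single individual holds the whole attribute.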
Let $sh^c(t)$ be the share of the attribute held by individual $c \in \mathcal{C}$ at time $t \in \mathbb{N}$. It is defined as the proportion of its own attribute $s^c(t)$ relative to the sum of the attribute over all individuals, i.e.,
$$sh^c(t) := \frac{s^c(t)}{\sum_{c' \in \mathcal{C}} s^{c'}(t)}.$$
The vector of shares of attributes at time $t$, $sh(t) := (sh^c(t))_{c \in \mathcal{C}}$, defines a probability distribution on the set of individuals $\mathcal{C}$. Note that $sh := (sh(t))_{t \in \mathbb{N}}$ is a stochastic process that depends on the stochastic processes $S^c$, controlled by $X^c$. We call Random Entropy of personal attributes in the meta-community the stochastic process $T(sh(t))$, given by
$$T(sh(t)) = \sum_{c \in \mathcal{C}} sh^c(t) \log\big(N\, sh^c(t)\big).$$
An explicit formula for the expected value of $T(sh(t))$ has been provided in D'Amico et al. (2019). Nevertheless, that formula can only be effectively implemented for small meta-communities and a small number of communities. Otherwise, a Monte Carlo simulation approach can be successfully implemented. The proposed algorithm repeatedly simulates the trajectories of all individuals according to the underlying Markov model, providing the sequence of communities to which each individual belongs in time. Moreover, the personal attributes are simulated by using the copula function, with the marginal distribution for each individual depending on the community of membership. The expected value of the Random Entropy can be estimated by averaging, for each time, over all simulated attributes in the meta-community.
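As a worked example of these definitions, consider a hypothetical meta-community of $N = 3$ individuals whose attributes at some time $t$ are $1$, $1$ and $2$ (illustrative values, not data from the paper). With natural logarithms:

$$sh(t) = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{2}\right), \qquad T(sh(t)) = 2 \cdot \tfrac{1}{4} \log\tfrac{3}{4} + \tfrac{1}{2} \log\tfrac{3}{2} \approx -0.1438 + 0.2027 = 0.0589.$$

The value is close to zero because the shares are not far from the uniform distribution $(1/3, 1/3, 1/3)$; the maximum attainable value for $N = 3$ is $\log 3 \approx 1.0986$.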
The algorithm (see Algorithm 1) consists of several steps. We are omitting some preliminary tasks, such as: the identification of the number and location in time of the change-points $\{\tau_k\}_{k=1}^{K}$; the corresponding estimation of the transition probability matrices ${}^{(k)}P = \{{}^{(k)}p_{ij}\}_{i,j \in E}$; the estimation of the cdfs $\{F_i, i \in E\}$ of the attribute depending on the community $i$; and the identification of the copula function $C_\theta$. The software Randentropy is designed to carry out all the aforementioned tasks, including the Monte Carlo algorithm, which represents the very last step of the computation.
For ease of notation, we adopt the following matrix notation throughout Algorithm 1:
• $x(h, c) = x^c(h)$ denotes the community to which individual $c$ belongs at time $h$. Thus, $x(\cdot, \cdot)$ is a matrix whose values are elements of $E$. Its $i$-th row $x(i, \cdot)$ provides the meta-community configuration at time $i$, that is, the allocation of the individuals among the communities at that time. The $j$-th column $x(\cdot, j)$ gives instead the trajectory of individual $j$ in time, that is, the sequence of communities it visited;
• $sh(h, c) = sh^c(h)$ denotes the share of the attribute held by individual $c$ at time $h$. Thus, $sh(\cdot, \cdot)$ is a matrix whose values are non-negative real numbers. Its $i$-th row $sh(i, \cdot)$ provides the shares of the attribute owned by the individuals of the meta-community at time $i$; it represents a probability distribution. The $j$-th column $sh(\cdot, j)$ shows instead the evolution in time of the share of the attribute owned by individual $j$;
• $T(h) = T(sh(h))$ denotes the value of the Random Entropy at time $h$ in the meta-community. More precisely, it gives the Theil entropy computed on the probability distribution $sh(h, \cdot)$, which represents a realization of the Random Entropy in the given simulation;
• $M$ denotes the time horizon of the simulation.
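Algorithm 1 Monte Carlo Simulation of the Random Entropy
1. for $c = 1:N$, set $x(0, c) = i_c$ (the initial community of individual $c$); end for
2. set $k = 1$;
3. set $h = \tau_{k-1}$;
4. while $h < (\tau_k \wedge M)$
   for $c = 1:N$, sample the random variable $x(h+1, c) \sim {}^{(k-1)}p_{x(h,c), \cdot}$; end for
   sample the attributes $(s^1(h+1), \ldots, s^N(h+1)) \sim C_\theta\big(F_{x(h+1,1)}(\cdot), \ldots, F_{x(h+1,N)}(\cdot)\big)$;
   for $c = 1:N$, set $sh(h+1, c) = s^c(h+1) / \sum_{c'=1}^{N} s^{c'}(h+1)$; end for
   set $T(h+1) = \sum_{c=1}^{N} sh(h+1, c) \log\big(N\, sh(h+1, c)\big)$;
   set $h = h + 1$;
   end while
5. if $h < M$, set $k = k + 1$ and continue to 3.
The result of Algorithm 1 is a sequence of values $\{T(h)\}$, $h = 1, \ldots, M$. If we execute the algorithm $L$ times, we can denote by $\{T^{(l)}(h)\}$, $h = 1, \ldots, M$, the result of the simulation at the $l$-th repetition. We can then estimate the expected value of the Random Entropy by the average over the simulations, i.e.,
$$\widehat{\mathbb{E}}\big[T(sh(h))\big] = \frac{1}{L} \sum_{l=1}^{L} T^{(l)}(h), \qquad h = 1, \ldots, M.$$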
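The Monte Carlo procedure can be sketched end to end in a few lines of NumPy. This is an illustrative sketch and not the code of the randentropymod module: the function names, the 0-based community coding, the Gaussian copula with common correlation `rho` and the exponential marginals with community-dependent means are all assumptions introduced for the example.

```python
import numpy as np
from statistics import NormalDist

def theil(shares):
    """Theil index sum_c sh_c log(N sh_c), with the convention 0 log 0 = 0."""
    nz = shares > 0
    return float(np.sum(shares[nz] * np.log(shares.size * shares[nz])))

def random_entropy_mc(x0, taus, mats, means, rho, horizon, n_rep, seed=0):
    """Estimate the expected Random Entropy at times 1..horizon.

    x0: initial communities (coded 0..D-1); taus: change-points;
    mats: len(taus) + 1 transition matrices; means[i]: mean of the assumed
    exponential attribute law in community i; rho: copula correlation.
    """
    rng = np.random.default_rng(seed)
    n = len(x0)
    corr = np.full((n, n), rho) + (1.0 - rho) * np.eye(n)
    phi = np.vectorize(NormalDist().cdf)
    means = np.asarray(means, dtype=float)
    total = np.zeros(horizon)
    for _ in range(n_rep):
        x = np.array(x0)
        for h in range(horizon):
            # Active regime: number of change-points <= h.
            k = int(np.searchsorted(taus, h, side="right"))
            # Move every individual one step, independently of the others.
            x = np.array([rng.choice(len(mats[k]), p=mats[k][i]) for i in x])
            # Dependent uniforms from the Gaussian copula, kept inside (0, 1).
            u = phi(rng.multivariate_normal(np.zeros(n), corr))
            u = np.clip(u, 1e-12, 1.0 - 1e-12)
            # Community-specific exponential quantiles give the attributes.
            s = -means[x] * np.log1p(-u)
            total[h] += theil(s / s.sum())  # entry h stores time h + 1
    return total / n_rep

P0 = np.array([[0.9, 0.1], [0.2, 0.8]])
P1 = np.array([[0.5, 0.5], [0.5, 0.5]])
estimate = random_entropy_mc([0, 0, 1, 1], [3], [P0, P1], [1.0, 5.0], 0.5,
                             horizon=6, n_rep=50, seed=1)
```

Fixing `seed` makes two runs return identical estimates, which is convenient when comparing outputs obtained in different ways.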

Relevant scenarios of application
In this section we provide a short description of two possible domains of application of the model. A variety of additional situations certainly fall within the described theoretical setting.

Financial inequality in an economic area
This application was originally considered by D'Amico, Scocchera, and Storchi (2018b) and D'Amico, Regnault, Scocchera, and Storchi (2018a), and subsequently in a more comprehensive way in D'Amico et al. (2019). In this framework, the meta-community coincides with a given set of countries all belonging to a given economic area. A possible case is represented by the European Economic Area. In practice, every country receives an assessment of its financial creditworthiness, which is expressed in terms of a sovereign credit rating, see, e.g., Trueck and Rachev (2009) and D'Amico, Di Biase, Janssen, and Manca (2017). Credit ratings are measured on an ordinal scale and assigned by the rating agencies, of which Moody's, Standard & Poor's and Fitch are the three major ones. Each rating class can be seen as a community, to which the countries are allocated at every time. According to its own riskiness (expressed by the rating class), each country pays interest rates on its debt. When the interest rates are compared to a benchmark, they define the so-called credit spreads. Thus, credit spreads can be seen as the personal attributes held by each country in time.
Empirical analysis has shown that credit spreads of European countries are positively correlated, with the exception of Denmark, Sweden and the United Kingdom. To model this complex correlation structure, a copula function can be used in accordance with our framework. Once the credit spreads are obtained, it is possible to compute the vector of shares of the attribute at time $t$, $sh(t) := (sh^c(t))_{c \in \mathcal{C}}$. Finally, the computation of the expected value of $T(sh(t))$ gives an effective tool for forecasting the financial inequality in an economic area and its evolution in time.

Human mobility and environmental implications
Another area in which the Random Entropy $T(sh(t))$ might be useful is the analysis of human mobility data and of specific attributes of interest, see, e.g., Song, Kotz, Jain, and He (2006) and Krumme, Llorente, Cebrian, Moro et al. (2013). Markov chains can be used as a tool to measure patterns of movement of individuals in a given area. In essence, the global area, in which the totality of individuals (the meta-community) lives, is partitioned into different locations (communities), and the probability of the next visited location is assumed to depend only on the current location and not on the previous ones. As members of a given location, individuals possess a personal attribute that can be of different kinds.
For example, it would be possible to consider pollution as a variable depending on the specific location, and to measure with the index $T(sh(t))$ the inequality of the distribution of pollution in the global area and how it may evolve in time. Another possible choice for the personal attribute is the level of expenditure; in such a case the Random Entropy could be used for assessing the inequality of expenditure in the area. The latter approach can be a useful tool to optimize the placement policies of new markets and stores.

Computational details and applications
The software presented here has been engineered so that the main computational kernel is included in a single Python module named randentropymod (Storchi (2020)). The module contains two classes: randentropykernel and changepoint. The two classes are devoted to the Markov reward computation and to the change-point estimation, respectively. The full software bundle is then composed of two Command Line Interfaces (CLIs), randentropy.py and changepoint.py, and a single Graphical User Interface (GUI), randentropy_qt.py, based on PyQt5 (Summerfield (2007)), i.e., the Python binding of the Qt cross-platform GUI toolkit.
While the two CLIs have been specifically developed to perform separately the Markov reward computation (i.e., randentropy.py) and the change-point estimation (i.e., changepoint.py), the GUI has a wider scope. Indeed, the GUI may be used to perform both the change-point estimation and the Markov reward computation, and also to easily visualize and explore the results obtained.
The full software suite has been developed within the Linux OS environment. However, once the needed packages are downloaded and installed, it should also work under Mac OS and Windows, thanks to the portability of the Python programming language. The Python packages strictly needed to run the code, in addition to the aforementioned PyQt5, are: NumPy (see Dubois, Hinsen, and Hugunin (1996)) and SciPy (Jones, Oliphant, Peterson et al. (2001)), used for the numerical tasks, and matplotlib for the plots and data visualization (see Hunter (2007)).

The randentropykernel class and related CLI
As already stated, the randentropykernel class is devoted to the computation of the Random Entropy, which is based on the Markov model with dependent rewards described in Section 2. The class provides several methods, such as those to specify the community matrix (i.e., set_community) and the attributes matrix (i.e., set_attributes), which correspond to the two matrices $x(\cdot, \cdot)$ and $sh(\cdot, \cdot)$ used in Algorithm 1 (communities and attribute shares), respectively. There are also various methods to tune the behavior of the computation, such as setting the number of Monte Carlo simulation steps (i.e., set_num_of_mc_iterations) or the simulated time period (i.e., set_simulated_time). Finally, the user can enable or disable the copula function via the set_usecopula method, and perform the main computation by calling the run_computation method. Once the computation is completed, the user can retrieve all the results: the first and the second-order moments of the Random Entropy, using get_entropy and get_entropy_sigma, respectively.
The randentropy.py script is the CLI that is naturally tied to the mentioned class. As can be seen from Figure 1, the user can specify two input matrices (i.e., both their locations and names): the first one represents the community matrix, while the second is the attributes matrix. The matrices may be stored either in a MATLAB file or in a CSV-style one. The CLI options reported in Figure 1 reflect the randentropykernel capabilities described above. First, -s allows for the bin width specification, needed to estimate the probability distribution of the attribute given the community membership. Second, -t enables the user to specify the simulated period, and -n refers to the number of Monte Carlo iterations. Optionally, the -i flag allows the user to run the simulation after computing the stationary distribution.
Finally, it is worth noting that, in case one wants to perform the simulation using the stationary distribution $\pi$ of the Markov chain $x = x(t)$, we need to solve a linear matrix equation $a\pi = b$. To solve this equation, one can compute the value of $\pi$ that minimizes the Euclidean 2-norm $\|b - a\pi\|_2$. This has been done by applying a specific function within the NumPy library (see Dubois et al. (1996)).
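The least-squares route to the stationary distribution can be sketched as follows. The text only says that a NumPy function is used; the choice of `numpy.linalg.lstsq`, the function name `stationary_distribution` and the way the normalisation constraint is appended as an extra row are assumptions of this example rather than a description of the package internals.

```python
import numpy as np

def stationary_distribution(P):
    """Least-squares solution of pi P = pi together with sum(pi) = 1."""
    P = np.asarray(P, dtype=float)
    d = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the row sum(pi) = 1.
    a = np.vstack([P.T - np.eye(d), np.ones((1, d))])
    b = np.zeros(d + 1)
    b[-1] = 1.0
    # lstsq returns the vector minimising the Euclidean norm of b - a @ pi.
    pi, *_ = np.linalg.lstsq(a, b, rcond=None)
    return pi

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = stationary_distribution(P)  # analytically (2/3, 1/3) for this matrix
```

Appending the normalisation row makes the system overdetermined but consistent for an irreducible chain, so the least-squares solution coincides with the exact stationary distribution up to rounding.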

The changepoint class and related CLI
As already stated, the randentropymod module also contains the changepoint class. This class, and thus the related CLI, is devoted to detecting the position of $K$ change-points, where $K = 1, 2, 3$. In particular, the code finds the positions of the change-points by maximizing the likelihood function of the observed trajectories of the members within their communities. At the same time, the Λ test is carried out in order to assess statistically significant differences among the transition probability matrices found. Additional details on this statistical test are available in Polansky (2007) and D'Amico et al. (2019).
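The idea of locating a change-point by maximizing the likelihood can be sketched for the single change-point case. The code below is a simplified illustration and not the changepoint class: the function names are hypothetical, communities are coded from 0, the Λ test is not included, and the profile log-likelihood is obtained by plugging the empirical transition frequencies of each segment into the multinomial likelihood.

```python
import numpy as np

def transition_counts(paths, start, stop, d):
    """Count the transitions i -> j made at times t in [start, stop)."""
    counts = np.zeros((d, d))
    for path in paths:
        for t in range(start, stop):
            counts[path[t], path[t + 1]] += 1
    return counts

def log_likelihood(counts):
    """Maximised log-likelihood: sum of n_ij * log(n_ij / n_i.)."""
    row = counts.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = counts * np.log(counts / row)
    # Empty cells and empty rows produce NaN and contribute nothing.
    return float(np.nansum(terms))

def best_single_change_point(paths, d):
    """Return the tau maximising the two-segment log-likelihood, and all scores."""
    horizon = len(paths[0]) - 1
    scores = {
        tau: log_likelihood(transition_counts(paths, 0, tau, d))
        + log_likelihood(transition_counts(paths, tau, horizon, d))
        for tau in range(1, horizon)
    }
    return max(scores, key=scores.get), scores

# Two individuals that never move before time 5 and always switch afterwards.
paths = [[0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1],
         [1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0]]
tau_hat, scores = best_single_change_point(paths, d=2)
```

In this toy data set both segments are deterministic when the split is placed at time 5, so the log-likelihood reaches its upper bound of zero there and is strictly negative for every other candidate position.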
The most relevant methods within the class are needed to specify the transition matrix (i.e., set_community) and the number of change-points to be detected (i.e., set_num_of_cps).
Once the initial settings have been specified, the main computation starts using the compute_cps method. Finally, the calculated change-points can be retrieved using get_cp1_found, get_cp2_found and get_cp3_found for the first, second and third change-point, respectively. Once again, the CLI options reported in Figure 2 reflect the class capabilities. Thus, to run the code, the input transition matrix has to be specified in terms of a MATLAB or a CSV filename, together with the matrix name within the file (options -m and -M, respectively). The number of change-points to be considered has to be defined as well (i.e., using the -c option); otherwise, the code will run assuming a single change-point. Optionally, an output filename, where all the results are written, can be specified using the -o-output-file option.
Finally, we introduced some methods, and the related CLI options, that can also be used to distribute the computational burden among several processes, and thus CPUs. Indeed, when working with a huge amount of data, it can be convenient to specify a range of time within which the algorithm is carried out, or to use a specific time distance between two change-points. Thus, the user can define a range of time for the first change-point (the same applies to the others) via the set_cp1_start_stop method. Similarly, using the set_delta_cp method, one can specify the time distance to be considered between the change-points.

Graphical User Interface
All the functionalities illustrated so far have also been integrated in a GUI (Graphical User Interface). The GUI has been implemented using PyQt5, a comprehensive set of Python bindings for Qt v5 (PyQT (2012)). While we implemented two different CLIs to fully cover the various aspects implemented within randentropymod, there is a single GUI, which can be accessed via the randentropy_qt.py file (Storchi (2020)). The computation starts after choosing an input file, which can be either a MATLAB or a CSV one, containing two matrices. The first matrix has to contain the data of the variable which is supposed to evolve according to a Homogeneous Markov Chain (HMC) (e.g., in the financial application the variable consists of the sovereign credit ratings, see Section 3.1). Accordingly, the first matrix is expected to be named "ratings" by default (see Figure 3). The second matrix has to refer to the reward process describing the attribute which is driven by the HMC. In the case of the financial application, as illustrated in Section 3.1, this is the credit spread. As the code directly computes the credit spread starting from the interest rates, the second matrix directly collects the interest rate data. Indeed, by default, this matrix within the file is expected to be named "interest_rates" (see Figure 3).
Once the two matrices have been specified, the user may start the computation: Edit -> Run. The user is prompted with a dialog window, reported in Figure 4, where it is possible to specify: the bin width to estimate the empirical distributions (one for each ordered variable of the first matrix), the simulated period and the number of Monte Carlo iterations. Alternatively, the user can flag "Simulation using stationary distribution" to compute the asymptotic values of the Random Theil's Entropy. After the OK button is pushed, the program starts the computation and, once finished, it returns the plot of the dynamic inequality (Figure 5), which the user can interact with and save as a graphical file (i.e., PNG, PDF, PS, and more). In the case reported in Figure 7, a single change-point is detected within a range of time spanning between $t = 70$ and $t = 100$.
After confirming the chosen options, the computation starts and the GUI returns the plot of the likelihood function estimated on the community data (see Figure 8), together with the value of the maximum likelihood function and the corresponding position of the calculated change-point.

Testing financial inequality in an economic area
Finally, we show how the described CLIs and GUI can be used to predict the financial inequality in the European Economic Area according to the theoretical model proposed in D'Amico et al. (2018b,a). In this specific case the meta-community coincides with all the countries within the European Community. Thus, each rating class, as assigned by rating agencies, can be seen as a community, to which the countries are allocated at every time step. As already stated in the previous section, the credit spread represents the personal attribute held by each country.
The results reported here have been obtained using the monthly ratings attributed by the Standard & Poor's agency to the 26 European countries (UK and Cyprus have been excluded from the current meta-community sample) from January 1998 to December 2016 (see D'Amico et al. (2018b) for extra details on the data set considered here).
To detect the position of a change-point within the considered time horizon, we compute the likelihood function as a function of the position of the change-point. We then fix the change-point at the value that maximizes the likelihood function. In the proposed software one can use either the changepoint.py CLI or the GUI. The result is reported in Fig. 9, where the likelihood function is computed depending on the position of the change-point (measured on the X-axis). The software detects a change-point at time 158 (the position at which the likelihood function attains its maximum). The value 158 corresponds to a change-point detected in January 2012. Indeed, at the beginning of 2012 the value of the total credit spread in Europe had a peak of about 10,000 basis points (bp), and this growth was driven by the rise of the securities yield of Greece (2 [...] where the forecast period has been set to 36 months, using 1000 Monte Carlo simulations. The final result, reported in Fig. 10, shows a similar trend for both the GUI and the CLI, with some clear differences related to the implicit randomness of the Monte Carlo procedure (the user can easily avoid this difference by selecting a fixed random seed using the -seed option, or equivalently via the set_use_a_seed method within the randentropykernel class). The results entail a sharp increase in short-term financial inequality, as measured in terms of credit spread, which is expected to persist in the first 10 months of the forecast. Then, the rise is expected to be less pronounced until it reaches its maximum value around month 20. Immediately afterwards, a slight decrease is expected to be observed. As a final remark, it is important to underline that users can build their own code to perform the same or similar computations to those just described, by directly accessing the functionalities implemented within the randentropymod Python 3.x module.

Conclusions and perspectives
The Randentropy software makes it possible to estimate the inequality in a stochastic system according to the framework based on Random Entropy as developed in D'Amico et al. (2019). The methodology is able to consider dependent behaviours of the individuals and time-varying dynamics, which may be of interest in several applied domains. Possible developments of the research include the possibility of considering semi-Markov models, as done in the SemiMarkov R Package developed by Król and Saint-Pierre (2015), to which a reward scheme based on a copula function should be attached, followed by the evaluation of the Random Entropy according to our software.
Random Entropy evaluation, in the presented general framework, is a new and challenging subject of research and is not available in any other software; this makes our investigation a "unicum" in the literature on inequality assessment in stochastic systems.

Figure 1: CLI for the Markov reward approach

Figure 3: Dialog to specify the input matrices

Figure 4: Dialog to specify the input parameters related to the Monte Carlo simulation

Figure 6: Histogram of the CS empirical distribution / Transition probability matrix

Figure 8: Output of the change-point detection algorithm

Figure 9: GUI results for the change-point detection, see text for details.

Figure 10: Random Entropy. Results obtained using the CLI are reported in the left panel, while those obtained using the GUI are reported in the right panel