Repositório ISCTE-IUL

Communities are one of the most important structural elements of a network. They frequently influence network behavior, which makes their identification especially useful. As a result, community detection has been a popular topic within network science in recent decades. Even more recently, fostered by an increasing availability of time-stamped datasets and a pressing realization that most empirical networks are dynamic in nature, temporal networks have attracted increased attention. The time dimension introduces new network constructs, and communities are not immune. A community is no longer just a set of fixed, tightly clustered nodes, but has a life and activity of its own, shedding and gaining nodes, appearing and disappearing on the network. We believe that these dynamic constructs are still lacking a formal, consensual definition. In this article we propose a robust taxonomy of life events for communities and a rule-based methodology to clearly parse these events.



Introduction

In static networks the ground truth of community structure is a surjection from the node set to the community set, describing community node membership. As we extend our study of networks exhibiting community structure into the temporal domain, communities are no longer static. A community that is observed at a given moment may be different later on. Representing the ground truth of such a network as a time sequence of surjections may faithfully represent the community structure over time, but does not lead unequivocally to an understanding of its lifecycle. For that we need an accepted taxonomy of lifecycle events, and methods to correlate the changes in community structure to these events. This is not a new topic, as it has been covered in the literature by several authors, but we believe the emerging consensus is problematic. Classifying events is not a closed problem and formalization is lacking. Furthermore, recovering lifecycle events may not be totally possible without information not inherently present in the network topology, which compounds the problem. In this article we present a simple formalized taxonomy and propose a method to track community evolution complemented by external input.

Communities are a challenging network concept. Although in this article we loosely define a community as a set of nodes that are more densely connected among themselves than to the rest of the network, the fact is that, given a network, determining if and how many communities exist in that network may not have a single, clear answer [3]. Extending these concepts to temporal networks brings in an additional layer of complexity, which, nevertheless, has not deterred many authors from trying. In fact, expanding the ground truth of community structure to include events of a temporal nature is not a new topic, as it has been covered in the literature by several authors. Barabási, in his book Network Science [1], summarizes the current consensus on what these events should be. It documents six elementary events: Growth, Contraction, Merging, Splitting, Birth and Death.

We believe, however, that this consensus is not without its problems. For instance, when defining a community split, where do you draw the line between a split and a contraction? Is losing one node a split? If not, how many? And how would you classify an event of a community that fully fragments, shedding nodes to multiple communities, which in turn receive nodes from several other communities? In our work we came to believe that topology alone cannot answer these questions. Depending on subject domain, a community may cease to exist as a separate entity when none of its nodes are seen after a given time T or when a given fraction of its members disappear. Here, the network topology does not shed any light. Examples from the real world abound: just consider the minimum quorum for a shareholder assembly or indicator species in biology.

We also find that it is easier to reason about community events anchored on the community and not on the event. So, for example, a community may experience a fragmentation while other communities in the same network may grow in size by acquiring some of its fragments.

In support of this approach we define three simple top-level community events: Birth, Continuation and Death. That is, once born into existence, a community either continues or dies. To determine continuation, we use a similarity measure, adjusted to chance, supported by an external threshold. If the similarity measure between any two communities taken from community sets at $T$ and $T + \delta$ exceeds the threshold, then the older community continues in the more recent one. Note that a community may continue in multiple other communities depending on their similarity. That multiplicity, together with time orientation, further classifies the continuation event. For example, community $A_{t_1}$ can continue in communities $C_{t_2}$ and $D_{t_2}$ (a split), while community $C_{t_2}$ is a continuation of communities $A_{t_1}$ and $B_{t_1}$ (a merge). This simplifies the model, catering for the complexity of the multiple types of events that can occur in the clustering of a temporal network, defining events from a cluster point of view and allowing for domain-specific external input that further characterizes the community lifecycle.


Related Work

Community events have been defined by several authors [10] and there seems to be an emergent consensus around events like birth, merge, split, growth, expansion, contraction and death. Some authors propose additional events like continuation (i.e. no growth or expansion) and resurgence (communities that appear periodically). As discussed before, we think that these definitions require meta-information not intrinsically present in the network topology.

One of the issues that must be addressed when determining lifecycle events is how to compare communities over time and determine how they are related. Existing approaches include tensor decomposition and distance measures across time steps. An example of the former transforms the network into a tensor by adding the time series as an additional axis and recovers community structure and activity by tensor decomposition. This approach, however, forces the tensor size to expand to all nodes that ever existed in the network. Distance measures can save space by comparing only successive network states, down to the temporal resolution of the network. Distance measures range from the ratio of shared nodes between successive time steps [8] to the Jaccard Index [7], used in [10], [5], [9]:
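$$J(C_i^t, C_j^{t+\delta}) = \frac{|C_i^t \cap C_j^{t+\delta}|}{|C_i^t \cup C_j^{t+\delta}|} \tag{1}$$
and other similarity measures, as in [6]:
$$\mathrm{similarity}(C_i^t, C_j^{t+\delta}) = \min\left(\frac{|C_i^t \cap C_j^{t+\delta}|}{|C_i^t|}, \frac{|C_i^t \cap C_j^{t+\delta}|}{|C_j^{t+\delta}|}\right) \tag{2}$$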
that favours similarly sized communities with a high ratio of common nodes. We adopted a similar approach to [5], [9], with slight modifications, while simplifying the concept of community evolution by anchoring it on the community itself at a given point in time and not on the network. The authors in [9] propose a mechanism to automatically define thresholds without meta-information to determine community events, but it remains to be seen how closely that would follow a judgment based on problem domain expertise.
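The difference between the two measures can be seen in a small sketch. It assumes communities are represented as Python sets of node identifiers; the function names `jaccard` and `min_ratio_similarity` are illustrative and do not come from the cited works.

```python
def jaccard(a, b):
    """Jaccard Index of equation (1): shared nodes over the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def min_ratio_similarity(a, b):
    """Similarity of equation (2): the smaller of the two overlap ratios."""
    if not a or not b:
        return 0.0
    shared = len(a & b)
    return min(shared / len(a), shared / len(b))


# Two equally sized communities sharing half of their nodes.
before = set(range(0, 10))
after = set(range(5, 15))

# A small community fully absorbed by a much larger one.
small = set(range(0, 10))
large = set(range(0, 100))

print(jaccard(before, after), min_ratio_similarity(before, after))
print(jaccard(small, large), min_ratio_similarity(small, large))
```

Both measures give a low score to the absorbed community, since the larger set dominates the union in equation (1) and the second ratio in equation (2).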


Recovering Community Events

Clearly defining community events is useful for many reasons, such as the development and testing of dependable temporal community detection algorithms. We need to ensure that the temporal ground truth is not open to misinterpretation.

Our lifecycle identification approach should be able to address the problems associated with the classification of complex events when nodes exit and enter various communities, as well as comprehensively cover most of the events relevant in the various disciplines where temporal networks play a role.

On this basis we created a multi-level classification scheme, based on the following rule:

Once born into existence, a community either continues or dies.

A community continues in another community if their similarity exceeds an externally supplied threshold. A consequence of this rule is that the remains of a community that do not reach the threshold for continuation either become a newly born community or contribute to the expansion of another.

Depending on their multiplicity, continuation events can be further characterized:

- From the standpoint of a community, a multiple continuation event seen from the past is subclassified as a split.

- From the standpoint of a community, a multiple continuation event seen from the future is subclassified as a merge.

Expansion and contraction are subclassifications of simple continuation events with net acquisition or loss of nodes.

A community can die because its nodes are no longer seen on the network (death by dissolution) or because it does not continue in any other community (death by fragmentation). A community can experience loss of nodes and fragmentation simultaneously, and the proper classification would then depend on their relative size.

Communities can be born from new nodes (newborn) or from fragments of other communities (regenerated). Both can happen simultaneously and classification follows the largest set.

Communities can also reappear on the network, for example on cyclic events. This is detected as a single continuation bridging a lapse of time longer than the network temporal resolution and can potentially occur on "Newborn", "Regeneration", "Growth", "Contraction", "Split" and "Merge" events.

A full taxonomic tree is depicted in figure 1. The method for community continuation analysis, as presented ahead, abides by the above categorization.

To compare community similarity many authors use the Jaccard Index ($J$) [7], as mentioned previously. The authors of [10] call it the auto-correlation function and extend it to any time delta. The Jaccard Index varies from 0, when no elements are common between communities, through $\frac{1}{3}$, when communities share half of their elements, to 1, when the communities are the same. We propose the usage of a modified Jaccard Index adjusted to chance; the ground truth must be known at successive time steps. One of the tenets of community structure is that a random network should not have any communities (this fact is the basis of one of the most popular methods of community detection [4]). However, a random flow of nodes across time will result in non-zero Jaccard indexes. Given the network sizes at time $T$ and $T + \delta$, represented by $S^t$ and $S^{t+\delta}$, a random assignment of node flows between communities of sizes $s_i^t$ and $s_j^{t+\delta}$ results in a null Jaccard Index model $\hat{J}(C_i^t, C_j^{t+\delta})$, given by:
$$\hat{J}(C_i^t, C_j^{t+\delta}) = \frac{s_i^t \times s_j^{t+\delta} \times \min\left(1, \frac{S^{t+\delta}}{S^t}\right)}{S^{t+\delta} \times \left(s_i^t + s_j^{t+\delta}\right) - s_i^t \times s_j^{t+\delta} \times \min\left(1, \frac{S^{t+\delta}}{S^t}\right)} \tag{3}$$
Although discretization could have a significant impact for small networks, we believe it is still valuable to normalize to chance to extract meaning from the index (even if on very large networks the impact of index normalization is marginal), and thus we suggest the usage of an adjusted index $\tilde{J}$ as:
$$\tilde{J} = \frac{J(C_i^t, C_j^{t+\delta}) - \hat{J}(C_i^t, C_j^{t+\delta})}{1 - \hat{J}(C_i^t, C_j^{t+\delta})} \tag{4}$$
where $\tilde{J}$ is non-positive if $J \le \hat{J}$ and varies up to 1 in proportion to $(J - \hat{J})$, with domain restricted to $1 \ge J(C_i^t, C_j^{t+\delta}) > 0$. The adjustment has a greater impact on community pairs with higher relative size compared to the whole network. As the network grows in size and communities, random node dispersion leads to a general increase in source community diversity, which lowers the null model Jaccard Index and consequently brings $\tilde{J}$ closer to $J$, or $\tilde{J} \to J$ as $S \to \infty$. As an example, if we have $|c_1^t| = 200$ flowing to $|c_1^{t+\delta}| = 300$ in a network with 500 nodes, we have $J(c_1^t, c_1^{t+\delta}) = 0.5$ and $\tilde{J} = 0.34$. In a network with 5000 nodes the same flow results in $\tilde{J} = 0.49$.
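A minimal sketch of equations (3) and (4) follows, assuming community and network sizes are given as plain numbers. The function and parameter names are illustrative only and are not taken from the authors' generator.

```python
def null_jaccard(size_t, size_next, total_t, total_next):
    """Null-model Jaccard Index of equation (3) for one community pair.

    size_t, size_next: community sizes at T and T + delta.
    total_t, total_next: network sizes at T and T + delta.
    """
    scale = min(1.0, total_next / total_t)
    expected_overlap = size_t * size_next * scale
    return expected_overlap / (total_next * (size_t + size_next) - expected_overlap)


def adjusted_jaccard(jaccard, null):
    """Chance-adjusted Jaccard Index of equation (4)."""
    return (jaccard - null) / (1.0 - null)


# Five equally sized communities of 20 nodes in a 100-node network:
# every pair has the same null value.
null_small = null_jaccard(20, 20, 100, 100)
adjusted_small = adjusted_jaccard(1 / 3, null_small)

# The 5000-node case from the text: sizes 200 and 300 with J = 0.5.
null_large = null_jaccard(200, 300, 5000, 5000)
adjusted_large = adjusted_jaccard(0.5, null_large)

print(round(null_small, 3), round(adjusted_small, 3), round(adjusted_large, 2))
```

The sketch does not handle the degenerate case where the null value equals 1 (a single community spanning the whole network at both time steps), in which equation (4) is undefined.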

The full method has the following steps:

1. A confusion (or contingency) matrix $T$, with size $|C^t| \times |C^{t+\delta}|$, is created with entries $t_{ij} = |C_i^t \cap C_j^{t+\delta}|$.

2. A simple Jaccard matrix ($J$) is created from $T$ and the multiset of community sizes at time $T$ and $T + \delta$.

3. A null Jaccard matrix ($\hat{J}$) is created from the sequence of community sizes at $T$ and $T + \delta$.

4. $\tilde{J}$ is created from $T$ and $\hat{J}$ as described above.

5. A threshold $\theta$ is applied as a high-pass binary filter over $\tilde{J}$, resulting in a continuation matrix $H$ that identifies continuation events.

6. The row and column sums of $H$ result in two vectors, respectively $S$ and $M$, that identify a birth for $M_j = 0$, a death for $S_i = 0$, a split for $S_i > 1$ and a merge for $M_j > 1$. The position in the matrix identifies the respective communities.

7. For every entry $a_{ij} = 1$ of $H$ there is a continuation event between communities $C_i^t$ and $C_j^{t+\delta}$ that is simple if their sizes are equal and, if not, a growth or contraction event, depending on their relative size.

8. For every $S_i = 0$, we have a death by dissolution of community $C_i^t$ if $|C_i^t| \ge 2 \times \sum_{j=1}^{|C^{t+\delta}|} t_{ij}$, or by fragmentation otherwise.

9. For every $M_j = 0$ we have a newborn community $C_j^{t+\delta}$ if most of its nodes are new to the network, or a birth by regeneration otherwise.

10. The events {"newborn", "regeneration", "growth", "contraction", "split" and "merge"} can be further classified with a reborn attribute as soon as a single continuation results when applying this method to older network observations, most recent first, i.e. between pairs $(C_i^{t-n\delta}, C_j^{t+\delta})$, where $n$ varies from 1 to $l/\delta$ and $l$, $\delta$ stand respectively for the network longevity and temporal resolution.

To illustrate the method consider the clustering sequences $C^t = C^{t+\delta} = \{20, 20, 20, 20, 20\}$ at time $T$ and $T + \delta$, where the flow of nodes between communities is given by the following confusion matrix:
$$T = \begin{bmatrix} 0 & 0 & 10 & 0 & 5 \\ 2 & 0 & 0 & 2 & 2 \\ 5 & 0 & 0 & 5 & 5 \\ 10 & 0 & 10 & 0 & 0 \\ 0 & 20 & 0 & 0 & 0 \end{bmatrix}$$
This results in a simple Jaccard matrix:
$$J = \begin{bmatrix} 0 & 0 & 0.33 & 0 & 0.14 \\ 0.053 & 0 & 0 & 0.053 & 0.053 \\ 0.14 & 0 & 0 & 0.14 & 0.14 \\ 0.33 & 0 & 0.33 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$
As all communities have the same size, all elements of the corresponding null Jaccard matrix $\hat{J}$ are the same ($\frac{1}{9}$), and the adjusted Jaccard matrix becomes:
$$\tilde{J} = \begin{bmatrix} 0 & 0 & 0.25 & 0 & 0.036 \\ -0.066 & 0 & 0 & -0.066 & -0.066 \\ 0.036 & 0 & 0 & 0.036 & 0.036 \\ 0.25 & 0 & 0.25 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$
Let's take $\theta = 0.2$ and we get the continuation matrix:
$$H = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \end{bmatrix}$$
Resulting in vectors $S = \{1, 0, 0, 2, 1\}$ and $M = \{1, 1, 2, 0, 0\}$. Applying the method above we have continuation events between $(C_1^t, C_3^{t+\delta})$, $(C_4^t, C_1^{t+\delta})$, $(C_4^t, C_3^{t+\delta})$ and $(C_5^t, C_2^{t+\delta})$.

We believe the meaning of $\tilde{J}$ in the context of community lifecycle requires external subject domain information, although the authors in [9] used a dynamic threshold that depends on the actual community structure at every time step transition. More specifically, that threshold is the minimum of the set of maximum Jaccard indexes per community over all cross-time-step community node flows or, using our matrix, the minimum of the maxima of the row and column entries of $\tilde{J}$. This guarantees an increase of continuation events but, in our view, may distort network dynamics, for instance at change points where a lot of communities collapse in the network.
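The steps of the method can be sketched in code and checked against the five-community illustration above. This is a sketch under stated assumptions, not the authors' generator: the function name `lifecycle_events` and the event labels are illustrative, indices are 0-based, pairs with no shared nodes are assumed never to continue, and the newborn test mirrors the dissolution rule by assumption. The reborn attribute of step 10 is not covered.

```python
def lifecycle_events(T, sizes_t, sizes_next, total_t, total_next, theta):
    """Classify events between two successive observations.

    T[i][j] counts the nodes shared by community i at time T and
    community j at time T + delta.
    """
    rows, cols = len(sizes_t), len(sizes_next)
    scale = min(1.0, total_next / total_t)

    # Steps 2-5: simple, null and adjusted Jaccard, then the threshold filter.
    H = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            shared = T[i][j]
            if shared == 0:
                continue  # assumption: no shared nodes means no continuation
            a, b = sizes_t[i], sizes_next[j]
            jac = shared / (a + b - shared)
            overlap = a * b * scale
            null = overlap / (total_next * (a + b) - overlap)
            if (jac - null) / (1.0 - null) > theta:
                H[i][j] = 1

    # Step 6: row sums S (older communities), column sums M (newer ones).
    S = [sum(row) for row in H]
    M = [sum(H[i][j] for i in range(rows)) for j in range(cols)]

    events = []
    # Step 7: every 1 in H is a continuation, refined by relative size.
    for i in range(rows):
        for j in range(cols):
            if H[i][j]:
                if sizes_next[j] > sizes_t[i]:
                    kind = "growth"
                elif sizes_next[j] < sizes_t[i]:
                    kind = "contraction"
                else:
                    kind = "simple continuation"
                events.append((kind, i, j))
    # Step 8 and splits, seen from the older community.
    for i in range(rows):
        if S[i] == 0:
            still_seen = sum(T[i])
            dissolved = sizes_t[i] >= 2 * still_seen
            events.append(("dissolution" if dissolved else "fragmentation", i, None))
        elif S[i] > 1:
            events.append(("split", i, None))
    # Step 9 and merges, seen from the newer community.
    for j in range(cols):
        if M[j] == 0:
            inherited = sum(T[i][j] for i in range(rows))
            # Assumption: newborn when at least half of the nodes are new.
            newborn = sizes_next[j] >= 2 * inherited
            events.append(("newborn" if newborn else "regeneration", None, j))
        elif M[j] > 1:
            events.append(("merge", None, j))
    return H, S, M, events


T = [
    [0, 0, 10, 0, 5],
    [2, 0, 0, 2, 2],
    [5, 0, 0, 5, 5],
    [10, 0, 10, 0, 0],
    [0, 20, 0, 0, 0],
]
sizes = [20, 20, 20, 20, 20]
H, S, M, events = lifecycle_events(T, sizes, sizes, 100, 100, theta=0.2)
print(S)  # [1, 0, 0, 2, 1]
print(M)  # [1, 1, 2, 0, 0]
for event in events:
    print(event)
```

Keeping the threshold as an explicit `theta` argument reflects the position taken in the text that it should come from subject domain knowledge rather than from the network itself.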

Another example from a synthetic network generator can be seen in figure 2. The images show only part of the whole network to highlight a mixed split / merge event. The required matrices for lifecycle determination and the resulting temporal ground truth, as output by the synthetic temporal network generator that implements the method presented in this article (code available on request), are included in figure 3 and table 1.


Conclusion

In this article we presented an approach and simple taxonomy to characterize community events in temporal networks. Temporal networks are pervasive in many domains and community structure always generates a lot of interest, given its potential applicability. Having a standardization of concepts, terminology and analytic tools cannot but help advancing this field of study. Although our suggested approach is based on one of many ways of comparing communities, we believe the