Skip to contents

This Vignette explain the data.frame with additional classes citations.network and internal.dependencies. This type of data.frame is the edgelist of a Citations Network of functions or files within programming project(s).

Overall Considerations : about the citations.network data.frame

codexplor::construct_corpus protocols add a citations.network class of data.frame to the main corpus.list.

This citations.network data.frame is an edgelist1 that symbolize the programming project as a document-to-document citations network : a network of documents, citing each others.

codexplor offers several levels-of-depth, e.g., the nodes of a citations.network data.frame could be the functions or the programming files of the corpus.

A citations network have theoretical network properties, not covered here (e.g., a citations network is a directed network).

Network properties

Hereafter, some of the properties of 2 citations.network df of the corpus.list : the files.network and functions.network df.

Name files.network functions.network
Nodes Programming files Functions
Links Start from a programming file that calls a function and point to the programming file where the function is defined Start from a function that call another function and point to this 2nd function
Indegrees °← How many files of the project(s) are calling the function(s) defined in this file °← How many functions of the project(s) are calling this function
Outdegrees °→ Number of files of the project(s) where the functions used by this file are defined °→ How many functions of the project(s) are used by this function
Automentioning (codexplor default is to filter out autolinks within files) Recursivity

See a dataviz’ of a files.network in the short example of documents network


Technical details

Cascading matching. codexplor::construct_corpus perform a cascading matching :

  1. The names of the exposed functions names are extracted from files where these functions are defined, e.g., have found ‘Pier’, ‘pol’ and ‘jak’ function names in 3 separate files.
  2. The functions that calls these functions are identified, e.g., a function is calling 'Pier(', 'pol(' and 'jak(' and the function is linked to these 3 functions.
  3. Similarly, a network of files is constructed, e.g., the code of a programming file is calling 'Pier(', 'pol(' and 'jak(' and this file is linked to the files where these functions are defined.

Function call that are non-supported during 2nd matches

Some way of writing a program are not fully supported during the 2nd matches.

Python instantiated methods. If we take this Python code sample hereafter, the 1st match is realized by construct_corpus, but codexplor can’t match the instantiated Python methods, i.e. obj.method().

#  Consider the Python code herafter 
class A:
    def method(self):
        return "My custom method"
 # construct_corpus return 'method' as a 1stly matched function name

obj = A()
obj.method() # Return : 'My custom method'

# construct_corpus don't identify the 2nd match because of the prefix ("obj.")

Filter undesired match

Junk nodes needs to be excluded from the returned networks, such as the lines where you use the name of another function in a message, e.g. message("my_func() is deprecated. You should use this_function() instead").

construct_corpus offer the exclude_quoted_content parameter, in order to exclude quoted text from the matches and avoid this false positive case.

  construct_corpus("~", languages = "R", exclude_quoted_content = TRUE) 
  

Be aware that all the quoted text will be avoided during the search for functions names, if you’ve included some - fancy - way of call a function as a quoted character chain.