This function builds a corpus using the default settings. It returns the corpus together with a citations network of internal dependencies and prints some messages.

Usage

explor_project(folders = NULL, repos = NULL, languages = "R", head = 5, ...)

Arguments

folders

character. A string or list giving the path(s) of the local folder(s) to read.

repos

character. A string or list giving the name(s) of GitHub repos (e.g., 'tidyverse/stringr').

languages

character. Default = "R". A character vector specifying the programming language(s) to include in the corpus.

head

integer. Default = 5. Number of lines to print.

...

Parameters passed to construct_corpus(). These parameters are:

  • character values, to add a prefix and a suffix to the searched pattern (e.g., suffix_for_2nd_matches) or to change the column names (e.g., file_path_from_colname).

  • double values, e.g., n_char_to_add_suffix (minimum number of characters required to add the suffix).

  • logical values, e.g., filter_egolink_within_a_file (default = TRUE) and exclude_quoted_content, which excludes quoted content from the search (default = FALSE).
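The forwarding of extra arguments through ... can be illustrated with a minimal sketch. The two functions below are hypothetical stand-ins (note the _sketch suffix), not the package's code; only the parameter names filter_egolink_within_a_file and exclude_quoted_content and their defaults come from the list above, and "my_project" is an invented folder name.

```r
# Hypothetical stand-in for construct_corpus(): it only records the settings
# it receives, whereas the real function builds the corpus.
construct_corpus_sketch <- function(folders = NULL,
                                    filter_egolink_within_a_file = TRUE,
                                    exclude_quoted_content = FALSE) {
  list(
    folders = folders,
    filter_egolink_within_a_file = filter_egolink_within_a_file,
    exclude_quoted_content = exclude_quoted_content
  )
}

# Hypothetical stand-in for explor_project(): anything captured by `...`
# is forwarded unchanged to the corpus constructor.
explor_project_sketch <- function(folders = NULL, ...) {
  construct_corpus_sketch(folders = folders, ...)
}

# Override one default; the other keeps its documented default value.
settings <- explor_project_sketch(
  folders = "my_project",
  exclude_quoted_content = TRUE
)
str(settings)
```

Under this assumption, any argument that explor_project() does not name itself is matched against the parameters of construct_corpus(), so a misspelled name would typically raise an "unused argument" error rather than being silently ignored.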

Value

A list of 5 data frames: 2 of class corpus.lines, 2 of class corpus.nodelist, and 1 of class citations.network (the edge list of a document-to-document citations network within a programming project).

from

character citations.network - The local file path or GitHub URL of the file that calls a function.

to

character citations.network - The local file path or constructed GitHub URL of the file where the called function is defined.

function

character citations.network - The name of the function matched on a line.

content_matched

character citations.network - The full content matched during the 2nd matches, useful for verifying a match and crafting a new regex.

line_number

character citations.network & corpus.lines - The line number of the 2nd match (citations.network) or the line number associated with a line (corpus.lines).

file_path

character corpus.lines & corpus.nodelist - The local file path or constructed GitHub URL; same values as the from and to columns of the citations.network data frame.

content

character corpus.lines - The content of a line.

matches

character corpus.lines (specifically the codes data frame)

  • The text matched during the 1st matches (all NA if there is no match or if the matches are filtered out, which is the default).
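To make the edge list structure concrete, here is a minimal sketch that builds a toy data frame with some of the columns documented above and summarizes it with base R. All values (file paths, function names, line numbers) are invented for illustration, and the object is a plain data.frame rather than a real citations.network returned by explor_project().

```r
# Toy edge list using the documented column names; every value is invented.
# check.names = FALSE keeps the reserved word "function" as a column name.
edges <- data.frame(
  from = c("R/main.R", "R/main.R", "R/utils.R"),
  to = c("R/utils.R", "R/io.R", "R/io.R"),
  `function` = c("clean_names", "read_input", "read_input"),
  line_number = c("12", "20", "7"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# In-degree: number of calls pointing to each file where a function is defined
in_degree <- table(edges$to)
print(in_degree)

# Files that call others but are never called themselves
entry_points <- setdiff(edges$from, edges$to)
print(entry_points)

# The "function" column must be accessed by name, not with `$function`
print(unique(edges[["function"]]))
```

Each row reads as "the file in from calls, at line_number, a function defined in the file in to", which is why the data frame can be handed to a graph library as a directed edge list.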

Examples

# Example with a local folder
corpus <- explor_project(folders =  "~" )
#> Error in sub(re, "", x, perl = TRUE): input string 8 is invalid UTF-8
# Returns a list of data frames
# (from the file where a function is called => to the file where it is defined)