Construct a Citations Network from Data Frame
Given a data frame supplied by the user, the function extracts a network of citations by searching for patterns.
The function first constructs a regex pattern by adding a prefix and a suffix to each text in the pattern_varname column.
These patterns are then searched for in the content_varname column, and the function returns a data frame with the row number of each match.
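The two steps described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name sketch_find_matches is hypothetical, the prefix and suffix are assumed to be pasted directly around each pattern, and matching is assumed to use Perl-compatible regular expressions.

```r
# Hypothetical helper for illustration only; NOT the package's own code.
sketch_find_matches <- function(df,
                                content_varname = "content",
                                pattern_varname = "first_match",
                                prefix = "",
                                suffix = "") {
  # Step 1: build one regex per distinct, non-missing entry of the pattern column
  # (assumption: prefix and suffix are simply concatenated around the text).
  raw_patterns <- df[[pattern_varname]]
  raw_patterns <- unique(raw_patterns[!is.na(raw_patterns)])
  regexes <- paste0(prefix, raw_patterns, suffix)

  # Step 2: search each regex in the content column and record where it matched.
  content <- df[[content_varname]]
  hits <- lapply(regexes, function(rx) {
    rows <- grep(rx, content, perl = TRUE)
    data.frame(
      row_number = rows,
      matches = regmatches(content[rows], regexpr(rx, content[rows], perl = TRUE)),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, hits)
}

df <- data.frame(
  content = c("Citation (Bob, 2021)", "Another Bob"),
  first_match = c("Bob", NA),
  stringsAsFactors = FALSE
)

# Both rows contain 'Bob', so rows 1 and 2 are reported.
res <- sketch_find_matches(df)
print(res)
```

The sketch stops at locating matches; the filtering controlled by keep_only_row_without_a_pattern and the extra columns of the real return value are deliberately left out.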
Usage
get_citations_network_from_df(
df,
content_varname = "content",
pattern_varname = "first_match",
prefix_for_regex_from_string = "",
suffix_for_regex_from_string = "",
keep_only_row_without_a_pattern = TRUE,
varname_for_matches = "matches"
)
Arguments
- df
A data frame containing the data to be processed.
- content_varname
character, default = "content". A character string specifying the name of the column containing the text to be searched.
- pattern_varname
character, default = "first_match". A character string specifying the name of the column containing the patterns that will be matched.
- prefix_for_regex_from_string
character, default = "". A character string to be used as a prefix in the regex pattern.
- suffix_for_regex_from_string
character, default = "". A character string to be used as a suffix in the regex pattern.
- keep_only_row_without_a_pattern
logical, default = TRUE. If TRUE, keeps only rows without an initial entry for constructing the pattern (i.e. rows with a value in the pattern_varname column of the data frame passed by the user are filtered out).
- varname_for_matches
character, default = "matches". A character string specifying the name of the column of matches in the returned data frame.
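The effect of the prefix and suffix arguments can be illustrated with plain base R. The snippet below is a sketch that assumes the two strings are concatenated directly around each pattern and that Perl-compatible regex syntax applies; the sample texts are invented for the example. Wrapping the pattern in word boundaries prevents a short name from matching inside a longer one.

```r
# Invented sample data, for illustration only.
content <- c("Bob (2021)", "Bobby (2020)")
pattern <- "Bob"

# Empty prefix and suffix (the documented defaults): 'Bob' also matches inside 'Bobby'.
without_boundaries <- grepl(paste0("", pattern, ""), content, perl = TRUE)

# Word boundaries as prefix and suffix: only the standalone 'Bob' matches.
with_boundaries <- grepl(paste0("\\b", pattern, "\\b"), content, perl = TRUE)

print(without_boundaries)
print(with_boundaries)
```

Note the doubled backslash: inside an R string literal, "\\b" is the two-character regex token \b.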
Details
The returned data frame has 5 columns:
- row_number
The row number of the original data frame where the text is matched.
- matches
The text matched by the pattern, e.g., the name of a person.
- content
The text content where the pattern was searched, i.e. the column identified by content_varname.
- first_match
The original pattern searched for (filled with NA if keep_only_row_without_a_pattern is TRUE).
Examples
if (FALSE) { # \dontrun{
df <- data.frame(
  content = c("Citation (Bob, 2021)", "Another Bob"),
  first_match = c("Bob", NA)
)
# Returns only the 2nd row (it matches 'Bob' and has no pattern of its own)
get_citations_network_from_df(df)
# Returns the rows matching 'Bob', including rows that have a pattern
get_citations_network_from_df(df, keep_only_row_without_a_pattern = FALSE)
} # }