Construct a Citations Network from Data Frame
get_citations_network_from_df.Rd
Given a data frame supplied by the user, the function extracts a network of citations by searching for patterns.
The function first constructs a pattern by adding a prefix and a suffix to each text in the pattern_varname column.
These patterns are then searched for in the content_varname column, and the function returns a data frame with the row number where each match occurred.
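The two steps described above (build a regex from each text, then search the content column) can be illustrated with a minimal base R sketch. This is not the package's actual implementation: the helper name sketch_find_citations and its prefix and suffix arguments are hypothetical, it does not reproduce the keep_only_row_without_a_pattern filtering, and it assumes the texts contain no regex metacharacters that need escaping.

```r
# Hypothetical helper, for illustration only (not part of the package).
sketch_find_citations <- function(df,
                                  content_varname = "content",
                                  pattern_varname = "first_match",
                                  prefix = "",
                                  suffix = "") {
  content <- df[[content_varname]]

  # Unique, non-missing texts taken from the pattern column
  texts <- df[[pattern_varname]]
  texts <- unique(texts[!is.na(texts)])

  # Step 1: build one regex per text by wrapping it with the prefix and suffix
  patterns <- paste0(prefix, texts, suffix)

  # Step 2: search each regex in the content column and record
  # the row number and the matched text for every hit
  hits <- lapply(patterns, function(p) {
    m <- regexpr(p, content, perl = TRUE)
    data.frame(
      row_number = which(m != -1),
      matches = regmatches(content, m),
      stringsAsFactors = FALSE
    )
  })

  # Stack the per-pattern results (NULL if there were no patterns at all)
  do.call(rbind, hits)
}

df <- data.frame(
  content = c("Citation (Bob, 2021)", "Another Bob", "No names here"),
  first_match = c("Bob", NA, NA),
  stringsAsFactors = FALSE
)

# Word boundaries as prefix and suffix avoid matching "Bob" inside longer words
res <- sketch_find_citations(df, prefix = "\\b", suffix = "\\b")
print(res)
```

In this sketch, wrapping each text in word boundaries is only one possible choice of prefix and suffix; any regex fragments could be used instead.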
Usage
get_citations_network_from_df(
df,
content_varname = "content",
pattern_varname = "first_match",
prefix_for_regex_from_string = "",
suffix_for_regex_from_string = "",
keep_only_row_without_a_pattern = TRUE,
varname_for_matches = "matches"
)
Arguments
- df
A data frame containing the data to be processed.
- content_varname
character, default = "content"
A character string specifying the name of the column containing the text to be searched.
- pattern_varname
character, default = "first_match"
A character string specifying the name of the column containing the patterns to be matched.
- prefix_for_regex_from_string
character, default = ""
A character string to be used as a prefix in the regex pattern.
- suffix_for_regex_from_string
character, default = ""
A character string to be used as a suffix in the regex pattern.
- keep_only_row_without_a_pattern
logical, default = TRUE
If TRUE, keeps only rows without an initial entry for constructing the pattern (i.e. rows with a value in the pattern_varname column of the data frame passed by the user are filtered out).
- varname_for_matches
character, default = "matches"
A character string specifying the name of the column of matches in the returned data frame.
Details
The returned data frame includes the following columns:
row_number
The row number of the original data frame where the text is matched.
matches
The text matched by the pattern, e.g., name of a person.
content
The text content where the pattern was searched, i.e. the column identified by content_varname.
first_match
The original pattern searched for (filled with NA if keep_only_row_without_a_pattern is TRUE).
Examples
if (FALSE) { # \dontrun{
df <- data.frame(
  content = c("Citation (Bob, 2021)", "Another Bob"),
  first_match = c("Bob", NA)
)
# Returns only the 2nd row (which matches 'Bob')
get_citations_network_from_df(df)
# Returns all rows matching 'Bob'
get_citations_network_from_df(df, keep_only_row_without_a_pattern = FALSE)
} # }