Match a text pattern in a data.frame column and extract the matched text

Takes a data.frame whose text column holds content (for example, lines read from files), tries to match a pattern on each row, and returns the extracted text in a new column of the returned data.frame (NA means 'no match').

Usage

srch_pattern_in_df(
  df,
  content_col_name = "content",
  pattern = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)",
  match_to_exclude = NULL,
  ignore_match_less_than_nchar = 3,
  extracted_txt_col_name = "matches",
  duplicated_lines_are_normal = FALSE
)

Arguments

df: data.frame A data.frame with at least one character column.
content_col_name: character, default = "content" Name of the text column in the input data.frame (it is also returned in the output data.frame).
pattern: character, default = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)" A regex used to match lines and extract text.
match_to_exclude: character, default = NULL A vector of values that must not be returned as matches. Rows whose match equals any element of this vector are removed.
ignore_match_less_than_nchar: double, default = 3 Excludes matches whose number of characters is strictly less than this value. The default excludes matches of 1 or 2 characters, such as 'x'.
extracted_txt_col_name: character, default = "matches" Column name for the extracted text (last column of the returned data.frame).
duplicated_lines_are_normal: logical, default = FALSE If set to TRUE, silences the warning about duplicated lines.

Value

A data.frame similar to the one passed by the user, with one additional column holding the match. At a minimum, it contains:

content: character The text column designated by the user (see content_col_name).
matches: character The text matched on this line, NA if there is no match (column named by extracted_txt_col_name).
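
Examples

The sketch below is not the package's implementation. It only illustrates, with base R, what the default pattern from Usage extracts and what a "matches" column with NA for non-matching rows could look like. The sample data.frame is made up, and replacing short matches with NA (rather than handling them some other way) is an assumption of this sketch.

```r
# Hypothetical input: one row per line of code
df <- data.frame(
  content = c(
    "my_fun <- function(x) x + 1",
    "y <- 2",
    "f <- function() NULL",
    "helper_fn <- function(a, b) a + b"
  ),
  stringsAsFactors = FALSE
)

# Default pattern shown in Usage; the lookahead requires perl = TRUE
pattern <- "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)"

# Position of the first match on each row (-1 when there is none)
m <- regexpr(pattern, df$content, perl = TRUE)

# Start with NA everywhere, then fill in the rows that matched
matches <- rep(NA_character_, nrow(df))
matches[m != -1] <- trimws(regmatches(df$content, m))

# Assumption: matches shorter than 3 characters are treated as "no match"
too_short <- !is.na(matches) & nchar(matches) < 3
matches[too_short] <- NA_character_

df$matches <- matches
print(df)
```

Here the first and last rows yield a function name, the second row has no function definition, and the third row's match ("f") falls below the 3-character threshold.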
