
Search a pattern in the content read from some files and stored in a df (one line of text per row). The text extracted on each line is returned in a new column of the returned df (NA meaning 'no match').

Usage

srch_pattern_in_df(
  df,
  content_col_name = "content",
  pattern = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)",
  match_to_exclude = NULL,
  ignore_match_less_than_nchar = 3,
  extracted_txt_col_name = "matches",
  duplicated_lines_are_normal = FALSE
)

Arguments

df

data.frame. A data.frame with, at minimum, a character column.

content_col_name

character, default = "content". Name of the text column in the input df (this column is kept in the output df).

pattern

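character, default = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)". A regular expression used to match lines and extract text. The default matches the name of an object that is assigned a function with '<-' (e.g. 'my_fun' in 'my_fun <- function(x)'); assignments written with '=' are not matched.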
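To see what the default pattern picks up, the base R snippet below applies it to a few made-up lines with regexpr() and regmatches(). Using perl = TRUE (needed for the lookahead "(?=...)") is an assumption about how the pattern is evaluated; srch_pattern_in_df() itself may rely on another regex engine.

```r
# Default value of `pattern`, copied from the Usage section.
default_pattern <- "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)"

# Made-up lines of R code, for illustration only.
lines <- c(
  "my_fun <- function(x) {",
  "fn2<-function(a, b) a + b",
  "x <- 42",
  "f = function(y) y"
)

# The lookahead `(?=...)` needs a Perl-compatible engine, hence perl = TRUE.
pos <- regexpr(default_pattern, lines, perl = TRUE)

# regmatches() returns only the successful matches, so start from NA
# ('no match') and fill in the lines where regexpr() found something.
matches <- rep(NA_character_, length(lines))
matches[pos != -1] <- regmatches(lines, pos)
matches
# -> c("my_fun", "fn2", NA, NA)
```

Spaces around '<-' are optional, while "f = function(y) y" gives no match because the default pattern only looks for '<-'.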

match_to_exclude

character. A vector of values that will never be returned as a match. Rows whose match equals any element of this vector are removed.

ignore_match_less_than_nchar

double, default = 3. Matches whose number of characters is strictly less than this value are excluded. With the default, matches of 1 or 2 characters (such as 'x' or 'df') are excluded.
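The "strictly less than" rule can be illustrated in base R on a made-up vector of matches, with the threshold set to 3 as in the Usage section. This sketch assumes that a too-short match is turned into NA ('no match'); the package itself may handle such rows differently.

```r
# Made-up matches for five lines; NA means 'no match'.
matches <- c("x", "df", "foo", NA, "my_fun")
ignore_match_less_than_nchar <- 3

# A match is ignored when nchar(match) < threshold (strict inequality):
# "x" (1 char) and "df" (2 chars) are dropped, "foo" (3 chars) is kept.
# The !is.na() guard keeps existing NAs out of the nchar() comparison.
too_short <- !is.na(matches) & nchar(matches) < ignore_match_less_than_nchar
matches[too_short] <- NA
matches
# -> c(NA, NA, "foo", NA, "my_fun")
```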

extracted_txt_col_name

character, default = "matches". Name of the column holding the extracted text (the last column of the returned df).

duplicated_lines_are_normal

logical, default = FALSE. If TRUE, silence the warning about duplicated lines.

Value

A data.frame similar to the one passed by the user, with one more column holding the match. At minimum it contains:

content

character. The text column designated by the user (content_col_name).

matches

character. The text matched on this line, NA if there is no match. This column is named after extracted_txt_col_name (default "matches").
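To show how the arguments fit together, here is a minimal sketch written in base R. The function sketch_srch_pattern_in_df() is hypothetical: it is not the package's implementation, and the evaluation order (duplicate check, extraction, length filter, then exclusion), the use of perl = TRUE and the warning text are assumptions made for illustration.

```r
# HYPOTHETICAL sketch of the documented behaviour, NOT the package's code.
sketch_srch_pattern_in_df <- function(df,
                                      content_col_name = "content",
                                      pattern = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)",
                                      match_to_exclude = NULL,
                                      ignore_match_less_than_nchar = 3,
                                      extracted_txt_col_name = "matches",
                                      duplicated_lines_are_normal = FALSE) {
  content <- as.character(df[[content_col_name]])

  # Warn about duplicated lines unless the user says they are expected.
  if (!duplicated_lines_are_normal && anyDuplicated(content) > 0) {
    warning("duplicated lines found in column '", content_col_name, "'")
  }

  # First match of each line; perl = TRUE so the lookahead is supported.
  pos <- regexpr(pattern, content, perl = TRUE)
  hit <- !is.na(pos) & pos != -1
  matches <- rep(NA_character_, length(content))
  matches[hit] <- regmatches(content, pos)

  # Matches strictly shorter than the threshold count as 'no match'.
  too_short <- !is.na(matches) & nchar(matches) < ignore_match_less_than_nchar
  matches[too_short] <- NA_character_

  # The extracted text becomes the last column of the returned df.
  df[[extracted_txt_col_name]] <- matches

  # Rows whose match is one of the excluded values are removed.
  if (!is.null(match_to_exclude)) {
    keep <- !(df[[extracted_txt_col_name]] %in% match_to_exclude)
    df <- df[keep, , drop = FALSE]
  }
  df
}

# Usage on made-up lines of R code.
code <- data.frame(content = c(
  "my_fun <- function(x) {",
  "  x + 1",
  "}",
  "f <- function() NULL",
  "print_it <- function(x) print(x)"
))
res <- sketch_srch_pattern_in_df(code, match_to_exclude = "print_it")
res$matches
# -> c("my_fun", NA, NA, NA)
```

Here "f" is dropped because it is shorter than 3 characters, and the "print_it" row is removed because that match is listed in match_to_exclude.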

See also