Match a text pattern in a df column and extract the matched text
srch_pattern_in_df.Rd
Works on a df holding the content read from some files. The function tries to extract a pattern from the text column and returns the extracted text in a new column of the returned df (NA means 'no match').
Usage
srch_pattern_in_df(
  df,
  content_col_name = "content",
  pattern = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)",
  match_to_exclude = NULL,
  ignore_match_less_than_nchar = 3,
  extracted_txt_col_name = "matches",
  duplicated_lines_are_normal = FALSE
)
Arguments
- df
data.frame. A data.frame with at least one character column.
- content_col_name
character, default = "content". Name of the text column in the input df (it is also returned in the output df).
- pattern
character, default = "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)". A regex used to match lines and extract text.
- match_to_exclude
character, default = NULL. A vector of values that must not be returned as a match. Rows whose matched value equals any element of this vector are removed.
- ignore_match_less_than_nchar
double, default = 3. Excludes matches whose number of characters is strictly less than this value. With the default, matches of 1 or 2 characters such as 'x' are excluded.
- extracted_txt_col_name
character, default = "matches". Name of the column holding the extracted text (last column of the returned df).
- duplicated_lines_are_normal
logical, default = FALSE. If set to TRUE, silences the warning about duplicated lines.
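To make the default pattern concrete, here is a minimal base-R sketch of the matching step. It is not the package's implementation: the sample lines are made up, and turning short matches into NA is only an assumption about how ignore_match_less_than_nchar is applied.

```r
# Default pattern, copied from the Usage section.
pattern <- "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)"

# Hypothetical sample lines (not taken from the package).
lines <- c(
  "my_fun <- function(x) x + 1",
  "x <- 1",
  "f <- function() NULL",
  "compute.mean<-function(v) mean(v)"
)

# perl = TRUE is needed because the pattern uses a lookahead (?=...).
m <- regexpr(pattern, lines, perl = TRUE)

# regmatches() returns only the successful matches, so start from NA.
matches <- rep(NA_character_, length(lines))
matches[m != -1] <- regmatches(lines, m)

# Assumption: matches shorter than the threshold are discarded.
min_nchar <- 3
matches[!is.na(matches) & nchar(matches) < min_nchar] <- NA_character_

print(matches)
```

In this sketch, only lines that assign a function with `<-` yield a name; the plain assignment and the one-character name `f` end up as NA.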
Value
A data.frame similar to the one passed by the user, with one more column holding the match. At a minimum it contains:
- content
character. The text column designated by the user.
- match
character. The text matched on this line, NA if there is no match. The column is named after extracted_txt_col_name.
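The shape of the returned df can be illustrated with a small base-R emulation. This is a sketch under stated assumptions, not a call to srch_pattern_in_df itself: the input rows and the excluded value "helper" are hypothetical, and the row removal follows the description of match_to_exclude given above.

```r
# Default pattern, copied from the Usage section.
pattern <- "(^| \\.|\\b)([\\.A-Za-z0-9_]+)(?=\\s*(?:<-)\\s*function)"

# Hypothetical input df with the default text column name "content".
df <- data.frame(
  content = c(
    "helper <- function() 1",
    "plot_data <- function(d) plot(d)",
    "# a comment"
  ),
  stringsAsFactors = FALSE
)

# Add the extracted text as the last column; NA means 'no match'.
m <- regexpr(pattern, df$content, perl = TRUE)
df$matches <- NA_character_
df$matches[m != -1] <- regmatches(df$content, m)

# Assumption: rows whose match is listed in match_to_exclude are dropped.
# NA never matches an element of the vector, so unmatched rows are kept.
match_to_exclude <- c("helper")
df <- df[!(df$matches %in% match_to_exclude), , drop = FALSE]

print(df)
```

The result keeps the original text column and gains one character column, with NA on the comment line.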