
Applies stringr::str_extract_all() to a character vector and returns the regex matches as an unnested, two-column data frame. The first column holds the matched text; the second holds the position (index) of the element in the input vector where the match was found. Elements with no match are filtered out and produce no rows. Both column names are customizable.

Usage

str_extract_all_to_tidy_df(
  string,
  pattern,
  matches_colname = "matches",
  row_number_colname = "row_number"
)

Arguments

string

character vector. The input text to search.

pattern

character. A regex pattern to extract matches.

matches_colname

character. A string specifying the column name for extracted matches (default: "matches").

row_number_colname

character. A string specifying the column name for row numbers (default: "row_number").

Value

A data frame containing the extracted matches and their corresponding row numbers.

matches

First column: the matched text. The column name is set by the matches_colname parameter (default: "matches").

row_number

Second column: the position, within the input vector, of the element in which the match was found. The column name is set by the row_number_colname parameter (default: "row_number").
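To make the shape of the returned value concrete, here is a minimal sketch of how such a function could be written. It is not the package's source: the name sketch_extract_to_tidy_df is hypothetical, and the use of base R (lengths(), rep(), data.frame()) to unnest the result is an assumption made for illustration. Only the call to stringr::str_extract_all() is taken from the description above.

```r
# Hypothetical re-implementation for illustration only; the real
# str_extract_all_to_tidy_df() may be implemented differently.
sketch_extract_to_tidy_df <- function(string,
                                      pattern,
                                      matches_colname = "matches",
                                      row_number_colname = "row_number") {
  # One list element per input string; strings with no match give character(0)
  matches <- stringr::str_extract_all(string, pattern)

  # Number of matches found in each input string
  n_matches <- lengths(matches)

  out <- data.frame(
    # Flatten the list; as.character() keeps the column typed when nothing matched
    as.character(unlist(matches)),
    # Repeat each input index once per match, so unmatched strings drop out
    rep(seq_along(string), times = n_matches),
    stringsAsFactors = FALSE
  )
  names(out) <- c(matches_colname, row_number_colname)
  out
}
```

Because each input index is repeated once per match, an element with several matches contributes several rows that share the same row number, and an element with no match contributes none.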

Examples

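if (FALSE) { # \dontrun{
# Three input strings; the second contains no match for the pattern
text_data <- c("Here is funcA and funcB", "Nothing here", "funcC is present")
pattern <- "func[A-C]"
# One row per match, paired with the index of the string it came from
result_df <- str_extract_all_to_tidy_df(text_data, pattern)
print(result_df)
} # }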