Extract regex matches from a string and return a tidy dataframe — str_extract_all_to_tidy

This function applies stringr::str_extract_all() to a character vector, extracts every regex match, and returns an unnested two-column dataframe. The first column holds the matched text; the second holds the corresponding position index within the input vector. Elements without a match are filtered out. Both column names are customizable.

Usage

str_extract_all_to_tidy_df(
  string,
  pattern,
  matches_colname = "matches",
  row_number_colname = "row_number"
)

Arguments

string: character vector. A character vector containing the input text.
pattern: character. A regex pattern to extract matches.
matches_colname: character. A string specifying the column name for extracted matches (default: "matches").
row_number_colname: character. A string specifying the column name for row numbers (default: "row_number").

Value

A dataframe with the extracted matches and their corresponding row numbers.

matches: first column, the matched text. Its name is set by the matches_colname parameter (default: "matches").
row_number: second column, the position of the match within the input vector. Its name is set by the row_number_colname parameter (default: "row_number").

Examples

if (FALSE) { # \dontrun{
text_data <- c("Here is funcA and funcB", "Nothing here", "funcC is present")
pattern <- "func[A-C]"
result_df <- str_extract_all_to_tidy_df(text_data, pattern)
print(result_df)
} # }
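
The behaviour described on this page can be approximated with stringr and base R alone. The sketch below is not the package's source: extract_tidy_sketch is a hypothetical name used only for illustration, and it assumes that the row number is the index of the input element in which the match was found.

```r
library(stringr)

# Hypothetical stand-in for illustration; not the package's actual implementation.
extract_tidy_sketch <- function(string, pattern,
                                matches_colname = "matches",
                                row_number_colname = "row_number") {
  # One list element per input element; no match yields character(0).
  found <- str_extract_all(string, pattern)

  out <- data.frame(
    m = as.character(unlist(found)),
    # Repeat each input index once per match; elements with zero matches drop out.
    i = rep(seq_along(string), times = lengths(found)),
    stringsAsFactors = FALSE
  )
  names(out) <- c(matches_colname, row_number_colname)
  out
}

text_data <- c("Here is funcA and funcB", "Nothing here", "funcC is present")
sketch_df <- extract_tidy_sketch(text_data, "func[A-C]")
print(sketch_df)
# matches:    "funcA", "funcB", "funcC"
# row_number: 1, 1, 3  (element 2 has no match, so it does not appear)
```

Passing matches_colname = "fn" and row_number_colname = "line" to the sketch renames the two columns without changing their contents.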