Converts a data frame containing sequence IDs and sequences into a FASTA-formatted file, optionally compressed as gzip.
Arguments
- df
- A data frame with at least two columns: one for sequence IDs and one for sequences. 
- id_col
- A character string specifying the column name containing sequence IDs. 
- seq_col
- A character string specifying the column name containing sequence data. 
- output_file
- A character string specifying the output file path. If NULL, the function stops with an informative error.
- gzip
- A logical indicating whether to compress the output as a gzip file. Defaults to TRUE.
- verbose
- A logical indicating whether to print progress messages. Defaults to TRUE.
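The argument rules above translate naturally into checks that run before any file is opened. The following is a minimal sketch, not the package's actual validation code: the helper name check_fasta_args() is hypothetical, as are the exact error messages and the choice to also type-check gzip and verbose.

```r
# Hypothetical helper: validates df_to_fasta()-style arguments up front.
# Illustrative sketch only; not the package's implementation.
check_fasta_args <- function(df, id_col, seq_col, output_file,
                             gzip = TRUE, verbose = TRUE) {
  if (!is.data.frame(df)) {
    stop("`df` must be a data frame.", call. = FALSE)
  }
  # id_col and seq_col must each be a single column name present in df
  for (col in list(id_col, seq_col)) {
    if (!is.character(col) || length(col) != 1L) {
      stop("`id_col` and `seq_col` must each be a single column name.",
           call. = FALSE)
    }
    if (!col %in% names(df)) {
      stop(sprintf("Column '%s' not found in `df`.", col), call. = FALSE)
    }
  }
  # A NULL path is rejected, mirroring the documented behaviour
  if (is.null(output_file)) {
    stop("`output_file` must be a file path, not NULL.", call. = FALSE)
  }
  # Assumed extra checks: flags must be a single TRUE or FALSE
  if (!is.logical(gzip) || length(gzip) != 1L || is.na(gzip)) {
    stop("`gzip` must be TRUE or FALSE.", call. = FALSE)
  }
  if (!is.logical(verbose) || length(verbose) != 1L || is.na(verbose)) {
    stop("`verbose` must be TRUE or FALSE.", call. = FALSE)
  }
  invisible(TRUE)
}
```

Failing fast like this means a typo in a column name surfaces as a clear error before a partially written output file is left on disk.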
Details
This function is designed for large sequence datasets. It checks that id_col and seq_col exist in df, writes records in chunks rather than holding the entire formatted file in memory, optionally gzip-compresses the output, and reports progress when verbose = TRUE.
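One way to realize chunked, optionally compressed FASTA writing in base R is sketched below. It is an illustration under stated assumptions, not the function's actual implementation: write_fasta_chunked() and its chunk_size parameter are hypothetical, the default of 10,000 records per chunk is arbitrary, and each record is assumed to be written as one header line (> followed by the ID) plus one unwrapped sequence line.

```r
# Hypothetical sketch of chunked FASTA writing with optional gzip compression.
# The function name and chunk_size parameter are illustrative assumptions.
write_fasta_chunked <- function(df, id_col, seq_col, output_file,
                                gzip = TRUE, verbose = TRUE,
                                chunk_size = 10000L) {
  # gzfile() compresses as it writes; file() writes plain text
  con <- if (gzip) gzfile(output_file, open = "w") else file(output_file, open = "w")
  on.exit(close(con), add = TRUE)

  n <- nrow(df)
  # Start row of each chunk; an empty data frame yields no chunks
  starts <- if (n > 0L) seq(1L, n, by = chunk_size) else integer(0)
  if (verbose) message("Writing FASTA file...")
  for (start in starts) {
    end <- min(start + chunk_size - 1L, n)
    ids  <- as.character(df[[id_col]][start:end])
    seqs <- as.character(df[[seq_col]][start:end])
    # rbind() + as.vector() interleaves: ">id1", "seq1", ">id2", "seq2", ...
    lines <- as.vector(rbind(paste0(">", ids), seqs))
    writeLines(lines, con)
  }
  if (verbose) message("FASTA file successfully saved to ", output_file)
  invisible(output_file)
}
```

Writing each chunk straight to an open connection keeps at most one chunk of formatted lines in memory, and on.exit() closes the connection even if an error occurs mid-write. A gzip-compressed result can be read back with readLines(gzfile(path)).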
Examples
temp_dir <- tempdir()
temp_output <- file.path(temp_dir, "output.fa.gz")
seq_data <- data.frame(
  transcript_id = c("ENST0001", "ENST0002"),
  sequence = c("ATGCTAGCTAG", "GCTAGCTAGCT")
)
df_to_fasta(seq_data, "transcript_id", "sequence", temp_output)
#> Writing FASTA file...
#> FASTA file successfully saved to /tmp/Rtmp2nqZ2n/output.fa.gz