
This function scrapes articles from the current issue of a specified Nature journal and extracts each article's title, URL, and abstract. It falls back gracefully when abstracts are missing or when the extracted fields differ in length.

Usage

get_articles(journal, article_selector = ".c-card.c-card--flush",
             title_selector = "h3 a", url_selector = "h3 a",
             abstract_selector = ".c-card__summary", verbose = FALSE)

Arguments

journal

Character string. The full name of the Nature journal (e.g., "Nature Biotechnology", "Nature Medicine").

article_selector

Character string. CSS selector for locating articles on the journal's webpage. Default is ".c-card.c-card--flush".

title_selector

Character string. CSS selector for extracting article titles. Default is "h3 a".

url_selector

Character string. CSS selector for extracting article URLs. Default is "h3 a".

abstract_selector

Character string. CSS selector for extracting article abstracts. Default is ".c-card__summary".

verbose

Logical. If TRUE, prints messages about progress and internal steps. Default is FALSE.
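The sketch below shows how the four selector arguments could be applied with rvest to a small, made-up fragment of card markup. It is not the package's actual implementation. The HTML, the per-card extraction strategy, and the "https://www.nature.com" base used to make links absolute are all illustrative assumptions. Running it requires rvest and its dependency xml2.

```r
library(rvest)

# Made-up markup mimicking two article cards (an assumption; real Nature
# pages may differ). The second card has no summary.
html <- '
<div class="c-card c-card--flush">
  <h3><a href="/articles/example-1">First example article</a></h3>
  <div class="c-card__summary">Summary of the first article.</div>
</div>
<div class="c-card c-card--flush">
  <h3><a href="/articles/example-2">Second example article</a></h3>
</div>'

page <- minimal_html(html)

# article_selector: one node per article card
cards <- html_elements(page, ".c-card.c-card--flush")

# title_selector and url_selector both default to "h3 a": the link text
# serves as the title and its href attribute as the (relative) URL
links <- html_element(cards, "h3 a")
titles <- html_text2(links)

# Resolve relative links against an assumed base URL
urls <- xml2::url_absolute(html_attr(links, "href"), "https://www.nature.com")

# abstract_selector: a card without a matching element yields NA
abstracts <- html_text2(html_element(cards, ".c-card__summary"))
```

Extracting each field within its own card keeps the three vectors the same length. A missing abstract shows up as NA, which could then be replaced with the "Abstract not available" placeholder.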

Value

A tibble with columns: title, url, abstract, and source. If no articles are found, returns an empty tibble.
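Callers can detect the empty case with nrow() before using the columns. The sketch below builds a stand-in zero-row tibble. It assumes that an empty result keeps the four documented columns as character vectors, which this page does not state explicitly.

```r
library(tibble)

# Stand-in for an empty get_articles() result (column types are an assumption)
res <- tibble(
  title = character(),
  url = character(),
  abstract = character(),
  source = character()
)

# Branch on the empty case before working with the columns
if (nrow(res) == 0) {
  message("No articles found in the current issue.")
} else {
  print(res$title)
}
```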

Details

The journal argument is matched case-insensitively against the entries returned by nat_journals(). If no match is found, an informative error is raised. Missing abstracts are replaced with "Abstract not available". If the extracted titles, URLs, and abstracts differ in length, all three are truncated to the shortest length and a warning is issued.
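Below is a base-R sketch of the three behaviors described above. match_journal() and tidy_fields() are hypothetical helpers, not functions exported by the package. The journals data frame, with its illustrative name and code columns, is a stand-in for nat_journals(), whose actual structure this page does not describe. To stay dependency-free, tidy_fields() returns a base data frame rather than a tibble.

```r
# Hypothetical helper: case-insensitive lookup of a journal name,
# with an informative error when there is no match
match_journal <- function(journal, journals) {
  idx <- match(tolower(journal), tolower(journals$name))
  if (is.na(idx)) {
    stop(sprintf("Journal '%s' not found. See nat_journals() for valid names.",
                 journal), call. = FALSE)
  }
  journals[idx, , drop = FALSE]
}

# Hypothetical helper: fill missing abstracts, then truncate all fields
# to the shortest length, warning if the lengths differed
tidy_fields <- function(titles, urls, abstracts) {
  missing <- is.na(abstracts) | !nzchar(trimws(abstracts))
  abstracts[missing] <- "Abstract not available"

  lens <- c(length(titles), length(urls), length(abstracts))
  n <- min(lens)
  if (length(unique(lens)) > 1) {
    warning(sprintf("Field lengths differ; truncating to %d.", n), call. = FALSE)
  }

  keep <- seq_len(n)
  data.frame(title = titles[keep], url = urls[keep],
             abstract = abstracts[keep], stringsAsFactors = FALSE)
}

# Illustrative lookup table (the codes match the journals' nature.com URL paths)
journals <- data.frame(
  name = c("Nature Biotechnology", "Nature Reviews Genetics"),
  code = c("nbt", "nrg"),
  stringsAsFactors = FALSE
)

match_journal("nature reviews genetics", journals)$code
#> [1] "nrg"
```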

Examples

get_articles("Nature Biotechnology")
#> # A tibble: 39 × 4
#>    title                                                   url   abstract source
#>    <chr>                                                   <chr> <chr>    <chr> 
#>  1 Go fund a brain gain                                    http… The gut… Natur…
#>  2 Powering new therapeutics with precision mitochondrial… http… More pr… Natur…
#>  3 Individualized mRNA cancer vaccines make strides        http… The fir… Natur…
#>  4 Court reignites CRISPR patent dispute                   http… Abstrac… Natur…
#>  5 Biotech news from around the world                      http… Abstrac… Natur…
#>  6 Merck’s anti-RSV antibody expands protection for infan… http… Merck’s… Natur…
#>  7 England poised to green-light precision breeding        http… Abstrac… Natur…
#>  8 FDA says GM pigs safe to eat                            http… Abstrac… Natur…
#>  9 Beyond cell atlases: spatial biology reveals mechanism… http… As the … Natur…
#> 10 A call for built-in biosecurity safeguards for generat… http… Abstrac… Natur…
#> # ℹ 29 more rows
get_articles("Nature Reviews Genetics", verbose = TRUE)
#> Accessing URL: https://www.nature.com/nrg/current-issue
#> Extracting articles from https://www.nature.com/nrg/current-issue
#> 9 articles successfully extracted.
#> # A tibble: 9 × 4
#>   title                                                    url   abstract source
#>   <chr>                                                    <chr> <chr>    <chr> 
#> 1 Challenges and solutions to the sustainability of gene … http… Despite… Natur…
#> 2 SLAM-RT&Tag: spatiotemporal profiling of RNA within nuc… http… In this… Natur…
#> 3 Simultaneous single-cell sequencing of RNA and DNA at s… http… In this… Natur…
#> 4 An uneasy truce between population health and the gene … http… In this… Natur…
#> 5 One but not the same — the many genomes of the brain     http… In this… Natur…
#> 6 Diversity and consequences of structural variation in t… http… Collins… Natur…
#> 7 Cytoplasmic mRNA decay and quality control machineries … http… In this… Natur…
#> 8 Progress in understanding the vertebrate segmentation c… http… In this… Natur…
#> 9 Integrating model systems and genomic insights to decip… http… This Re… Natur…