| Title: | The CRAN Chronology |
|---|---|
| Description: | Scraping routines and datasets to monitor the evolution of the number of packages on CRAN. |
| Authors: | Antoine Languillaume [aut, cre], Sebastien Rochette [aut], Vincent Guyader [aut], ThinkR [cph] |
| Maintainer: | Antoine Languillaume <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.0 |
| Built: | 2025-01-09 03:10:21 UTC |
| Source: | https://github.com/ThinkR-open/cranology |
The evolution of the number of packages on CRAN since its beginning. Last update: 2024-09-10.
cran_monthly_package_number
A data frame with 324 rows and 2 variables:
[Date] Month of release.
[numeric] Number of packages available on CRAN at that given date.
https://cran.rstudio.com/src/contrib/ and https://packagemanager.posit.co/cran/
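As a minimal sketch of how a table of this shape might be explored, the snippet below builds a small stand-in data frame and computes the month-over-month change. The column names `date` and `n` and all values are assumptions made for illustration; they are not taken from the packaged dataset.

```r
# Toy stand-in for a monthly package-count table.
# Column names (`date`, `n`) and values are illustrative assumptions.
monthly <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "month", length.out = 4),
  n    = c(100, 110, 125, 130)
)

# Month-over-month change; the first month has no predecessor, hence NA.
monthly$growth <- c(NA, diff(monthly$n))

print(monthly)
```

The same two lines could be applied to the real dataset once its actual column names have been checked with `str()`.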
All packages ever available on CRAN. Last update: 2024-09-10.
cran_packages_history
A data frame with 312 rows and 10 variables:
[character] Either the name of the .tar.gz or the name of the archive folder holding the .tar.gzs of all versions ever released of a given package.
[POSIXct,POSIXt] The date of upload on CRAN.
[character] The time of upload on CRAN.
[character] The size of the .tar.gzs. '-' in case of archive folder.
[character] The name of the package.
[POSIXct,POSIXt] The date when one version was last archived.
[logical] Was a version ever archived?
[POSIXct,POSIXt] The date of the first release.
[integer] The number of versions released.
[POSIXct,POSIXt] The date of last release.
https://cran.rstudio.com/src/contrib/
Scrape every folder of the CRAN archive to retrieve both the date of the first release and the number of versions released for all archived packages.
get_package_first_release(package_name)
package_name |
A character string. The package name. |
A tibble with three columns: _package_name_, _first_date_ and _n_versions_.
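The shape of the returned value can be illustrated without touching the network. The sketch below is not the package's implementation: it assumes the upload dates of one package's archived tarballs are already at hand (the package name and dates are made up) and collapses them into the three documented columns.

```r
# Hypothetical upload dates for the archived tarballs of one package.
# The package name and dates are made up for illustration.
uploads <- data.frame(
  package_name = "examplepkg",
  upload_date  = as.Date(c("2019-06-01", "2018-03-15", "2020-11-30"))
)

# Collapse to one row: earliest upload and number of versions seen.
summary_row <- data.frame(
  package_name = uploads$package_name[1],
  first_date   = min(uploads$upload_date),
  n_versions   = nrow(uploads)
)

print(summary_row)
```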
This function is a convenience tool to quickly draw a line showing the evolution of the number of packages on CRAN since its beginning. It uses the 'cran_monthly_package_number' dataset.
plot_cran_monthly_package_number()
A ggplot object
plot_cran_monthly_package_number()
This function queries ppm to retrieve the number of packages on CRAN on a given date.
get_package_number_ppm(dates, parallelize = FALSE)
dates |
A vector of dates. Either a character vector of the form "yyyy-mm-dd" or a vector of class "Date". All dates must be later than "2014-09-17", the day of ppm's first CRAN snapshot. |
parallelize |
A logical. If TRUE, {furrr} is used to scrape ppm asynchronously. |
A data.frame with two columns: 'date' and 'n', the number of packages on CRAN on that 'date'.
get_package_number_ppm(c("2018-04-10", "2020-03-19"))
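The constraint on `dates` can be checked before any request is sent. The helper below is hypothetical and not part of {cranology}; it is only a sketch of one way to accept both documented input types and reject dates that are not later than the first snapshot.

```r
# Date of the first ppm CRAN snapshot, as stated in the `dates` argument docs.
first_snapshot <- as.Date("2014-09-17")

# Hypothetical helper (not part of {cranology}): coerce the input to Date
# and stop early if any date is not later than the first snapshot.
check_ppm_dates <- function(dates) {
  dates <- as.Date(dates)
  too_early <- dates <= first_snapshot
  if (any(too_early)) {
    stop(
      "Dates must be later than ", format(first_snapshot), ": ",
      paste(format(dates[too_early]), collapse = ", ")
    )
  }
  dates
}

check_ppm_dates(c("2018-04-10", "2020-03-19"))
```

Whether the package performs a similar check internally is not stated here, so validating up front is a defensive choice rather than a requirement.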
This function is the workhorse of {cranology}. It scrapes https://cran.rstudio.com and generates two datasets:
scrape_cran_history()
* 'cran_packages_history': A data.frame gathering information about every package that has ever been on CRAN, including the first release date and the number of versions released so far.
* 'cran_monthly_package_number': A data.frame holding the number of packages available on CRAN since its beginning. Data is provided on a monthly basis.
A list of two data.frames: 'cran_packages_history' and 'cran_monthly_package_number'.
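To make the link between the two datasets concrete, here is a simplified sketch of how per-package first-release dates could be rolled up into a monthly count. It is an illustration under stated assumptions, not the package's algorithm: the dates are made up, and the sketch ignores packages that were later removed from CRAN.

```r
# Hypothetical first-release dates for five packages (made-up values).
first_release <- as.Date(c(
  "2020-01-10", "2020-01-25", "2020-02-03", "2020-04-18", "2020-04-20"
))

# Bucket each release into its calendar month.
release_month <- format(first_release, "%Y-%m")

# One entry per month from the first to the last release, gaps included.
months <- format(
  seq(as.Date("2020-01-01"), as.Date("2020-04-01"), by = "month"),
  "%Y-%m"
)
new_per_month <- as.integer(table(factor(release_month, levels = months)))

monthly <- data.frame(
  month      = months,
  new        = new_per_month,
  cumulative = cumsum(new_per_month)
)

print(monthly)
```

Using a factor with explicit levels keeps months without any release (March in this toy data) in the table with a count of zero.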
## Not run:
scrape_cran_history()
## End(Not run)
Creating 'cran_monthly_package_number' using 'scrape_cran()' is a long process, as the underlying scraping operations are time-consuming. To update 'cran_monthly_package_number' more quickly, it is easier to rely on data from ppm. This is what this function does: it uses 'get_package_number_ppm()' to quickly update the dataset.
update_monthly_package_number(cran_monthly_package_number_df, parallelize = FALSE)
cran_monthly_package_number_df |
A data.frame similar to the 'cran_monthly_package_number' dataset included within {cranology}. |
parallelize |
A logical. If TRUE, {furrr} is used to scrape ppm asynchronously. |
# Simulate `cran_monthly_package_number` update
date_lag <- 3
df <- cran_monthly_package_number[
  1:(nrow(cran_monthly_package_number) - date_lag),
]
update_monthly_package_number(
  cran_monthly_package_number_df = df
)
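An update of this kind needs to know which months are missing from the table. The sketch below shows one way to work that out in base R; it is not the package's implementation, and the `date` and `n` column names, the values, and the target date are all assumptions made for illustration.

```r
# Toy stand-in for an outdated monthly table (assumed `date`/`n` columns).
outdated <- data.frame(
  date = seq(as.Date("2024-01-01"), by = "month", length.out = 3),
  n    = c(100, 110, 125)
)

# Hypothetical target: bring the table up to this month.
target <- as.Date("2024-06-01")

# Months after the last recorded one, up to and including the target.
last_date <- max(outdated$date)
missing_dates <- seq(last_date, target, by = "month")[-1]

print(missing_dates)
```

A vector like `missing_dates` is the kind of input that `get_package_number_ppm()` accepts, so the new rows could then be appended to the outdated table.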