NEWS

Version 2.2

Fixed issue where plate numbers only sometimes had a leading zero included in their identifier. For consistency with the data released in UK Biobank, plate identifiers now always include a leading 0.

Dramatically reduced test_data size in part due to unsuccessful attempts to address a CRAN NOTE. Test data has now been reduced to 10 rows as before, and 39 columns covering a subset of 8 biomarkers and 6 required sample processing flags - this also makes example output easier to visually inspect.

CRAN NOTE resolved for remove_technical_variation() using solution below, but similar NOTE was generated for other functions on submission of version 2.1.1 to CRAN. Work-around code has now been added to all user-facing functions.

Reverted test_data back to 50 row version after cutting down size did not resolve the problematic CRAN NOTE (see version 2.1 below)
R-package-devel suggested CRAN NOTE issue may be due to misconfiguration of CRAN’s server making data.table use too many threads while running examples. As a work around, remove_technical_variation() now explicitly sets the number of threads data.table can use to 1 if it is running on ukbnmr::test_data (thanks to Ivan Krylov and Dirk Eddelbuettel if this does solve the issue)

Processing.Batch now inferred from Shipment.Plate if Processing.Batch missing (UK Biobank field #20282) in input data and algorithm version 2 used in remove_technical_variation().
Package overview help file is now correctly documented as requested by CRAN following breaking changes in Roxygen 7.0.0 that changed the way this help file was internally tagged in the source code.
Reduced test_data from 50 to 10 rows to (hopefully) get around CRAN NOTE blocking publication on CRAN due to test code exceeding 5s on some very slow CRAN debian servers

Added updated version of algorithm for removing technical variation, which has been modified after exploration of the July 2023 release of the second tranche of UK Biobank NMR data covering ~275,000 participants.
Updated GitHub README and package vignette to provide details and justification for the update algorithm
Added support for additional sample quality control field 20282 “Processing batch from Nightingale Health data”, which is used as part of the updated algorithm for removing technical variation
Added support for new biomarker fields 20281 “Spectrometer-corrected alanine” and 20280 “Glucose-lactate”

Added dependency to bit64 package to ensure the Shipment.Plate column is always correctly interpreted by internal package functions.

Added support for new sample quality control field 20283 “Resolved plate swaps”
Returned data.tables now behave as expected with respect to printing contents: i.e. running a function without storing the result will now show the contents of the returned table, and typing the name of the variable storing the result and hitting enter will show the contents on first try.
Fixed bug where columns corresponding to UK Biobank fields not available to the user would be filled with NAs rather than missing from the returned results.

Fixed error in GitHub README example code, which has now been made consistent with the vignette.
plate ID and timestamp columns in the package test data have now been set as character class instead of data.table specific representations of the integer64 class (from the bit64 package) and POSIXt to safeguard against intermittent errors arising from incorrect type conversion when running the example code without first loading the data.table package.

Updated citations in documentation to reflect article publication in Scientific Data

Created example toy dataset for testing package functions and updated documentation
Removed GitHub README page from package bundle

Minor changes to fix NOTES and WARNINGS thrown by CRAN:
- URLs which have been moved since the initial documentation was written have been fixed.
- Examples have been added to the documentation for each package function.
- A vignette including the most relevant example workflows has been added.

Minor changes to DESCRIPTION, README, and inst/CITATION to prepare for CRAN submission following paper acceptance.

biomarker_qc() now corrects for sample degradation time on a log scale instead of a linear scale to model exponential decay. This mainly impacts histidine concentrations, whose new post-QC values have Pearson correlation of 0.974 with those assuming linear decay, while all other biomarkers have Pearson correlation > 0.99.

extract_biomarker_qc_flags() no longer relies on list of hard coded field IDs it expects not to be present due to no data in UK Biobank, making code more robust to differences between UK Biobank data releases.

remove_technical_variation() now gains a skip.biomarker.qc.flags argument that allows you to skip the collation and curation of biomarker QC flags when removing technical variation from the data
extract_biomarker_qc_flags() and remove_technical_variation() should now be more robust to potential changes made by UK Biobank in extracted data formats, particularly for fields that are empty/contain no data in the showcase (or which gain data between showcase updates).