Abstract
The US EPA Office of Research and Development (ORD) has conducted a research program assessing potential risks of emerging materials and technologies, including engineered nanomaterials (ENM). As a component of that program, a nanomaterial knowledge base, termed “NaKnowBase”, was developed containing the results of published ORD research relevant to the potential environmental and biological actions of ENM. The experimental data address issues such as ENM release into the environment; fate, transport and transformations in environmental media; exposure to ecological species or humans; and the potential for effects on those species. The database captures information on the physicochemical properties of ENM tested, assays performed and their parameters, and the results obtained. NaKnowBase (NKB) is a relational SQL database, and may be queried either with SQL code or through a user-friendly web interface. Filtered results may be output in spreadsheet format for subsequent user-defined analyses. Potential uses of the data might include input to quantitative structure-activity relationships (QSAR), meta-analyses, or other investigative approaches.
Measurement(s) | engineered nanomaterial effects |
Technology Type(s) | digital curation |
Factor Type(s) | physicochemical property |
Sample Characteristic - Environment | nanomaterial |
Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.17060120
Background & Summary
The recent advances of nanotechnology have led to concerns for the potential release of engineered nanomaterials (ENM) into the environment causing exposure to, and perhaps adverse effects on, humans or sensitive ecological species1. Accordingly, the United States Environmental Protection Agency (US EPA) Office of Research and Development (ORD) has developed a research program aimed at understanding the potential environmental implications of ENM. ORD research encompasses potential releases of ENM from manufacturing and commercial uses; environmental transformations, fate, and transport; exposures; and potential adverse health effects. A framework was developed to organize and integrate this diverse set of information2. To support this larger effort, a relational database was developed containing ORD nanomaterial research data to better enable the use and synthesis of study results, and to facilitate higher-order analyses such as quantitative structure-activity relationships (QSAR). One goal is to probe the relationships between physical and chemical properties of ENM and their environmental actions to see if predictive relationships can be determined. This publication announces the release of “NaKnowBase” (NKB), a knowledge base containing the results of multiple ORD publications on the actions of ENM in environmental or biological media.
The design of NKB was intended to complement efforts in nanoinformatics – the strategic curation and collation of nanomaterial data for analytic purposes. A roadmap for nanoinformatics in the European Union (EU) and US was recently published, providing a comprehensive overview of the inter-related scientific disciplines of nanomaterials science, physicochemical characterization, computational modelling, informatics, and ecological and human toxicology3. This analysis identified three challenges facing nanoinformatics: (1) limited datasets, (2) limited data access, and (3) regulatory requirements for validating and accepting computational models. NKB partially addresses the first two of these issues by providing a publicly available source of curated data relevant to ENM environmental health and safety (EHS). Collating datasets from multiple sources facilitates more comprehensive meta-analyses, QSAR, and risk assessment approaches such as read-across4. To date, such “big data” endeavours in ENM EHS tend to be designed around large datasets that must be generated in advance, or remain limited by a paucity of relevant, curated data from disparate sources4,5,6. Efforts like NKB can help overcome these research hurdles by being strategically designed to leverage extant data while also being amenable to newly generated data.
There are other nanomaterial-related databases indexed in the appendix section of the EU-US roadmap3. These databases are independently operated and vary according to the intended use and operability, the types of data captured, and the data format, access, and control. Although it may appear advantageous to consolidate these, there are several factors favouring the maintenance of independent databases: ability to control access to, quality of, and integrity of the data, managing and protecting proprietary and confidential business information, the pragmatics of scale, and the availability and continuity of funding. Therefore, the original scope of the NKB was limited to data collected by the EPA ORD. To our knowledge, the data provided in NKB are not collected elsewhere. The data in NKB represent the only collated source of published data from the US Environmental Protection Agency in a relational database regarding the potential environmental effects of engineered nanomaterials.
NKB was built as an SQL relational database. The overall structure is shown in Fig. 1. The database has separate tables on the source publication, the tested materials and their physicochemical properties, the media in which the materials were tested, the assays performed, the parameters evaluated, and the results. There are sub-tables to capture data on chemical contaminants, attached functional groups, and test media additives. Data entry is accomplished by curators via a set of prescribed Excel spreadsheets that are then imported to the database using a script. During curation, efforts are maintained to use terminology consistent with an expanded nanomaterial ontology being developed by several nanoinformatics groups including the EU NanoSafety Cluster and the Center for the Environmental Implications of Nanotechnology (CEINT), in coordination with the foundational work published by the eNanoMapper database7,8. In addition, a simple, user-friendly interface was developed which allows users to search the database and obtain outputs of data in spreadsheet format.
Methods
Publications selected for curation were limited to research conducted by ORD and related to environmental or biological actions of ENM. This included in vivo, in vitro, and in silico experiments as well as life-cycle analyses and physicochemical characterisations. The data in the database reflect over 120 relevant publications from approximately 2012 through November 2019. Over 70 unique nanomaterials, as defined by the combined composition of the core, shell, and coatings, were studied. Over 160 named assays and 22,000 individual assays were run. We expect to maintain the database and continue to make additions over time as new research becomes available. Though NKB will be made available through the Office of Science Management as a public EPA database tool, pertinent NKB data will also be integrated with the CompTox Chemicals Dashboard (https://comptox-prod.epa.gov/dashboard/chemical_lists/), which maps the DSSTox substance records to the most current list of NKB nanomaterials. The addition of new data will be announced on the ‘News’ (https://comptox.epa.gov/dashboard/news_info) and ‘Downloads’ (https://comptox.epa.gov/dashboard/downloads) pages of the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/), as appropriate.
The EPA maintains various repositories for planned, ongoing, and completed research and projects. These repositories were searched for relevant publications for curation. The description and content of these repositories are detailed below.
STICS
The Scientific & Technical Information Clearance System (STICS) is used by ORD to electronically approve and monitor scientific and technical products produced by ORD. STICS allows approved users with an EPA account and password (such as EPA employees and contractors) to search entries and download the results.
Science inventory
The Science Inventory (SI) stores publicly available records about research conducted by the EPA, allowing EPA account-holding users to search through entries. Much of the database-relevant information in SI overlaps with STICS.
Science hub
Science Hub is a data storage site for datasets associated with recently published EPA journal articles (beginning in 2016). EPA employees and contractors may access these datasets directly through Science Hub while the general public is granted access through a separate portal (The Environmental Dataset Gateway; https://edg.epa.gov/metadata/catalog/main/home.page).
Direct input from investigators
Where available, ORD researchers provided their publication(s) and original data for inclusion in the database. These papers and submitted data were evaluated on a case-by-case basis and formatted by trained curators for inclusion in the database. Approximately 9% of the entries were submitted directly by the investigators. Reasons that original data may not have been available included the primary investigators having left the Agency, data having been archived, lack of access to raw data from scientific instruments, and incompatible formats. An example of an incompatible format was a list of differentially expressed genes encoded as “increased” or “decreased”, where the data fields in the NKB required numeric value entries.
Systematic article selection
Papers of interest were identified by running keyword searches through STICS, Science Inventory and Science Hub. A list of entries containing “nano” in the keywords or title was obtained. Additional queries were run separately using search terms including the composition of common ENM (e.g. silver, copper, titanium dioxide, and cerium dioxide). Results were checked for duplicates, and posters, abstracts, and meeting presentations were not considered for curation. Over 600 titles were identified for further screening. These results were then reviewed to identify only original, peer-reviewed research. Finally, titles and abstracts were carefully read for relevance to nanotoxicology, environmental effects of nanomaterials, physical and chemical properties, and ENM life cycle. Other nanomaterial papers, including literature reviews and those relating to topics such as incidental or naturally occurring nanomaterials, method development, or “green chemistry” synthesis of nanomaterials, were excluded.
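The screening steps described above can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the procedure actually used: the record fields (`title`, `keywords`, `product_type`), the excluded product types, and the sample records are all hypothetical, and the separate “nano” and composition searches are combined into one function for brevity.

```python
# Hypothetical sketch of the keyword screen; field names and records are invented.
COMPOSITION_TERMS = ("silver", "copper", "titanium dioxide", "cerium dioxide")
EXCLUDED_TYPES = {"poster", "abstract", "presentation"}


def matches_search(record):
    """True if 'nano' or a common ENM composition term is in the title or keywords."""
    text = (record["title"] + " " + " ".join(record["keywords"])).lower()
    return "nano" in text or any(term in text for term in COMPOSITION_TERMS)


def screen(records):
    """Keep keyword matches, drop excluded product types, remove duplicate titles."""
    seen = set()
    kept = []
    for record in records:
        key = record["title"].strip().lower()
        if key in seen:
            continue  # duplicate of an entry already kept
        if record["product_type"] in EXCLUDED_TYPES:
            continue  # posters, abstracts, and presentations are not curated
        if not matches_search(record):
            continue
        seen.add(key)
        kept.append(record)
    return kept


records = [
    {"title": "Silver nanoparticle toxicity in zebrafish",
     "keywords": ["nanosilver"], "product_type": "journal"},
    {"title": "SILVER NANOPARTICLE TOXICITY IN ZEBRAFISH",
     "keywords": ["nanosilver"], "product_type": "journal"},
    {"title": "Cerium dioxide in diesel exhaust",
     "keywords": ["fuel additive"], "product_type": "journal"},
    {"title": "Nano titanium dioxide fate",
     "keywords": [], "product_type": "poster"},
    {"title": "Ozone trends in urban air",
     "keywords": ["air quality"], "product_type": "journal"},
]
selected = screen(records)
```

A filter of this kind only narrows the candidate list; the relevance review of titles and abstracts remains a manual step.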
Table organization and curation procedures
The curation of data into the database required a set of trained data curators and a substantial commitment of time and effort. Artificial intelligence or other automated procedures were not used. The original training of data curators was generously conducted by the database experts of the Center for the Environmental Implications of Nanotechnology (CEINT) in association with the Nano Informations Common (CEINT NIC), a database maintained at Duke University in Durham NC. Experienced NKB curators subsequently oversaw the training of new data curators as needed. Training consisted of explaining the overall purpose and structure of the database and the data input templates, and then overseeing the curation of selected model datasets which had been curated previously by others. When the novice curators were sufficiently proficient at capturing data from the training sets, they began encoding new manuscripts under oversight. Curators typically became proficient within a few weeks. Once curators were proficient, curation of data from each new manuscript typically required from one to several workdays, depending on the complexity of the material. Questions or uncertainty about experimental procedures or parameters were referred to the project management and occasionally required contact with the authors of the original manuscripts for clarification. Thus, the robust curation of data for the database required considerable time and effort from skilled personnel.
Data extraction and curation occurred in accordance with an approved EPA quality assurance project plan (QAPP E-TAB-0030177, Project ID “Emerging Materials Project 18.02”). In summary, all data were collected from published journal articles. Metadata were attached to all curated data. Data were extracted from manuscript figures using a web application called WebPlotDigitizer (https://automeris.io/WebPlotDigitizer/). Modifications to curated data (for correction of curation errors, etc.) were logged and described in a separate text file.
Publications were added to NKB by entering metadata, experimental procedures, and results into a data collection template composed of 11 preformatted Excel spreadsheets. Once the templates were completed, the curation tables were uploaded automatically by an in-house Java program that transformed the contents of the templates into database-ready tables (csv files).
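The transformation from template rows to a database-ready csv table might look roughly like the following. This is an illustrative Python sketch, not the in-house Java program; the column names and the sample rows are hypothetical, and the real templates are not reproduced here.

```python
import csv
import io

# Hypothetical column order for one template sheet.
MATERIAL_COLUMNS = ["MaterialID", "PublicationID", "CoreComposition", "Coating"]


def sheet_to_csv(rows, columns):
    """Serialize template rows (dicts) to CSV text, leaving missing cells empty.

    Unknown column names raise ValueError, so typos in a template are caught early.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        restval="",
        extrasaction="raise",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


template_rows = [
    {"MaterialID": 1, "PublicationID": 10, "CoreComposition": "silver"},
    {"MaterialID": 2, "PublicationID": 10, "CoreComposition": "cerium dioxide",
     "Coating": "citrate"},
]
csv_text = sheet_to_csv(template_rows, MATERIAL_COLUMNS)
```

Fixing the column order in one place keeps every exported table aligned with the corresponding database table, whichever curator filled in the template.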
SQL structure
The overall SQL structure of NKB is presented in Fig. 1, and a brief description of each data table is provided in Table 1. The fields (columns) in each NKB data table are detailed in Tables 2–11. Field names are written in PascalCase to distinguish them from the lowercase data table names. Primary keys, which are fields containing a unique identifier for each entry in a data table, are listed first. Most tables use a single field as the primary key; the Material, Assay, and Medium tables use two keys. Primary keys and foreign keys are used to connect related data that are stored in different tables.
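The way primary and foreign keys link tables can be illustrated with a small in-memory SQLite database. The sketch below is not the NKB schema: the three tables, their fields, and the choice of `MaterialID` plus `PublicationID` as the two-field key are assumptions made for illustration, following only the naming conventions described above (lowercase table names, PascalCase field names).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical tables; the actual fields are listed in Tables 2-11 of the article.
conn.executescript("""
CREATE TABLE publication (
    PublicationID INTEGER PRIMARY KEY,
    Title TEXT NOT NULL
);
CREATE TABLE material (
    MaterialID INTEGER NOT NULL,
    PublicationID INTEGER NOT NULL REFERENCES publication (PublicationID),
    CoreComposition TEXT,
    PRIMARY KEY (MaterialID, PublicationID)  -- two-field key (assumed pairing)
);
CREATE TABLE result (
    ResultID INTEGER PRIMARY KEY,
    MaterialID INTEGER NOT NULL,
    PublicationID INTEGER NOT NULL,
    ResultName TEXT,
    ResultValue REAL,
    FOREIGN KEY (MaterialID, PublicationID)
        REFERENCES material (MaterialID, PublicationID)
);
""")

conn.execute("INSERT INTO publication VALUES (1, 'Example ENM study')")
conn.execute("INSERT INTO material VALUES (1, 1, 'silver')")
conn.execute("INSERT INTO result VALUES (1, 1, 1, 'EC50', 2.5)")

# Follow the keys from each result back to its material and source publication.
rows = conn.execute("""
    SELECT p.Title, m.CoreComposition, r.ResultName, r.ResultValue
    FROM result AS r
    JOIN material AS m
      ON m.MaterialID = r.MaterialID AND m.PublicationID = r.PublicationID
    JOIN publication AS p ON p.PublicationID = m.PublicationID
""").fetchall()
```

With foreign keys enforced, a result row that points to a material not present in the `material` table is rejected, which is the kind of integrity guarantee a relational design provides.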
NKB User interface
The NKB user interface application is currently under development. Deployment is expected in 2023 under the EPA web domain naknowbase.epa.gov. Here, curated data can be accessed through a user-friendly interface and search results can be downloaded for subsequent analysis by the user. NKB data can be filtered by numerous parameters such as ENM composition, physical and chemical characteristics, assay name and type, assay parameters, and result name. NKB data points are also linked to the original peer-reviewed publications via a single hyperlink.
The NKB user interface allows users to search for data using a pre-defined list of relevant search terms categorized by data tables and table fields. The searchable data fields were derived from those listed in Tables 2–11.
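For users who prefer to query the database with SQL rather than through the interface, a filter-and-export step could be sketched as follows. The flat `result` table, its fields, and the sample values are hypothetical stand-ins for the real tables, and the example runs against an in-memory SQLite database rather than NKB itself.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table used only for this example.
conn.execute(
    "CREATE TABLE result ("
    "CoreComposition TEXT, AssayName TEXT, ResultName TEXT, ResultValue REAL)"
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [
        ("silver", "cell viability", "EC50", 2.5),
        ("silver", "dissolution", "fraction dissolved", 0.12),
        ("cerium dioxide", "cell viability", "EC50", 40.0),
    ],
)


def export_filtered(connection, composition, assay_name):
    """Return matching rows as CSV text with a header row."""
    cursor = connection.execute(
        "SELECT CoreComposition, AssayName, ResultName, ResultValue "
        "FROM result WHERE CoreComposition = ? AND AssayName = ? "
        "ORDER BY ResultValue",
        (composition, assay_name),  # bound parameters, not string formatting
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


export_text = export_filtered(conn, "silver", "cell viability")
```

The resulting text can be saved with a `.csv` extension and opened in any spreadsheet program for further analysis.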
Data Records
Figure 1 and Table 1 describe all the individual data sources integrated in NKB. The NKB data frame has been uploaded into a single collection entitled “NaKnowBase-SQL backend-080121” 9. The files contained in this collection include the most recent SQL data structure for NKB, including all tables, as well as corresponding data categories and keys for the backend of the database.
EPA nanomaterials present in NKB are also provided through the CompTox Chemicals Dashboard (https://comptox.epa.gov/dashboard/chemical-lists/NAKNOWBASE), which maps EPA chemical substance records to the most current list of NKB nanomaterial substance records (last updated 12/14/2020).
Technical Validation
In general, there are many varied methodologies for cataloguing nanomaterials metadata and physicochemical properties; NKB attempts to capture as much of this information as possible.
Publications considered for curation were limited to ORD research, which is subject to rigorous internal and external quality control and peer review. All research conducted at ORD must have a corresponding Quality Assurance Project Plan (QAPP). QAPPs describe the necessary quality assurance and quality control measures needed to produce results that meet stated performance criteria. ORD QAPPs are peer-reviewed, approved by management, overseen by a quality assurance manager, and subject to periodic QA and performance quality checks. Manuscripts submitted for publication are linked to approved QA plans and are subject to QA review and approval. Furthermore, manuscripts are subject to thorough internal scientific peer review before undergoing additional external, independent peer review by the publishing journal. These systems are intended to ensure the quality and accuracy of ORD data, and help assure the reliability of data being curated in NKB. Because of this, the results of the papers themselves were not checked for errors during data curation. Instead, quality control efforts focused on ensuring the accuracy of the curated data compared to the original raw data, as well as consistent curation procedure between curators.
To assess the quality of NKB curation, a random sample (approx. 5%) of curated papers was manually checked for quality control. Data derived from the digitization of published graphs differed from the original data by an average of 0.20% ± 0.29% (N = 316), and curation of the same data by different curators differed by an average of 0.33% ± 3.3% (N = 736). Values are reported as mean ± SD, normalized to the axis scale.
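One way to compute a difference “normalized to the axis scale” is to express each discrepancy as a percentage of the plotted axis range and then summarize with the mean and standard deviation. The sketch below makes several assumptions that the text does not specify: absolute differences are used, the SD is the sample standard deviation, and the data values and axis limits are invented for illustration.

```python
from statistics import mean, stdev


def normalized_differences(digitized, original, axis_min, axis_max):
    """Absolute differences expressed as a percentage of the axis range."""
    if len(digitized) != len(original):
        raise ValueError("digitized and original must have the same length")
    span = axis_max - axis_min
    if span <= 0:
        raise ValueError("axis_max must exceed axis_min")
    return [abs(d - o) / span * 100.0 for d, o in zip(digitized, original)]


# Invented example: four points read from a graph whose axis runs from 0 to 100.
original_values = [10.0, 25.0, 40.0, 80.0]
digitized_values = [10.2, 24.9, 40.0, 80.3]

differences = normalized_differences(digitized_values, original_values, 0.0, 100.0)
mean_difference = mean(differences)
sd_difference = stdev(differences)  # sample standard deviation (assumption)
```

Normalizing by the axis range rather than by each data value avoids inflated percentages for points that lie close to zero.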
Usage Notes
Potential uses of the data include input to quantitative structure-activity relationships (QSAR), meta-analyses, or other modeling or investigative approaches. Users should be aware that data obtained from the NKB include a large number of potential parameters related to physicochemical properties of ENM. Because relatively few of these properties were reported consistently across sources, the NKB contains many sparsely populated fields. Users should consider this when planning analyses of data from the NKB. The NKB and its updates may help inform new testable hypotheses about the etiology and mechanisms underlying ENM effects in the environment and adverse health outcomes of toxicological concern in relation to human exposure to nanomaterials.
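Before building a model on exported data, it may help to check how completely each field is populated. The following is a minimal sketch of such a check; the field names, the sample rows, and the 50% threshold are hypothetical choices for illustration, not recommendations from the NKB developers.

```python
def fill_fractions(rows, fields):
    """Fraction of rows holding a non-missing value for each field."""
    total = len(rows)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if row.get(field) not in (None, "")) / total
        for field in fields
    }


def usable_fields(rows, fields, min_fraction):
    """Fields populated in at least min_fraction of the rows."""
    fractions = fill_fractions(rows, fields)
    return [field for field in fields if fractions[field] >= min_fraction]


# Invented rows standing in for an exported spreadsheet; None marks an empty cell.
exported_rows = [
    {"CoreComposition": "silver", "PrimarySizeNm": 20.0, "ZetaPotentialMv": -30.0},
    {"CoreComposition": "silver", "PrimarySizeNm": 75.0, "ZetaPotentialMv": None},
    {"CoreComposition": "cerium dioxide", "PrimarySizeNm": None, "ZetaPotentialMv": None},
    {"CoreComposition": "titanium dioxide", "PrimarySizeNm": 25.0, "ZetaPotentialMv": None},
]
field_names = ["CoreComposition", "PrimarySizeNm", "ZetaPotentialMv"]

fractions = fill_fractions(exported_rows, field_names)
selected_fields = usable_fields(exported_rows, field_names, min_fraction=0.5)
```

Fields that fall below the chosen threshold can then be dropped or handled with an explicit missing-data strategy rather than silently reducing the sample size.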
References
National Science and Technology Council Committee on Technology, Subcommittee on Nanoscale Science, Engineering, and Technology. National Nanotechnology Initiative Environmental, Health, and Safety Research Strategy. https://www.nano.gov/sites/default/files/pub_resource/nni_2011_ehs_research_strategy.pdf (2011).
Boyes, W. K. et al. A comprehensive framework for evaluating the environmental health and safety implications of engineered nanomaterials. Crit Rev Toxicol 47, 767–810, https://doi.org/10.1080/10408444.2017.1328400 (2017).
Haase, A. & Klaessig, F. EU US Roadmap Nanoinformatics 2030 (2018).
Karcher, S. et al. Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations. NanoImpact 9, 85–101, https://doi.org/10.1016/j.impact.2017.11.002 (2018).
Findlay, M. R., Freitas, D. N., Mobed-Miremadi, M. & Wheeler, K. E. Machine learning provides predictive analysis into silver nanoparticle protein corona formation from physicochemical properties. Environmental Science: Nano 5, 64–71, https://doi.org/10.1039/C7EN00466D (2018).
Gernand, J. M. & Casman, E. A. A Meta-Analysis of Carbon Nanotube Pulmonary Toxicity Studies—How Physical Dimensions and Impurities Affect the Toxicity of Carbon Nanotubes. Risk Analysis 34, 583–597, https://doi.org/10.1111/risa.12109 (2014).
Jeliazkova, N. et al. The eNanoMapper database for nanomaterial safety information. Beilstein Journal of Nanotechnology 6, 1609–1634, https://doi.org/10.3762/bjnano.6.165 (2015).
Hastings, J. et al. The eNanoMapper: harnessing ontologies to enable data integration for nanomaterial risk assessment. J. Biomed. Semantics 6, 10, https://doi.org/10.1186/s13326-015-0005-5 (2015).
Mortensen, H. M. et al. The EPA NaKnowBase-SQL backend-080121. U.S. EPA Office of Research and Development (ORD) https://doi.org/10.23719/1522951 (2021).
Hendren, C. O., Powers, C. M., Hoover, M. D. & Harper, S. L. The Nanomaterial Data Curation Initiative: A collaborative approach to assessing, evaluating, and advancing the state of the field. Beilstein journal of nanotechnology 6, 1752–1762 (2015).
Acknowledgements
The authors gratefully acknowledge Laura Degn and Alexandra Reyes for additional data curation, Philip Langley and Trevor Levey for developing the user interface, Drs. Christopher M. Grulke and Antony Williams for consulting on database and chemo-informatics issues, and Drs. Peter Byrley and Kim Rogers for comments on an earlier version of the manuscript. The structure of NKB, the curation procedures, training of data curators, and use of consolidated nanomaterial ontologies were all accomplished with the generous consultation of CEINT at Duke University and the staff of their Nanomaterial Information Knowledge Commons (NIKC). We would specifically like to thank Drs. Christine Hendren and Mark Wiesner of CEINT NIKC for their coordination efforts with the international Community of Research including representatives from the National Cancer Informatics Program Nanotechnology Working Group and the EU Nanosafety Cluster to better facilitate cross-platform inter-operability (https://www.beilstein-journals.org/bjnano/content/pdf/2190-4286-6-179.pdf). EPA Disclaimer: This manuscript has been reviewed by the Center for Public Health and Environmental Assessment, United States Environmental Protection Agency and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency nor does mention of trade names or commercial products constitute endorsement or recommendation for use.
Author information
Contributions
W.K.B. conceived of the project, was the overall project coordinator, and contributed text to the manuscript. B.B. was the principal database programmer, designed and modified subsequent versions of the database structure, oversaw data input and quality assurance, and contributed text to the manuscript. G.C. was the primary data curator, evaluated quality assurance and contributed text to the manuscript. B.L.T. contributed text to the manuscript and designed the original structure of the database, data input templates and data selection protocol. P.H. contributed to the database design from a functionality and computational perspective. H.M.M. oversaw database design, built the first version of the database, directed user interface design and construction, conceived and coordinated user interface design and implementation, and contributed to the text.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files associated with this article.
About this article
Cite this article
Boyes, W.K., Beach, B., Chan, G. et al. An EPA database on the effects of engineered nanomaterials-NaKnowBase. Sci Data 9, 12 (2022). https://doi.org/10.1038/s41597-021-01098-0
DOI: https://doi.org/10.1038/s41597-021-01098-0