
A Web-based Literature Identification Platform for the ECOTOXicol...

A web-based literature identification platform for the ecotoxicology knowledgebase, powered by deep learning.


【Institution】: U.S. Environmental Protection Agency

【Publication date】: 2021-03-12

Abstract

The ECOTOXicology Knowledgebase (ECOTOX) is a comprehensive, publicly available resource providing single-chemical environmental toxicity data on aquatic life, terrestrial plants, and wildlife. The database is updated quarterly. To identify relevant references and extract pertinent data, the ECOTOX data curation pipeline employs a methodical process similar to the initial stages of a systematic review. This labor-intensive workflow requires curators to regularly evaluate tens of thousands of candidate references, the majority of which are then rejected as not relevant. After the careful review of hundreds of thousands of potentially relevant articles, the ECOTOX database currently (as of September 2020) contains data for 12,223 chemicals and 13,266 species, manually extracted from 50,932 references. The availability of this extensive dataset of historical screening decisions gave us the opportunity to develop high-performance, state-of-the-art neural network classifiers to partially automate title and abstract screening and to categorize rejected references (e.g., human health, fate, chemical methods). First, we prepared a database containing more than 88,000 previously screened references spanning nearly 100 different chemical-centric datasets. We used these data to develop two deep learning models, which were then integrated into a modified version of the SWIFT-Active Screener software, a collaborative web-based reference screening platform. The first model is a neural language-model classifier that predicts the relevance of candidate references. When used to augment the standard SWIFT-Active Screener document prioritization model, this method provides a mean improvement of 6.5% in Work Saved over random Sampling (WSS) compared to the standard Active Screener approach. The second model uses a separate deep learning network to conduct multi-class classification of excluded documents, predicting the reason for exclusion. This model achieves F-scores in the 65-75% range for the most frequent classes and has been integrated into Active Screener to provide intelligent "default choices" for capturing the exclusion reason. Using extensive simulations, we demonstrate that this modified version of Active Screener results in a reduction of more than 50%, on average, in time spent screening ECOTOX references, with larger savings for the datasets having the most articles.
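
For readers unfamiliar with the WSS metric mentioned above, the following is a minimal sketch of how it is commonly computed for a ranked screening order: the fraction of references left unscreened once a target recall is reached, minus the fraction that random ordering would leave. The abstract does not state the recall level used, so the 0.95 default here is an assumption, and the function name `wss_at_recall` is hypothetical rather than part of SWIFT-Active Screener.

```python
import math


def wss_at_recall(ranked_labels, target_recall=0.95):
    """Work Saved over Sampling for one ranked screening order.

    ranked_labels: 0/1 relevance labels in the order the references
    would be screened (highest priority first).
    target_recall: assumed recall level; not specified in the abstract.
    """
    total = len(ranked_labels)
    relevant = sum(ranked_labels)
    if total == 0 or relevant == 0:
        raise ValueError("need at least one reference and one relevant reference")

    # Number of relevant references that must be found to hit the target.
    needed = math.ceil(target_recall * relevant)

    found = 0
    screened = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            break

    # Fraction left unscreened, minus what random ordering would leave.
    return (total - screened) / total - (1.0 - target_recall)


# Toy example: 10 references, 3 relevant, all found after screening 4.
example_order = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
example_wss = wss_at_recall(example_order)
```

In this toy ordering, 6 of 10 references are left unscreened at the assumed 95% recall target, so the WSS is 0.6 minus 0.05, or roughly 0.55. A random ordering would score near zero on average.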
