Welcome to the main bioNerDS homepage. All associated files and code are available here under an open-source license. The gold standard corpus is available for download as detailed in the SMBM 2012 conference paper.
If you make use of our gold standard data sets, please cite:
Duck, G., Stevens, R., Robertson, D., and Nenadic, G. Ambiguity and variability of database and software names in bioinformatics. Proc. of 5th International Symposium on Semantic Mining in Biomedicine (SMBM 2012), pages 2-9. PDF
Or, if you make use of our source code or resulting datasets, please cite:
Duck, G., Brass, A., Nenadic, G., Robertson, DL., and Stevens, R. bioNerDS: exploring bioinformatics' database and software use through literature mining. BMC Bioinformatics (2013). LINK
This project has been coded by Geraint Duck, under the supervision of Robert Stevens, Goran Nenadic and David Robertson.
The bibliography files for the referenced resource names can be found at: bibliography and URLs.
The resulting data files for both BMC Bioinformatics and Genome Biology, as used in the analysis for our BMC Bioinformatics paper, can be downloaded here.
Unformatted summary lists of the resource names extracted during our journal analysis (with associated counts) are available to view:
- Genome Biology, Document Level
- Genome Biology, Mention Level
- BMC Bioinformatics, Document Level
- BMC Bioinformatics, Mention Level
- Combined, Document Level
- Combined, Mention Level
These lists are based on some basic variant grouping/aggregation methods used to combine multiple variations of a single tool name (e.g., acronyms, or names with and without heads).
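As a rough illustration of this kind of grouping, here is a minimal Python sketch. It is not the bioNerDS implementation. The head list, the acronym map and the function names `normalise` and `count_resources` are all hypothetical. It also assumes that "document level" counts a resource once per document, while "mention level" counts every occurrence.

```python
from collections import Counter

# Hypothetical head words for illustration only; the actual list used by
# bioNerDS is distributed separately.
HEADS = {"database", "software", "tool", "server", "package", "program"}


def normalise(name):
    """Lowercase a name and strip one trailing head word, if present."""
    tokens = name.lower().split()
    if len(tokens) > 1 and tokens[-1] in HEADS:
        tokens = tokens[:-1]
    return " ".join(tokens)


def count_resources(docs, acronyms):
    """Group name variants and count them at mention and document level.

    docs: mapping of document id -> list of extracted mentions.
    acronyms: mapping of normalised long form -> canonical short form
              (an assumed, hand-supplied lookup).
    """
    mention_counts = Counter()
    document_counts = Counter()
    for mentions in docs.values():
        keys = [acronyms.get(normalise(m), normalise(m)) for m in mentions]
        mention_counts.update(keys)        # every occurrence counts
        document_counts.update(set(keys))  # at most once per document
    return mention_counts, document_counts


docs = {
    "doc1": ["BLAST", "BLAST tool", "Basic Local Alignment Search Tool"],
    "doc2": ["blast", "Gene Ontology database"],
}
acronyms = {"basic local alignment search": "blast", "gene ontology": "go"}

mention_counts, document_counts = count_resources(docs, acronyms)
print(mention_counts)   # "blast" is counted 4 times, "go" once
print(document_counts)  # "blast" appears in 2 documents, "go" in 1
```

In this sketch, all three variants in `doc1` collapse to the single key `blast`, which is why its mention-level count is higher than its document-level count.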
Finally, the heads and weak keywords used by bioNerDS can be found here.