PubMed Knowledge Graph Datasets

Dataset Name: PKG (1781-2018)
Description: The PubMed Knowledge Graph, PKG (1781-2018), was built by extracting bio-entities from 29 million PubMed articles, disambiguating the author names of PubMed articles published between 1781 and 2018, and integrating fine-grained affiliation data with extended author and project data.
URL: Download URL (52GB)
Dataset Name: PKG (1781-2020)
Description: PKG (1781-2020) updates the previous PKG version with the PubMed 2020 baseline files, the PubMed daily update files, and the corresponding extracted bio-entities, author disambiguation results, and extended author information. It also includes two new data sources: Scimago, which contains journal information, and WOS citations, which contain the reference relations between a PMID and its reference PMIDs, extracted from WOS.

Database Features: 1-PKG (1781-2020) Features.docx
Database Description: 2-PKG (1781-2020) Database Description.docx

Dataset Merge Instructions:
  1. Once all the parts listed below (Part 00 through Part 11) have been downloaded and verified against their MD5 checksums, combine them into a single file with the following command on a Linux system:
    cat pubmed20_v2.0.sql.gz_* > pubmed20.sql.gz
  2. Next, import the dataset into the target database with the following command, replacing username, password, and destinationDatabaseName with your own values:
    gunzip < pubmed20.sql.gz | mysql -uusername -ppassword destinationDatabaseName
Download URL | MD5Sum
Part 00 | Part 00 MD5Sum
Part 01 | Part 01 MD5Sum
Part 02 | Part 02 MD5Sum
Part 03 | Part 03 MD5Sum
Part 04 | Part 04 MD5Sum
Part 05 | Part 05 MD5Sum
Part 06 | Part 06 MD5Sum
Part 07 | Part 07 MD5Sum
Part 08 | Part 08 MD5Sum
Part 09 | Part 09 MD5Sum
Part 10 | Part 10 MD5Sum
Part 11 | Part 11 MD5Sum
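
The verify-then-merge steps in the instructions above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes GNU coreutils on Linux, and it assumes you have saved the published MD5 values yourself into a checksum list in `md5sum` format (one `<md5>  <filename>` line per part). The function name `merge_pkg_parts` and the checksum file name are hypothetical; only the `pubmed20_v2.0.sql.gz_*` pattern and the `pubmed20.sql.gz` output name come from the instructions.

```shell
#!/usr/bin/env bash
# Sketch: verify downloaded parts against a checksum list, then merge them.
# Assumptions (not from the dataset page): the checksum list is a local file
# in `md5sum` format, and the function name is illustrative only.

merge_pkg_parts() {
    local checksum_file="$1"                    # lines of "<md5>  <filename>"
    local output="$2"                           # e.g. pubmed20.sql.gz
    local prefix="${3:-pubmed20_v2.0.sql.gz_}"  # common prefix of the part files

    # Stop if any listed part is missing or its checksum does not match.
    md5sum -c --quiet "$checksum_file" || return 1

    # The shell expands the glob in sorted order, so zero-padded parts
    # are concatenated in sequence.
    cat "${prefix}"* > "$output" || return 1

    # Check that the merged file is a valid gzip stream before importing it.
    gzip -t "$output"
}
```

A call might look like `merge_pkg_parts md5sums.txt pubmed20.sql.gz`, run in the download directory; if it succeeds, the import command from step 2 can be run on the merged file. Testing the gzip stream first is a cheap way to catch a truncated or misordered merge before starting a long database import.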