PubMed Knowledge Graph Datasets (Older Versions)






Dataset Name PKG2020S3 (1781-Sep. 2020), Version 3
Description The new PKG version, PKG2020S3 (1781-Sep. 2020), updates the previous PKG version with the PubMed 2020 baseline files and the PubMed daily update files (up to Sep. 2020), together with extracted bio-entities, author disambiguation results, and extended author information. It also includes Scimago, which contains journal information, and WOS citations, which contain reference relations between a PMID and its reference PMIDs, extracted from WOS.

Database Features: PKG2020S3 (1781-Sep. 2020) Features.docx
Database Description: 2-PKG2020S3 (1781-Sep. 2020) Database Description.docx

URL (MySQL dump)
Download URLs
PKG2020S3_A01_Articles.sql.gz MD5Sum
PKG2020S3_A02_AuthorList.sql.gz MD5Sum
PKG2020S3_A03_KeywordList.sql.gz MD5Sum
PKG2020S3_A04_Abstract.sql.gz MD5Sum
PKG2020S3_A05_GrantList.sql.gz MD5Sum
PKG2020S3_A06_MeshHeadingList.sql.gz MD5Sum
PKG2020S3_A07_SupplMeshList.sql.gz MD5Sum
PKG2020S3_A08_ChemicalList.sql.gz MD5Sum
PKG2020S3_A09_CommentsCorrectionsList.sql.gz MD5Sum
PKG2020S3_A10_DataBankList.sql.gz MD5Sum
PKG2020S3_A11_PersonalNameSubjectList.sql.gz MD5Sum
PKG2020S3_A12_InvestigatorList.sql.gz MD5Sum
PKG2020S3_A13_AffiliationList.sql.gz MD5Sum
PKG2020S3_A14_ReferenceList.sql.gz MD5Sum
PKG2020S3_B01_Descriptor.sql.gz MD5Sum
PKG2020S3_B02_projectlist_NIH.sql.gz MD5Sum
PKG2020S3_B03_map_PMID_ProjID.sql.gz MD5Sum
PKG2020S3_B07_ORCID_Main.sql.gz MD5Sum
PKG2020S3_B08_ORCID_Education.sql.gz MD5Sum
PKG2020S3_B09_ORCID_Employment.sql.gz MD5Sum
PKG2020S3_B10_BERN_Main.sql.gz MD5Sum
PKG2020S3_B11_BERN_EntityType.sql.gz MD5Sum
PKG2020S3_B12_BERN_Mutation.sql.gz MD5Sum
PKG2020S3_B13_Semantic_Scholar.sql.gz MD5Sum
PKG2020S3_B14_Scimago.sql.gz MD5Sum
PKG2020S3_B15_WOS_Citation.sql.gz MD5Sum
PKG2020S3_C03_Affiliation_merge.sql.gz MD5Sum
PKG2020S3_C05_NIH_PubMed.sql.gz MD5Sum

URL (CSV)
CSV Description: Description About CSV Files.docx

Download URLs
OA01_Author_List.csv.zip
OA02_Bio_entities_Main.csv.zip
OA03_Bio_entities_Mutation.csv.zip
OA04_Affiliations.csv.zip
OA05_Researcher_Employment.csv.zip
OA06_Researcher_Education.csv.zip
OA07_NIH_Projects.csv.zip



Dataset Name PKG (1781-2020), Version 2
Description PKG (1781-2020) updates the previous PKG version with the PubMed 2020 baseline files and the PubMed daily update files, together with extracted bio-entities, author disambiguation results, and extended author information. In addition, PKG (1781-2020) includes two new data sources: Scimago, which contains journal information, and WOS citations, which contain reference relations between a PMID and its reference PMIDs, extracted from WOS.

Database Features: 1-PKG (1781-2020) Features.docx
Database Description: 2-PKG (1781-2020) Database Description.docx

Dataset Merge Instructions:
  1. Once all parts listed below (Part 00 through Part 11) have been downloaded and verified against their MD5 sums, combine them into one file with the following command on a Linux system:
    cat pubmed20_v2.0.sql.gz_* > pubmed20.sql.gz
  2. Next, import the dataset into the target database with the following command, replacing username, password, and destinationDatabaseName with your own values:
    gunzip < pubmed20.sql.gz | mysql -uusername -ppassword destinationDatabaseName
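
The merge steps above assume that every part was downloaded intact. The following is a minimal sketch of one way to check that before merging, assuming GNU coreutils (md5sum) and gzip on Linux. The function name verify_and_merge and the checksum file are hypothetical and not part of the PKG distribution; the checksum file is assumed to be a list you assemble yourself from the MD5Sum links, with one "<md5>  <filename>" line per part.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the PKG distribution.
# Usage: verify_and_merge CHECKSUM_FILE PART_PREFIX OUTPUT_FILE
#   CHECKSUM_FILE  md5sum-format list ("<md5>  <filename>" per line),
#                  assembled by hand from the MD5Sum links (assumption)
#   PART_PREFIX    common prefix of the downloaded parts
#   OUTPUT_FILE    merged archive to create
verify_and_merge() {
    local checksums="$1" prefix="$2" output="$3"

    # Stop before merging if any part is missing or fails its MD5 check.
    if ! md5sum --check --quiet "$checksums"; then
        echo "MD5 verification failed; parts were not merged" >&2
        return 1
    fi

    # The shell expands the glob in sorted order, so the parts are
    # concatenated in sequence as long as their suffixes sort correctly.
    cat "${prefix}"* > "$output" || return 1

    # Check that the merged file is a readable gzip archive.
    gzip -t "$output"
}
```

For example, with the part names from step 1 and a checksum file named pubmed20_v2.0.md5 (an assumed name), the call would be: verify_and_merge pubmed20_v2.0.md5 pubmed20_v2.0.sql.gz_ pubmed20.sql.gz. If it returns successfully, the import command from step 2 can be run on pubmed20.sql.gz.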
URL
Download URLs MD5Sum
Part 00 Part 00 MD5Sum
Part 01 Part 01 MD5Sum
Part 02 Part 02 MD5Sum
Part 03 Part 03 MD5Sum
Part 04 Part 04 MD5Sum
Part 05 Part 05 MD5Sum
Part 06 Part 06 MD5Sum
Part 07 Part 07 MD5Sum
Part 08 Part 08 MD5Sum
Part 09 Part 09 MD5Sum
Part 10 Part 10 MD5Sum
Part 11 Part 11 MD5Sum



Dataset Name PKG (1781-2018), Version 1
Description The PubMed Knowledge Graph (PKG (1781-2018)) was built by extracting bio-entities from 29 million PubMed articles, disambiguating the author names of PubMed articles published between 1781 and 2018, and integrating fine-grained affiliation data with extended author and project data.
URL Download URL (52GB)