Collection: Systematized Basic Algorithms and Open-Source Tools for Representation Learning of Large Scale Knowledge Graph



In the Internet era, large-scale knowledge graphs and their computations serve as a "soft" infrastructure for intelligent information processing of massive amounts of content. In academic innovation and open-source sharing, Tsinghua University contributed world-leading and influential results by proposing and constructing the systematized basic algorithms of large-scale knowledge graph representation learning.


The Overall Importance of Knowledge Graph Representation Learning in Artificial Intelligence Research and Application

World-Leading Systematized Basic Algorithms for Representation Learning of Large-Scale Knowledge Graphs

A new generation of artificial intelligence faces the challenge of large-scale knowledge graph computing. This project develops systematized fundamental algorithms, based on the deep learning paradigm, to address three key scientific problems in large-scale knowledge graph representation learning: the complexity of internal relation types, the complexity of internal reasoning paths, and the insufficient use of rich external information. Its technological innovations include the TransR algorithm, based on relation-specific semantic space projection; the PTransE algorithm for reasoning over complex relation paths; the TADW algorithm for fusing entity-related text attributes; the DKRL algorithm for fusing the text of entity definitions and descriptions; the TKRL algorithm for fusing entity-related type hierarchy information; the ATT algorithm, which integrates relational text description information; and the JointE algorithm, which employs a mutual attention mechanism to perform language modelling and knowledge graph representation learning simultaneously.
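To make the idea of relation-specific semantic space projection concrete, the following is a minimal sketch of a TransR-style scoring function. It assumes the commonly cited formulation in which head and tail entity vectors are mapped into the relation's space by a relation-specific matrix and the triple is scored by the squared distance between the translated head and the tail. The function names, the toy vectors, and the matrix `M_r` are hypothetical and chosen only for illustration; this is not the project's implementation and does not use the OpenKE API.

```python
# Illustrative TransR-style scoring sketch in pure Python (toy numbers only).

def project(vec, matrix):
    """Map an entity vector into relation space with a relation-specific matrix.

    `matrix` has one row per entity dimension and one column per relation dimension.
    """
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]


def transr_score(head, relation, tail, matrix):
    """Squared L2 distance between (projected head + relation) and projected tail.

    A lower score means the triple (head, relation, tail) is judged more plausible.
    """
    h_r = project(head, matrix)
    t_r = project(tail, matrix)
    return sum((h + r - t) ** 2 for h, r, t in zip(h_r, relation, t_r))


# Hypothetical toy data: 3-dimensional entities, 2-dimensional relation space.
M_r = [[1.0, 0.0],
       [0.0, 1.0],
       [0.0, 0.0]]          # keeps the first two entity coordinates
head = [1.0, 2.0, 5.0]
relation = [1.0, -1.0]
good_tail = [2.0, 1.0, -3.0]   # projected head + relation lands exactly here
bad_tail = [0.0, 0.0, 0.0]

good = transr_score(head, relation, good_tail, M_r)
bad = transr_score(head, relation, bad_tail, M_r)
print(good, bad)
```

In this toy setting the third entity coordinate is ignored by the projection, which shows how the same entity can look different in different relation spaces. In practice the entity vectors, relation vectors, and projection matrices would be learned from the knowledge graph rather than set by hand.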

This project's eight representative papers have received extensive attention in the international academic community, with a total of 6,185 citations in Google Scholar (the most-cited single paper has 2,611 citations). Those citing the papers include Turing Award winner Yoshua Bengio, National Academy of Engineering member Tom Mitchell, and American Academy of Arts and Sciences member Tomaso Poggio, among others. Notably, two of the papers ranked second and fifth in citations among the 3,934 papers published in IJCAI 2015-2020 and the 5,392 papers published in AAAI 2015-2020, the top-tier international artificial intelligence conferences.


The Framework of Systematized Basic Algorithms and Open-Source Tools for the Representation Learning of Large-Scale Knowledge Graph


International Academic Impact of Two Representative Papers Published in IJCAI and AAAI (Data Source: Google Scholar)

Internationally Influential Open-source System for the Representation Learning of Knowledge Graph

The project was open-sourced on GitHub, one of the world's most influential open-source platforms, forming THU-OpenSK, a large-scale knowledge graph representation learning system developed by Tsinghua University. THU-OpenSK comprises three open-source toolkits, OpenKE, OpenNE, and OpenNRE, and has received 10,722 stars and 3,180 forks. The system was open-sourced earlier, and has more stars and forks, than comparable systems from international and Chinese research institutions and enterprises. THU-OpenSK is among the most influential open-source projects in knowledge graph representation learning and has become one of the mainstream systematized tools for the field worldwide. On GitHub, Tsinghua University's THUNLP project, with THU-OpenSK as its core content, attracts a great deal of attention. Besides THU-OpenSK, OpenI is a new open-source platform for artificial intelligence that supports the ecological construction of open-source applications.


Open-source Influence of OpenKE in THU-OpenSK (Data Source: GitHub)


Open-source Influence of THUNLP (Data Source: Gitstar)

Contributing Key Resources for the Development of Knowledge Computing in the Era of Artificial Intelligence

In 2017, THU-OpenSK was successfully applied to two of the world's best-known large-scale general knowledge graphs, Freebase and Wikidata, producing two knowledge graph representation models, each based on 10 million entities and 100 million relational triples. These are among the earliest large-scale, open-source knowledge graph representation models published in the world, and they contribute key resources to the development of knowledge computing in the era of artificial intelligence. They have been used by researchers at hundreds of institutions in China and abroad, with a positive social impact. The project has resulted in ten national invention patents. Tencent WeChat has successfully applied some of the project's techniques, improving the user experience and raising the level of intelligence in the digital industry.


The User Distribution of the OpenKE-empowered Open-source Knowledge Graph Models