3. Standardising with Knowledge Graphs

3.1 From Document to Knowledge Graph

KG-Completion from the Graph4NLP codebase
OpenNRE: An open-source repository used to infer the relations expressed in a given sentence.
Zett: A zero-shot entity relation extraction repo: you give the structure you expect the relation to take, and it extracts the connecting values from the text.
GliREL: You define the connections between entities using "glirel labels", e.g. you could say a diagnosis is "treated with" a medication.
GoLLie: A zero-shot approach to extracting entities: you provide some general relations you expect to see, and it extracts those relations between entities.
Research Paper Concepts:
- Generative Type Oriented Named Entity Extraction: A research paper on a generative approach to named entity extraction.
- Co-attention Network for Joint Entity and Relation Extraction: A research paper on using a co-attention network for joint entity and relation extraction, with provided code.

Text2Graph: A pre-trained model on HuggingFace, trained with the help of ChatGPT, that identifies triplets in text.
REBEL: A pre-trained model on HuggingFace that extracts triplets from text. (BART-based model: you would be limited to 512 tokens.)
Joint Entity and Relation Extraction: A paper describing the creation of a medical dataset used to fine-tune the REBEL model so that it is better at extracting medical entities.
OpenIE Standalone Github Repository: A repository for OpenIE, a tool that extracts entities and their relationships from text.
There is an annotation tool called RTE which uses OpenIE to extract triplets.
- Online Tool
- Paper

Structure:

NetworkX: Python package used to create graph data structures.
Graph-tools: Python package that provides a number of features for handling directed/undirected graphs and complex networks.
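
As a minimal sketch of how extracted triplets could be held in a graph data structure, the snippet below loads (head, relation, tail) triplets into a NetworkX MultiDiGraph. The triplets are made-up illustrative data and `build_graph` is a hypothetical helper, not part of NetworkX.

```python
import networkx as nx

# Illustrative triplets (head, relation, tail); not taken from any real dataset.
TRIPLETS = [
    ("asthma", "treated with", "salbutamol"),
    ("asthma", "has symptom", "wheezing"),
    ("salbutamol", "is a", "bronchodilator"),
]


def build_graph(triplets):
    """Hypothetical helper: one directed edge per triplet, relation kept as edge data."""
    graph = nx.MultiDiGraph()  # a multigraph allows several relations between the same pair
    for head, relation, tail in triplets:
        graph.add_edge(head, tail, relation=relation)
    return graph


graph = build_graph(TRIPLETS)
for head, tail, data in graph.edges(data=True):
    print(f"{head} -[{data['relation']}]-> {tail}")
```

A MultiDiGraph is used here because the same pair of entities may be linked by more than one relation; a plain DiGraph would keep only the last one added.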

Visualisations:

Graph Databases:

Neo4j: The Community Edition is free, but the commercial version would need to be paid for.
JanusGraph: Fully open-source under the Apache 2 license, but it only supports Linux, and data storage requires a paid platform.
ArangoDB: The Community Edition is free, but the commercial version would need to be paid for.
OrientDB: The Community Edition is free, but the commercial version would need to be paid for.

Neo4j Entity Resolution Example: A GitHub repository with examples of using Neo4j for entity resolution.
Neo4j Whitepaper on Graph Databases: A whitepaper explaining the use of graph databases like Neo4j for various applications, including entity resolution.
Neo4j Pipeline: Outlines a process by which entities can be resolved:
- Coreference Resolution: Replacing all pronouns with the entity they refer to.
- NER: Extracting the named entities from the text provided.
- Entity Disambiguation and Entity Linking: e.g. you could use Wikipedia ID linking, which tries to resolve different words that refer to the same thing. ("Wikification")
- Co-Occurrence Graphs: Inferring relationships between a pair of entities based on their presence within a specified unit of text (e.g. the same sentence).
- Relationship Extraction:
  - Rule-based extraction: Use grammatical dependencies to extract relationships.
  - Model-based extraction: Use a trained NLP model to extract relationships between pairs of entities.
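
The co-occurrence step in the pipeline above can be sketched as follows. This is a minimal illustration that assumes the entities have already been found by NER, uses the sentence as the unit of text, and matches entities by naive case-insensitive substring search; `cooccurrence_counts` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, entities):
    """Count how often each pair of known entities appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Naive matching (assumption): an entity is "present" if its text occurs in the sentence.
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


SENTENCES = [
    "The patient with asthma was given salbutamol.",
    "Salbutamol relieved the wheezing caused by asthma.",
    "Wheezing was absent at follow-up.",
]
ENTITIES = {"asthma", "salbutamol", "wheezing"}

pair_counts = cooccurrence_counts(SENTENCES, ENTITIES)
for (left, right), count in pair_counts.most_common():
    print(f"{left} <-> {right}: {count}")
```

The counts could serve as edge weights in a graph; a co-occurrence edge only says that two entities appear together, not what the relationship is, which is why the pipeline follows it with relationship extraction.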

Entity Resolution with TigerGraph: An article discussing how to use TigerGraph and Zingg for entity resolution.
Using a Graph Database for Big Data Entity Resolution: A blog post from TigerGraph on using their graph database for big data entity resolution.
Zingg Github Repository: The GitHub repository for Zingg, a tool for entity resolution and matching records.

PyJedAI CleanCleanER: A tutorial for using PyJedAI for entity matching and clustering.
PyJedAI Similarity Joins: A tutorial for using PyJedAI for similarity joins in entity resolution.
ER Evaluation Framework: A framework for evaluating entity resolution systems.
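
To make the matching-then-clustering idea behind these entity resolution tools concrete, here is a minimal sketch using only the Python standard library. It does not use the PyJedAI or Zingg APIs: the string similarity (`difflib.SequenceMatcher`), the 0.85 threshold, and the `resolve` helper are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def resolve(records, threshold=0.85):
    """Hypothetical helper: link similar records, then group linked records into clusters."""
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Matching: compare every pair and merge those above the threshold.
    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            parent[find(j)] = find(i)

    # Clustering: collect records that share a root.
    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


RECORDS = ["Jon Smith", "Acme Ltd", "John Smith", "ACME Ltd."]
clusters = resolve(RECORDS)
print(clusters)
```

Comparing every pair is quadratic in the number of records, so this only suits small inputs; dedicated tools typically add a blocking step to cut down the comparisons.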

REBEL extracts triplets from text: The text is chunked first so that each chunk fits within REBEL's input limit and the information can be extracted.
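
A minimal sketch of the chunking step, assuming the 512-token limit noted earlier for REBEL. Whitespace-separated words stand in for model tokens here, which is only an approximation: a real pipeline would count tokens with the model's own tokenizer. `chunk_tokens` and the overlap of 32 are hypothetical choices.

```python
def chunk_tokens(tokens, max_tokens=512, overlap=32):
    """Split a token list into windows of at most max_tokens, overlapping by `overlap`."""
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


# Illustrative input: 1200 dummy "tokens" (whitespace words, an assumption).
text = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_tokens(text.split())
print([len(chunk) for chunk in chunks])
```

The overlap gives a triplet that straddles a chunk boundary a chance of appearing whole in at least one chunk, at the cost of possible duplicate triplets that need de-duplicating afterwards.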

KnowledgeGraph: This demonstrates a framework for going from document to graph, though the codebase would likely need reworking.

Define your own ontology, i.e. your entities and a description of what those entities are.
Run the Graph-maker using a large language model to create your graph.
You can then use the graph it has created over your documents.
Tutorial
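
The ontology-then-LLM flow above can be sketched as below. This is not the Graph-maker API: the `Ontology` dataclass, `build_prompt`, and the example labels are hypothetical, and the call to the language model itself is left out. It only shows how an ontology (entity types with descriptions, plus allowed relations) might be turned into an extraction prompt.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Hypothetical container: entity labels with descriptions, plus allowed relations."""
    labels: dict
    relationships: list = field(default_factory=list)


def build_prompt(ontology, text):
    """Render the ontology and a document chunk into a single extraction prompt."""
    label_lines = "\n".join(f"- {name}: {desc}" for name, desc in ontology.labels.items())
    relation_lines = "\n".join(f"- {rel}" for rel in ontology.relationships)
    return (
        "Extract (head, relation, tail) triplets from the text.\n"
        f"Allowed entity types:\n{label_lines}\n"
        f"Allowed relations:\n{relation_lines}\n"
        f"Text:\n{text}"
    )


# Illustrative ontology; the labels and descriptions are made up.
ontology = Ontology(
    labels={
        "Diagnosis": "A medical condition identified in a patient",
        "Medication": "A drug prescribed or administered",
    },
    relationships=["treated with"],
)

prompt = build_prompt(ontology, "The patient's asthma was treated with salbutamol.")
print(prompt)
```

Restricting the model to a declared set of entity types and relations is what keeps the resulting graph consistent across documents.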