3. Standardising with Knowledge Graphs
3.1 From Document to Knowledge Graph
3.1.1 Entity Type - Entity Value Relationship Extraction
- OpenNRE: An open-source toolkit for inferring relations between entities in a given sentence.
- Zett: A zero-shot relation extraction repository. You supply the structure you expect the relation to take, and it extracts the connecting values from the text.
- GliREL: Lets you define the connections between entities using "glirel labels", e.g. a diagnosis is "treated with" a medication.
- GoLLie: A zero-shot approach in which you provide the general relations you expect to see, and the model extracts those relations between entities.
- Research Paper Concepts:
  - Generative Type Oriented Named Entity Extraction: A research paper on a generative approach to named entity extraction.
  - Co-attention Network for Joint Entity and Relation Extraction: A research paper on using a co-attention network for joint entity and relation extraction, with code provided.
3.1.2 Document to Triplets
- Text2Graph: A pre-trained model on Hugging Face, trained on ChatGPT-generated data, that identifies triplets in text.
- REBEL: A pre-trained model on Hugging Face that extracts triplets from text. (It is a BART-based seq2seq model rather than a BERT-based one; input length is limited, around 512 tokens in typical usage, so longer documents need chunking.)
- Joint Entity and Relation Extraction: A paper describing the creation of a medical dataset for fine-tuning REBEL so that it extracts medical entities more reliably.
- OpenIE Standalone GitHub Repository: A repository for OpenIE, a tool that extracts entities and their relationships from text.
- There is also an annotation tool called RTE, which uses OpenIE to extract triplets.
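To make the idea of a triplet concrete, here is a minimal sketch of turning REBEL-style generated text into (head, relation, tail) tuples. It assumes the linearised format described on the REBEL model card, `<triplet> head <subj> tail <obj> relation`; the function name `parse_rebel_output` is hypothetical, and the format should be checked against the model version you use.

```python
def parse_rebel_output(decoded: str) -> list[tuple[str, str, str]]:
    """Parse REBEL-style linearised text into (head, relation, tail) triplets.

    Assumed format: <triplet> head <subj> tail <obj> relation
    A single head may be followed by several <subj>/<obj> groups.
    """
    triplets = []
    head, tail, relation = [], [], []
    current = None  # the list that plain tokens are currently appended to

    def flush():
        # Only emit a triplet once all three parts have been seen.
        if head and tail and relation:
            triplets.append((" ".join(head), " ".join(relation), " ".join(tail)))

    # Drop the special tokens a seq2seq tokenizer may leave in the decoded text.
    cleaned = decoded.replace("<s>", " ").replace("</s>", " ").replace("<pad>", " ")
    for token in cleaned.split():
        if token == "<triplet>":
            flush()
            head, tail, relation = [], [], []
            current = head
        elif token == "<subj>":
            flush()
            tail, relation = [], []
            current = tail
        elif token == "<obj>":
            relation = []
            current = relation
        elif current is not None:
            current.append(token)
    flush()
    return triplets


example = "<s><triplet> Aspirin <subj> headache <obj> treats <triplet> Paris <subj> France <obj> capital of</s>"
print(parse_rebel_output(example))
```

The resulting tuples are the input that the graph-building step in 3.1.3 expects.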
3.1.3 Triplets to Graph
Structure:
- NetworkX: Python package for creating and manipulating graph data structures.
- graph-tool: Python package that provides a range of features for handling directed/undirected graphs and complex networks.
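As a rough sketch of the structure step, the snippet below builds a directed, labelled graph from (head, relation, tail) triplets using only the standard library, so it runs without extra installs. The function names are hypothetical. With NetworkX, the equivalent would be a `MultiDiGraph` populated via `add_edge(head, tail, label=relation)`.

```python
def build_graph(triplets):
    """Return {node: [(relation, tail), ...]} for (head, relation, tail) triplets."""
    graph = {}
    for head, relation, tail in triplets:
        edges = graph.setdefault(head, [])
        if (relation, tail) not in edges:  # skip exact duplicate edges
            edges.append((relation, tail))
        graph.setdefault(tail, [])  # make sure tail-only nodes also exist
    return graph


def neighbours(graph, node):
    """Return the sorted set of nodes reachable from `node` in one hop."""
    return sorted({tail for _, tail in graph.get(node, [])})


triplets = [
    ("aspirin", "treats", "headache"),
    ("aspirin", "treats", "fever"),
    ("ibuprofen", "treats", "fever"),
]
graph = build_graph(triplets)
print(neighbours(graph, "aspirin"))
```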
Visualisations:
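- Graphviz: Graph visualisation software with Python bindings for rendering graphs.
- PyVis: Python package for building interactive graph visualisations.
- igraph: Network analysis library with a Python interface that can also plot graphs.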
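One lightweight way to hand triplets to a visualiser is to write them out in Graphviz's DOT language. The sketch below does this with plain string formatting; `to_dot` is a hypothetical helper, not part of any of the packages listed above.

```python
def to_dot(triplets, name="kg"):
    """Render (head, relation, tail) triplets as a DOT digraph string."""

    def quote(text):
        # DOT string literals need backslashes and double quotes escaped.
        return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for head, relation, tail in triplets:
        lines.append(f"  {quote(head)} -> {quote(tail)} [label={quote(relation)}];")
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("aspirin", "treats", "headache")]))
```

Saving the output as `kg.dot` and running `dot -Tpng kg.dot -o kg.png` should render it, provided Graphviz is installed.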
Graph Databases:
- Neo4j: The Community Edition is free; the commercial edition must be paid for.
- JanusGraph: Fully open source under the Apache 2 licence, but it appears to support only Linux, and it needs a separate storage backend, which may have to be hosted on a paid platform.
- ArangoDB: The Community Edition is free; the commercial edition must be paid for.
- OrientDB: The Community Edition is free; the commercial edition must be paid for.
3.2 Entity Resolution Pipelines
3.2.1 Neo4j
- Neo4j Entity Resolution Example: A GitHub repository with examples of using Neo4j for entity resolution.
- Neo4j Whitepaper on Graph Databases: A whitepaper explaining the use of graph databases like Neo4j for various applications, including entity resolution.
- Neo4j Pipeline: Outlines a process by which entities can be resolved:
  - Coreference Resolution: Replacing each pronoun with the entity it refers to.
  - NER: Extracting the named entities from the provided text.
  - Entity Disambiguation and Entity Linking: Mapping each mention to a canonical entry, e.g. linking to Wikipedia IDs ("Wikification"), so that different surface forms of the same entity resolve to one node.
  - Co-Occurrence Graphs: Inferring a relationship between a pair of entities from their presence within the same specified unit of text.
  - Relationship Extraction:
    - Rule-based extraction: Use grammatical dependencies to extract relationships.
    - Model-based extraction: Use a trained NLP model to extract relationships between pairs of entities.
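The co-occurrence step in the pipeline above can be sketched as follows, taking the sentence as the unit of text. This is a simplified illustration, not Neo4j's implementation: it assumes the entity list has already been produced by NER, splits sentences naively on punctuation, and matches entities by case-insensitive substring.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, entities):
    """Count how often each pair of known entities appears in the same sentence."""
    counts = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        # Sort so each pair is always stored in the same order.
        present = sorted(e for e in entities if e.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


text = "Alice met Bob in Paris. Bob later called Alice. Carol stayed in Paris."
edges = cooccurrence_edges(text, ["Alice", "Bob", "Paris", "Carol"])
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The counts can serve as edge weights, so pairs that share many sentences are linked more strongly than pairs that meet once.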
 
 
3.2.2 TigerGraph and Zingg
- Entity Resolution with TigerGraph: An article discussing how to use TigerGraph and Zingg for entity resolution.
- Using a Graph Database for Big Data Entity Resolution: A blog post from TigerGraph on using their graph database for big data entity resolution.
- Zingg GitHub Repository: The GitHub repository for Zingg, a tool for entity resolution and record matching.
3.2.3 PyJedAI
- PyJedAI CleanCleanER: A tutorial for using PyJedAI for entity matching and clustering.
- PyJedAI Similarity Joins: A tutorial for using PyJedAI for similarity joins in entity resolution.
- ER Evaluation Framework: A framework for evaluating entity resolution systems.
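To show what a similarity join does in entity resolution, here is a minimal standard-library sketch that pairs records from two lists when the Jaccard similarity of their word sets reaches a threshold. It does not use the PyJedAI API; the function names and the default threshold are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_join(left, right, threshold=0.5):
    """Return (left_record, right_record, score) for pairs at or above the threshold."""
    tokenised_right = [(record, set(record.lower().split())) for record in right]
    matches = []
    for left_record in left:
        left_tokens = set(left_record.lower().split())
        for right_record, right_tokens in tokenised_right:
            score = jaccard(left_tokens, right_tokens)
            if score >= threshold:
                matches.append((left_record, right_record, score))
    return matches


for match in similarity_join(["Acme Corp Ltd", "Globex Inc"], ["ACME Corp", "Initech LLC"]):
    print(match)
```

This compares every pair, which is quadratic in the number of records; dedicated tools typically add a blocking step to avoid that cost on large datasets.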
3.2.4 REBEL + LlamaIndex
REBEL extracts triplets from text. The text is chunked first so that each piece fits within REBEL's input limit.
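The chunking step can be sketched as overlapping windows, so that a relation spanning a chunk boundary still has a chance of being seen whole. The version below counts whitespace-separated words as a rough proxy for tokens; the function name and the default sizes are assumptions, and a real pipeline would count tokens with the model's own tokenizer.

```python
def chunk_words(text, max_words=100, overlap=20):
    """Split text into overlapping chunks of at most `max_words` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, max_words=4, overlap=1))
```

Each chunk would then be passed to the extraction model separately and the resulting triplets merged, with duplicates from the overlapping regions removed.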
3.2.5 KnowledgeGraph
KnowledgeGraph: This demonstrates a framework for going from document to graph; the codebase would likely need reworking.
- Mistral 7B OpenOrca, hosted by Ollama, to extract triplets.
- NetworkX to build the graphs.
- PyVis to visualise the graphs.
3.2.6 Graph_Maker (requires Groq)
- Define your own ontology, i.e. your entity types and a description of what each one is.
- Run Graph_Maker with a large language model to create your graph.
- You can then use the resulting graph over your documents.
3.2.7 Instructor
- Might support Ollama.
- You can follow this tutorial, but use the Ollama implementation.