RECONSTRUCTING ENTITY RELATIONSHIPS IN DATABASE SCHEMAS WITH PLANTUML AND LLMS

Anatolii Kurotych; Lesia Bulatetska; Oksana Onyshchuk

doi:10.28925/2663-4023.2025.29.847

Authors

Anatolii Kurotych, Lesya Ukrainka Volyn National University, https://orcid.org/0009-0006-8186-4063
Lesia Bulatetska, Lesya Ukrainka Volyn National University, https://orcid.org/0000-0002-7202-826X
Oksana Onyshchuk, Lesya Ukrainka Volyn National University, https://orcid.org/0000-0002-8342-3011

Keywords:

Entity Relationship Diagram (ERD), PlantUML, Automatization, Relational Databases, Large Language Models (LLMs), ChatGPT-4o, Claude 3.7

Abstract

This article explores the potential of Large Language Models (LLMs) for automatically restoring relationships between tables in SQL databases whose foreign keys are incompletely defined. To evaluate how well LLMs infer foreign keys from textual descriptions of table structures, an experimental database was created, and its schema, with all relationships removed, was provided as input to two models: ChatGPT-4o and Claude 3.7 Sonnet. The models received only basic information (table names, field names, and primary keys) and no sample data. The table structure was described in PlantUML format, which gave the models a standardized, clear, and unambiguous representation of the input. ChatGPT-4o detected all relationships between tables but classified every one of them as “one-to-one”, regardless of its actual structure, which indicates that the model could not accurately determine relationship types from the textual description. In contrast, Claude 3.7 Sonnet identified all existing relationships and also determined their types correctly (e.g., one-to-many), demonstrating higher accuracy and a deeper understanding of the database structure on this task. ER diagrams in PlantUML format were then constructed from the models' output. The experiment confirms that LLMs can be effective at reconstructing missing foreign keys and shows their potential for automated analysis, documentation, and improvement of existing databases. Consistent naming conventions during schema design play a crucial role here: they significantly simplify both the work of developers and the automated processing of database structures by intelligent systems.
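
To make the setup concrete, the following minimal Python sketch renders a relationship-free schema as PlantUML entities and then guesses foreign keys purely from column names. It is not the authors' method (the study used LLMs for the inference step); it is a rule-based baseline that illustrates why consistent naming makes relationships recoverable. The example tables, the `<name>_id` convention, the naive pluralization, and all function names are assumptions introduced for illustration. The sketch also assumes every guessed relationship is one-to-many, which a real schema may not satisfy.

```python
from typing import Dict, List, Tuple

# Hypothetical schema: table name -> (primary key, list of column names).
Schema = Dict[str, Tuple[str, List[str]]]

EXAMPLE_SCHEMA: Schema = {
    "users": ("id", ["id", "email"]),
    "orders": ("id", ["id", "user_id", "created_at"]),
    "order_items": ("id", ["id", "order_id", "product_id", "quantity"]),
    "products": ("id", ["id", "title"]),
}


def to_plantuml(schema: Schema) -> str:
    """Render tables as PlantUML entities without any relationships."""
    lines = ["@startuml"]
    for table, (pk, columns) in schema.items():
        lines.append(f"entity {table} {{")
        lines.append(f"  * {pk}")  # primary key goes above the separator
        lines.append("  --")
        lines.extend(f"  {col}" for col in columns if col != pk)
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)


def guess_foreign_keys(schema: Schema) -> List[Tuple[str, str, str]]:
    """Guess (child_table, column, parent_table) from `<name>_id` columns."""
    guesses = []
    for table, (pk, columns) in schema.items():
        for col in columns:
            if col == pk or not col.endswith("_id"):
                continue
            stem = col[: -len("_id")]
            # Try the stem itself, then a naive plural (assumed convention).
            for candidate in (stem, stem + "s"):
                if candidate in schema and candidate != table:
                    guesses.append((table, col, candidate))
                    break
    return guesses


def relationships_to_plantuml(guesses: List[Tuple[str, str, str]]) -> List[str]:
    """Emit crow's-foot lines; `||--o{` reads 'one parent, zero or many children'."""
    return [f"{parent} ||--o{{ {child} : {col}" for child, col, parent in guesses]


if __name__ == "__main__":
    print(to_plantuml(EXAMPLE_SCHEMA))
    for line in relationships_to_plantuml(guess_foreign_keys(EXAMPLE_SCHEMA)):
        print(line)
```

A heuristic like this fails as soon as column names stop following the convention (for example, `owner` referring to `users`), which is the kind of case where an LLM's broader reading of the schema could help.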
