RECONSTRUCTING ENTITY RELATIONSHIPS IN DATABASE SCHEMAS WITH PLANTUML AND LLMS
DOI:
https://doi.org/10.28925/2663-4023.2025.29.847
Keywords:
Entity Relationship Diagram (ERD), PlantUML, Automatization, Relational Databases, Large Language Models (LLMs), ChatGPT-4o, Claude 3.7
Abstract
The article explores the potential of Large Language Models (LLMs) to automatically restore relationships between tables in SQL databases whose foreign keys are incompletely defined. To evaluate how well LLMs infer foreign keys from textual descriptions of table structures, an experimental database was created. Its schema, with the relationships removed, was given as input to two models: ChatGPT-4o and Claude 3.7 Sonnet. The models received only basic information: table names, field names, and primary keys, with no sample data. The table structure was described in PlantUML format, which gives a standardized, clear, and unambiguous representation of the input. ChatGPT-4o detected all relationships between tables but classified every one of them as “one-to-one”, regardless of its actual structure, which indicates that the model could not reliably determine relationship types from the textual description. In contrast, Claude 3.7 Sonnet identified all existing relationships and also determined their types correctly (e.g., one-to-many), demonstrating higher accuracy and a deeper understanding of the database structure on this task. Based on the modeling results, ER diagrams were constructed, also in PlantUML format. The experiment confirms that LLMs can effectively reconstruct missing foreign keys and shows their potential for automated analysis, documentation, and improvement of existing databases. Consistent naming conventions during schema design play a crucial role here: they significantly simplify both the work of developers and the automated processing of database structures by intelligent systems.
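The role of naming conventions can be illustrated with a minimal sketch. It is not the authors' method and does not call an LLM: it is a simple rule-based baseline, assuming a hypothetical schema in which a foreign-key column is named `<singular table name>_id`. The schema, the function names, and the use of plain PlantUML `entity` blocks (rather than any particular macro library) are all assumptions made for illustration. The sketch renders the kind of relationship-free input described above (table names, field names, primary keys) and then guesses the missing links from column names alone, labeling every guess as one-to-many.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical example schema: table name -> (primary key, other columns).
# No foreign keys are declared, mirroring the input described in the abstract.
Schema = Dict[str, Tuple[str, List[str]]]

SCHEMA: Schema = {
    "users": ("id", ["name", "email"]),
    "orders": ("id", ["user_id", "created_at"]),
    "order_items": ("id", ["order_id", "product_id", "quantity"]),
    "products": ("id", ["title", "price"]),
}


def to_plantuml(schema: Schema) -> str:
    """Render tables as PlantUML entity blocks without any relationships."""
    lines = ["@startuml"]
    for table, (pk, columns) in schema.items():
        lines.append(f"entity {table} {{")
        lines.append(f"  * {pk}")  # primary key, marked as mandatory
        lines.append("  --")       # separator between key and other fields
        lines.extend(f"  {column}" for column in columns)
        lines.append("}")
    lines.append("@enduml")
    return "\n".join(lines)


def guess_parent(column: str, schema: Schema) -> Optional[str]:
    """Guess the referenced table from a column name such as 'user_id'.

    Assumed convention: '<singular>_id' points at a table named with the
    singular or a simple plural ('s' / 'es') of the same stem.
    """
    if not column.endswith("_id"):
        return None
    stem = column[: -len("_id")]
    for candidate in (stem, stem + "s", stem + "es"):
        if candidate in schema:
            return candidate
    return None


def infer_relations(schema: Schema) -> List[str]:
    """Return PlantUML relationship lines, assuming one-to-many for every guess."""
    relations = []
    for table, (_pk, columns) in schema.items():
        for column in columns:
            parent = guess_parent(column, schema)
            if parent is not None:
                relations.append(f"{parent} ||--o{{ {table} : {column}")
    return relations


if __name__ == "__main__":
    print(to_plantuml(SCHEMA))
    print("\n".join(infer_relations(SCHEMA)))
```

A heuristic like this only works when names are consistent; a column called `owner` that refers to `users` would be missed, which is the kind of case where a model that reads the whole schema may do better. The blanket one-to-many label is likewise a simplification, not a determination of the actual relationship type.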
License
Copyright (c) 2025 Анатолій Куротич, Леся Булатецька, Оксана Онищук

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.