Generative Artificial Intelligence: conceptual and practical analysis of the retrieval-augmented generation model
Keywords:
retrieval-augmented generation, large language models, text generation, generative artificial intelligence, systematic review.
Abstract
Objective: To systematically review the state of applications of the retrieval-augmented generation (RAG) model for text generation. The main objective is a diagnostic survey; the secondary objectives are to identify the forms of dissemination preferred by authors, reveal the topics addressed and the funding entities, and establish co-authorship networks. Design/Methodology/Approach: Methodologically, this is exploratory-descriptive bibliographic research. Results/Discussion: Authors prefer to publish their work in conference proceedings. The institution that funds the most research is the National Natural Science Foundation of China, while the United States produces the largest number of publications. Authors affiliated with universities predominate, and the most studied topics are large language model systems, question answering, text generation, and the development of knowledge graphs. Conclusions: Within information science, the RAG model is related to information retrieval and natural language processing. It overcomes limitations of large language models by adding specific, up-to-date information from external sources. It has great potential for digital humanities research because it combines the knowledge stored in the parameters of pre-trained large language models with information from specific repositories, such as databases of historical documents or libraries, which can be used to create new services. Originality/Value: The study offers a first overview of the RAG model, helping information professionals understand this important generative artificial intelligence technology and learn about its main research funders, the authorship networks between organizations, and its main applications in text generation.
License
Copyright (c) 2026 Angel Freddy Godoy Viera, José Antonio Moreiro-González

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.




