Generative Artificial Intelligence: conceptual and practical analysis of the retrieval-augmented generation model

Authors

Keywords:

retrieval-augmented generation, large language models, text generation, generative artificial intelligence, systematic review.

Abstract

Objective: To systematically review the state of applications of the retrieval-augmented generation (RAG) model for text generation. The main objective is a diagnostic survey; the secondary purposes are to identify the forms of dissemination preferred by the authors, the topics addressed, and the funding entities, and to establish co-authorship networks. Design/Methodology/Approach: Methodologically, this is exploratory-descriptive bibliographic research. Results/Discussion: The results show that authors prefer to publish their work at conferences. The institution that funds the most research is the National Natural Science Foundation of China, while the largest number of publications is produced in the United States. Authors affiliated with universities predominate, and the most frequently addressed research topics are large language model systems, question answering, text generation, and the development of knowledge graphs. Conclusions: Within information science, the RAG model is related to information retrieval and natural language processing. It overcomes limitations of large language models by adding specific and up-to-date information from external sources. It has great potential for application in digital humanities research, because it combines the knowledge learned in the parameters of pre-trained large language models with information from specific repositories, such as databases of historical documents or libraries, which can be used to create new services. Originality/Value: It offers a first overview of the RAG model, allowing information professionals to understand this important generative artificial intelligence technology and to learn about its main research funders, the authorship networks between organizations, and its main applications in text generation.


Author Biographies

Angel Freddy Godoy-Viera, Universidade Federal de Santa Catarina, Brazil

José Antonio Moreiro-González, Universidad Carlos III de Madrid, Spain

Published

2026-02-27

How to Cite

Godoy-Viera, A. F., & Moreiro-González, J. A. (2026). Generative Artificial Intelligence: conceptual and practical analysis of the retrieval-augmented generation model. Libraries. Research Annals, 21(Monográfico), 1–14. Retrieved from https://revistasbnjm.sld.cu/index.php/BAI/article/view/1133