{"id":3121,"date":"2026-04-08T06:59:02","date_gmt":"2026-04-08T06:59:02","guid":{"rendered":"https:\/\/cvsc.upcebu.edu.ph\/?post_type=project&#038;p=3121"},"modified":"2026-04-13T07:01:43","modified_gmt":"2026-04-13T07:01:43","slug":"cebqa-cebuano-question-answering-system","status":"publish","type":"project","link":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/project\/cebqa-cebuano-question-answering-system\/","title":{"rendered":"CebQA: Cebuano Question Answering System"},"content":{"rendered":"<p>[et_pb_section fb_built=&#8221;1&#8243; _builder_version=&#8221;4.27.5&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_row column_structure=&#8221;1_3,2_3&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; custom_padding=&#8221;14px|||||&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_column type=&#8221;1_3&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_heading title=&#8221;CebQA: Cebuano Question Answering System&#8221; _builder_version=&#8221;4.27.5&#8243; _module_preset=&#8221;default&#8221; title_font=&#8221;|700|on||||||&#8221; title_text_color=&#8221;gcid-body-color&#8221; title_font_size=&#8221;41px&#8221; custom_margin=&#8221;|-828px|25px|||&#8221; global_colors_info=&#8221;{%22gcid-body-color%22:%91%22title_text_color%22%93}&#8221;][\/et_pb_heading][et_pb_image src=&#8221;http:\/\/cvsc.upcebu.edu.ph\/wp-content\/uploads\/2026\/04\/Roxas-NIDS.webp&#8221; title_text=&#8221;Roxas-NIDS&#8221; _builder_version=&#8221;4.27.5&#8243; _module_preset=&#8221;default&#8221; box_shadow_style=&#8221;preset2&#8243; global_colors_info=&#8221;{}&#8221;][\/et_pb_image][\/et_pb_column][et_pb_column type=&#8221;2_3&#8243; _builder_version=&#8221;4.27.4&#8243; _module_preset=&#8221;default&#8221; global_colors_info=&#8221;{}&#8221;][et_pb_text _builder_version=&#8221;4.27.5&#8243; _module_preset=&#8221;default&#8221; 
text_font_size=&#8221;15px&#8221; text_line_height=&#8221;1.1em&#8221; custom_padding=&#8221;118px|0px||||&#8221; hover_enabled=&#8221;0&#8243; global_colors_info=&#8221;{}&#8221; sticky_enabled=&#8221;0&#8243;]<\/p>\n<p><strong>Lead Researcher(s): Jhoanna Rica T. Lagumbay and Robert R. Roxas<\/strong><br \/><strong>Status:<\/strong> Published<\/p>\n<p style=\"text-align: justify;\"><span style=\"font-weight: 400;\"><strong>Abstract\/summary:<\/strong> <span>A question answering (QA) system answers queries given a corpus of natural language documents. QA systems have seen significant advancements across various languages, such as Arabic and Amharic, using transformer-based models. There remains, however, a notable gap for the Cebuano language, a widely spoken language in the Philippines. One major barrier is the absence of a publicly available Cebuano QA dataset. This study addresses this gap by introducing a three-fold contribution: (1) a pseudonymization technique tailored to Cebuano texts to preserve privacy in news-based datasets, (2) the construction of the Cebuano Question Answering Dataset (CebQuAD), the first Cebuano QA dataset, and (3) the development of the Cebuano Question Answering (CebQA) system, an end-to-end QA system. To build CebQuAD, Cebuano news articles were collected and pseudonymized to protect personal identities. Question-answer pairs were generated using GPT-4o mini, validated by Cebuano speakers, and split into training, testing, and validation sets. The CebQA system incorporates a retriever-reader architecture, employing ElasticSearch\/BM25 and FAISS\/DPR for indexing and retrieval, and a fine-tuned XLM-RoBERTa for answer extraction. Results show that BM25 achieved the highest retrieval accuracy, while the best reader attained an F1 score of 79.22. 
The end-to-end system achieves an F1 score of 49.50 at\u00a0<\/span><i>k<\/i><span>\u00a0= 1, consistent with the retriever\u2019s 63% accuracy, which highlights the viability of the CebQA system as the first functional end-to-end QA system for the Cebuano language.<\/span><\/span><\/p>\n<p><b>Keywords:<\/b><\/p>\n<ul class=\"c-article-subject-list\">\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">Natural Language Processing<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">NLP<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">Question Answering<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">Pseudonymization<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">GPT-4o mini<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">XLM-R<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">BM25<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">Cebuano DPR<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">Cebuano Question Answering Dataset<\/span><\/strong><\/li>\n<li class=\"c-article-subject-list__subject\"><strong><span style=\"color: #3366ff;\">Cebuano Language<\/span><\/strong><\/li>\n<\/ul>\n<p>[\/et_pb_text][et_pb_button button_text=&#8221;Downloadable PDF&#8221; _builder_version=&#8221;4.27.5&#8243; _module_preset=&#8221;default&#8221; button_url=&#8221;https:\/\/link.springer.com\/chapter\/10.1007\/978-3-032-10827-2_28&#8243; url_new_window=&#8221;on&#8221; custom_button=&#8221;on&#8221; button_text_color=&#8221;#8A1538&#8243; hover_enabled=&#8221;0&#8243; 
sticky_enabled=&#8221;0&#8243;][\/et_pb_button][\/et_pb_column][\/et_pb_row][\/et_pb_section]<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Lead Researcher(s): Jhoanna Rica T. Lagumbay and Robert R. Roxas Status: Published Abstract\/summary: A question answering (QA) system answers queries given a corpus of natural language documents. QA systems have seen significant advancements across various languages, such as Arabic and Amharic, using transformer-based models. There remains, however, a notable gap for the Cebuano language, a widely spoken [&hellip;]<\/p>\n","protected":false},"author":7,"featured_media":0,"template":"","meta":{"_et_pb_use_builder":"on","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"project_category":[40],"project_tag":[],"class_list":["post-3121","project","type-project","status-publish","hentry","project_category-proceedings"],"_links":{"self":[{"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/project\/3121","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/project"}],"about":[{"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/types\/project"}],"author":[{"embeddable":true,"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/users\/7"}],"version-history":[{"count":3,"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/project\/3121\/revisions"}],"predecessor-version":[{"id":3126,"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/project\/3121\/revisions\/3126"}],"wp:attachment":[{"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/media?parent=3121"}],"wp:term":[{"taxonomy":"project_category","embeddable":true,"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/project_category?post=3121"},{"taxonomy":"project_tag","embeddable":true,"href":"https:\/\/cvsc.upcebu.edu.ph\/index.php\/wp-json\/wp\/v2\/project_tag?post=3121"}],"curies":[{"name":"wp","href":"http
s:\/\/api.w.org\/{rel}","templated":true}]}}