
The Voluntary Agency Network of Korea (VANK), led by Park Gi-tae, has developed Korea’s first performance evaluation index for comparing how accurately generative AI platforms reproduce images of Korean culture. The initiative is the first systematic analysis of how accurately generative AI represents Korean culture amid the global surge of interest in it.
On March 21, 2026, BTS held a comeback performance at Gwanghwamun for its album “Arirang.” Although the performance is over, the influx of global fans (ARMY) visiting Korea is expected to further expand international interest in traditional Korean culture and history, including Arirang, Gwanghwamun, and hanbok.
Meanwhile, the Netflix animation K-Pop Demon Hunters won both Best Animated Feature and Best Original Song at the 98th Academy Awards. The appearance of characters wearing traditional gat hats and a celebratory performance featuring pansori further heightened global interest in Korean traditional culture.
This growing interest increasingly leads users to seek information through generative AI. Beyond simple searches, users now generate and modify images directly as part of how they consume information. VANK noted that errors in Korean cultural content are spreading widely in the process.
According to VANK’s findings, images generated by AI frequently contained errors such as confusion over national identity, distortion of traditional elements, and omission of key features.
For example, in images depicting a person singing “Arirang,” which is also the title of BTS’s comeback album, some platforms failed to reflect any Korean cultural context, while others mixed in symbols associated with Japanese culture, such as cherry blossoms, confusing national identity. In pansori images, key elements such as the drummer’s rhythmic accompaniment were omitted, or elements of Chinese opera were incorrectly incorporated.
Distortions were also evident in traditional clothing. Male hanbok images combined qipao-style buttons, modern loafers, and cherry blossom backgrounds, while female hanbok images included lace decorations, headpieces extending to the forehead, and disproportionate skirt lengths. For the gat, platforms added nonexistent decorations or even generated unrelated images such as fruit, indicating very low accuracy.
In images of tangible cultural heritage, which demand detailed representation, structural errors were even more pronounced.
With the BTS performance expected to increase global visits to Gyeongbokgung Palace, VANK found that AI-generated images of the site often omitted key elements such as rank stones and statues of the twelve zodiac animals, or depicted surrounding buildings in colors resembling Chinese imperial palaces. In some cases, requests for images of Geunjeongjeon Hall resulted in entirely unrelated mechanical images, failing to reproduce even the basic architectural form.
Similarly, images of Gyeonghoeru Pavilion showed confusion with Hyangwonjeong Pavilion due to added lotus elements, and inaccuracies in pillars, layout, and overall structure were observed across multiple platforms.
VANK warned that if AI provides distorted images of places BTS fans are likely to visit, the distortions could shape perceptions from the first impression through the actual on-site experience. As global users increasingly rely on generative AI to explore destinations such as Gyeongbokgung before visiting, inaccurate images may hinder a proper understanding of Korean culture.
The organization has expanded its previous efforts to correct errors in textbooks, maps, and dictionaries into the field of generative AI. Through analysis conducted in collaboration with local governments including Seoul, Gyeonggi Province, Gyeongju, and North Chungcheong Province, VANK identified recurring error patterns across platforms and began developing an objective evaluation index.
The evaluation covered six platforms (ChatGPT, Perplexity, Grok, Bing, Gemini, and Copilot) and 15 items in five categories: territory (Dokdo, East Sea), food and culinary culture (Kimchi and Kimjang, Bibimbap), traditional clothing (Hanbok, Gat), intangible heritage (Hangeul, Taekwondo, Arirang, Pansori, Ganggangsullae), and tangible cultural heritage (Gyeongbokgung Palace, Seokguram Grotto, Bulguksa Temple, Hwaseong Fortress).
Each item was scored out of four points against three criteria: accuracy of components, cultural authenticity without confusion with other cultures, and appropriateness of historical context. Each platform’s final score, which determined the rankings, was the sum of its average scores across the items.
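The aggregation described above can be illustrated with a minimal sketch. It assumes that each item receives several 0 to 4 scores (for example from repeated generations or multiple reviewers) that are averaged before being summed; the report's exact procedure may differ. The function name `platform_total` and the sample scores are hypothetical and are not taken from VANK’s report.

```python
from statistics import mean

MAX_ITEM_SCORE = 4  # each item is scored out of four points


def platform_total(item_scores: dict[str, list[float]]) -> float:
    """Return the sum, over all items, of each item's average score.

    item_scores maps an item name (e.g. "Dokdo") to the 0-4 scores it
    received. How those scores are collected is an assumption here.
    """
    for item, scores in item_scores.items():
        if not scores:
            raise ValueError(f"no scores for {item}")
        if any(s < 0 or s > MAX_ITEM_SCORE for s in scores):
            raise ValueError(f"score out of range for {item}")
    # Average each item's scores, then add the averages together
    return sum(mean(scores) for scores in item_scores.values())


# Hypothetical scores for three of the 15 items (not real report data)
example = {
    "Dokdo": [2, 3, 2],
    "Bibimbap": [4, 4, 3],
    "Gat": [1, 2, 2],
}
total = platform_total(example)
max_total = MAX_ITEM_SCORE * len(example)
print(f"{total:.2f} / {max_total}")  # prints "7.67 / 12"
```

Under this reading, a platform’s maximum possible total across all 15 items would be 15 × 4 = 60 points.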
The results ranked ChatGPT first (50.33), followed by Copilot (45.17), Gemini (39.50), Perplexity (38.17), Bing (34.06), and Grok (30.44). Errors were particularly notable in territorial representation (Dokdo and the East Sea), Hangeul, Gat, and tangible cultural heritage.
The analysis found that while generative AI showed relatively high accuracy in areas with strong visual features and abundant training data, it revealed consistent limitations in areas requiring structural understanding and historical context. Food-related categories performed relatively well, while territory and cultural heritage categories showed lower overall scores and greater variation between platforms.
The territory category recorded the lowest scores overall. While VANK has made tangible progress in correcting the representation of Dokdo and the East Sea, generative AI still struggles to accurately reflect Korea’s perspective in reproducing geographic names and spatial concepts.
The tangible cultural heritage category also showed clear limitations. Due to the need for high precision in representing architectural structures, spatial arrangements, and historical context, most platforms struggled to accurately reproduce these complex elements. VANK emphasized that even if perfect reproduction is technically challenging, platforms should at least reflect basic structures, layouts, and key architectural elements without distorting Korea’s historical context.
VANK warned that if such image errors spread without verification, they could become embedded in user perception and lead to broader distortions of Korean culture. With global interest in Korea rising due to BTS and K-Pop Demon Hunters, and increasingly channeled through AI-based searches, the accuracy of generated content is more critical than ever.
Lee Sei-yeon, who led the evaluation, said, “This assessment allowed us to systematically identify recurring error patterns and vulnerabilities across platforms,” adding that “the mixing of East Asian cultural elements in particular risks blurring cultural identity and requires urgent improvement.” She stressed the need for continuous monitoring to ensure that Korea’s cultural identity and context are accurately represented as more global users encounter Korea through generative AI.
Park Gi-tae stated that the significance of the project lies in establishing, for the first time, a national-level performance evaluation index that enables integrated comparison and assessment beyond individual case studies. He added that the index quantitatively reveals gaps and biases in AI training data and will serve as a benchmark for future strategies to promote Korean culture. He also emphasized that as global interest in Korea reaches unprecedented levels through BTS and K-Pop Demon Hunters, proactively correcting errors in generative AI is a key task in conveying an accurate national image.
VANK plans to expand its work by developing an “AI narrative performance evaluation index” to analyze text-based outputs, where more layered error analysis is expected. The organization will also regularly review the current index to monitor improvements across platforms and continue efforts to enhance the accuracy of Korea-related information in the generative AI environment.
VANK’s full report on the AI image performance evaluation index can be accessed at the link below:
https://drive.google.com/file/d/11OmesZGxSedazGZmVEjY8BFXQccRGRyi/view