Recent research has revealed that while large language models (LLMs) such as GPT-4, Llama, and Gemini excel at many tasks, they struggle with advanced historical questions.…