Long Phan*1, Alice Gatti*1, Ziwen Han*2, Nathaniel Li*1
Josephina Hu2, Hugh Zhang‡, Sean Shi2, Michael Choi2, Anish Agrawal2, Arnav Chopra2
Adam Khoja1, Ryan Kim†, Richard Ren1, Jason Hausenloy1, Oliver Zhang1, Mantas Mazeika1
Summer Yue**2, Alexandr Wang**2, Dan Hendrycks**1
1Center for AI Safety, 2Scale AI
Introduction
Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. The dataset consists of 3,000 challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held-out questions to assess model overfitting.
In contrast to the saturation of some existing benchmarks, accuracy on Humanity's Last Exam remains low across several frontier models, demonstrating its effectiveness for measuring advanced, closed-ended, academic capabilities.
Dataset
Humanity's Last Exam (HLE) is a global collaborative effort, with questions from nearly 1,000 subject expert contributors affiliated with over 500 institutions across 50 countries, most of whom are professors, researchers, and graduate degree holders.
Question:
Here is a representation of a Roman inscription, originally found on a tombstone. Provide a translation for the Palmyrene script.
A transliteration of the text is provided: RGYNᵓ BT ḤRY BR ᶜTᵓ ḤBL
Henry T
Merton College, Oxford
Question:
Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.
Edward V
Massachusetts Institute of Technology
Samples of the diverse and challenging questions submitted to Humanity's Last Exam.
Quantitative Results
Accuracy. All frontier models achieve low accuracy on Humanity's Last Exam, highlighting significant room for improvement in narrowing the gap between current LLMs and expert-level academic capabilities on closed-ended questions.
Calibration Error. Given the low accuracy on Humanity's Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently providing incorrect answers, which is indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.
Model | Accuracy (%) ↑ | Calibration Error (%) ↓ |
---|---|---|
GPT-4o | 3.3 | 92.5 |
Grok-2 | 3.8 | 93.2 |
Claude 3.5 Sonnet | 4.3 | 88.9 |
Gemini Thinking | 6.2 | 93.9 |
o1 | 9.1 | 93.4 |
DeepSeek-R1* | 9.4 | 81.8 |
*Model is not multi-modal, evaluated on text-only subset.
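To make the two reported metrics concrete, the following is a minimal sketch of how accuracy and a calibration error could be computed from per-question records. This page does not state the exact calibration formula, so the binned root-mean-square variant used here is an assumption, and the function names, the number of bins, and the input layout (a confidence in [0, 1] plus a correctness flag per question) are hypothetical choices made for illustration.

```python
import math


def accuracy(correct):
    """Fraction of questions answered correctly."""
    return sum(1 for ok in correct if ok) / len(correct)


def rms_calibration_error(confidences, correct, n_bins=10):
    """Binned RMS calibration error (an assumed metric, not necessarily HLE's).

    confidences: stated confidences scaled to [0, 1] (e.g. 85% -> 0.85).
    correct:     booleans, True where the model's answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if not confidences:
        raise ValueError("at least one record is required")

    # Group records into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    total = len(confidences)
    weighted_sq_gap = 0.0
    for records in bins:
        if not records:
            continue
        mean_conf = sum(c for c, _ in records) / len(records)
        bin_acc = sum(1 for _, ok in records if ok) / len(records)
        # Weight each bin's squared confidence/accuracy gap by its share of records.
        weighted_sq_gap += (len(records) / total) * (mean_conf - bin_acc) ** 2
    return math.sqrt(weighted_sq_gap)


# Illustrative data only: a model that states 90% confidence on four
# questions but answers just one of them correctly.
example_conf = [0.9, 0.9, 0.9, 0.9]
example_correct = [False, False, False, True]
example_acc = accuracy(example_correct)
example_err = rms_calibration_error(example_conf, example_correct)
```

Under this definition, a model that is confident and usually wrong gets a large error, while a model whose stated confidence matches its hit rate gets an error near zero, which is the behavior the paragraph on calibration describes.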
Discussion
Future Model Performance
While current LLMs achieve very low accuracy on Humanity's Last Exam, recent history shows that benchmarks saturate quickly, with models progressing from near-zero to near-perfect performance in a short time. Given the rapid pace of AI development, it is plausible that models could exceed 50% accuracy on HLE by the end of 2025. High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or "artificial general intelligence." HLE tests structured academic problems rather than open-ended research or creative problem-solving abilities, making it a focused measure of technical knowledge and reasoning. HLE may be the last academic exam we need to give to models, but it is far from the last benchmark for AI.
Impact
By providing a clear measure of AI progress, Humanity's Last Exam creates a common reference point for scientists and policymakers to assess AI capabilities. This enables more informed discussions about development trajectories, potential risks, and necessary governance measures.
Related Articles
The New York Times
When A.I. Passes This Test, Look Out
The creators of a new test called 'Humanity's Last Exam' argue we may soon lose the ability to create tests hard enough for A.I. models.
Reuters
AI experts ready 'Humanity's Last Exam' to stump powerful tech
A team of technology experts issued a global call on Monday seeking the toughest questions to pose to artificial intelligence systems, which increasingly have handled popular benchmark tests like child's play.
Citation
@article{phan2025hle, title={Humanity's Last Exam}, author={Phan, Long and Gatti, Alice and Han, Ziwen and Li, Nathaniel and Hu, Josephina and Zhang, Hugh and Shi, Sean and Choi, Michael and Agrawal, Anish and Chopra, Arnav and Khoja, Adam and Kim, Ryan and Hausenloy, Jason and Zhang, Oliver and Mazeika, Mantas and Anderson, Daron and Nguyen, Tung and Mahmood, Mobeen and Feng, Fiona and Y. Feng, Steven and Zhao, Haoran and Yu, Michael and Zou, Chelsea and Wang, Zihan and P. Wang, Jessica and Kumar, Pawan and Pokutnyi, Oleksandr and Gerbicz, Robert and Popov, Serguei and Levin, John-Clark and Schmitt, Johannes and Galgon, Geoff and Sanchez, Alvaro and Lee, Yongki and Yeadon, Will and Sauers, Scott and Roth, Marc and Agu, Chidozie and Riis, Søren and Giska, Fabian and Utpala, Saiteja and Giboney, Zachary and M. Goshu, Gashaw and of Arc Xavier, Joan and Crowson, Sarah-Jane and Maheshbhai Naiya, Mohinder and Burns, Noah and Finke, Lennart and Cheng, Zerui and Park, Hyunwoo and Fournier-Facio, Francesco and Wydallis, John and Nandor, Mark and Singh, Ankit and Gehrunger, Tim and Cai, Jiaqi and McCarty, Ben and Duclosel, Darling and Nam, Jungbae and Zampese, Jennifer and G. Hoerr, Ryan and Bacho, Aras and Abou Loume , Gautier and Galal, Abdallah and Cao, Hangrui and C Garretson, Alexis and Sileo, Damien and Ren, Qiuyu and Cojoc, Doru and Arkhipov, Pavel and Qazi, Usman and Li, Lianghui and Motwani, Sumeet and Schroeder de Witt, Christian and Taylor, Edwin and Veith, Johannes and D. Hartman, Taylor and Rissone, Paolo and Jin, Jaehyeok and Wei Lun Shi, Jack and G. 
Willcocks, Chris and Robinson, Joshua and Mikov, Aleksandar and Prabhu, Ameya and Tang, Longke and Alapont, Xavier and Zhou, Kevin and de Oliveira Santos, Emily and Pupasov Maksimov, Andrey and Vendrow, Edward and Zenitani, Kengo and Guillod, Julien and Li, Yuqi and Vendrow, Joshua and Kuchkin , Vladyslav and Ze-An, Ng and Marion, Pierre and Efremov, Denis and Lynch, Jayson and Liang, Kaiqu and Gritsevskiy, Andrew and Martinez, Dakotah and Pageler, Ben and Crispino, Nick and Zvonkine, Dmimitri and Wildner Fraga, Natanael and Soori, Saeed and Press, Ori and Tang, Henry and Salazar, Julian and R. Green, Sean and Brüssel, Lina and Twayana, Moon and Dieuleveut, Aymeric and Ryan Rogers, T. and Zhang, Wenjin and Li, Bikun and Yang, Jinzhou and Rao, Arun and Loiseau, Gabriel and Kalinin, Mikhail and Lukas, Marco and Manolescu, Ciprian and , Subrata and Ghislain Kemogne Kamdoum, Ariel and Kreiman, Tobias and Hogg, Tad and Jin, Alvin and Bosio, Carlo and Sun, Gongbo and P Coppola, Brian and Tarver, Tim and Heidinger, Haline and Sayous, Rafael and Ivanov, Stefan and M Cavanagh, Joseph and Shen, Jiawei and Marvin Imperial, Joseph and Schwaller, Philippe and Senthilkuma, Shaipranesh and M Bran, Andres and Dehghan, Ali and Algaba, Andres and Verbeken, Brecht and Noever, David and P V, Ragavendran and Schut, Lisa and Sucholutsky, Ilia and Zheltonozhskii, Evgenii and Lim, Derek and Stanley, Richard and Sivarajan , Shankar and Yang, Tong and Maar, John and Wykowski, Julian and Oller, Martí and Sandlin, Jennifer and Sahu, Anmol and Hu, Yuzheng and Fish, Sara and Heydari, Nasser and Apronti, Archimedes and Rawal, Kaivalya and Garcia Vilchis, Tobias and Zu, Yuexuan and Lackner, Martin and Koppel, James and Nguyen, Jeremy and S. 
Antonenko, Daniil and Chern, Steffi and Zhao, Bingchen and Arsene, Pierrot and Goldfarb, Alan and Ivanov, Sergey and Poświata, Rafał and Wang, Chenguang and Li, Daofeng and Crisostomi, Donato and Achilleos, Andrea and Myklebust, Benjamin and Sen, Archan and Perrella, David and Kaparov, Nurdin and H Inlow, Mark and Zang, Allen and Thornley, Elliott and Orel, Daniil and Poritski, Vladislav and Ben-David, Shalev and Berger, Zachary and Whitfill, Parker and Foster, Michael and Munro, Daniel and Ho, Linh and Bar Hava, Dan and Kuchkin, Aleksey and Lauff, Robert and Holmes, David and Sommerhage, Frank and Schneider, Keith and Kazibwe, Zakayo and Stambaugh, Nate and Singh, Mukhwinder and Magoulas, Ilias and Clarke, Don and Hyun Kim, Dae and Meneguitti Dias, Felipe and Elser, Veit and Priya Agarwal, Kanu and Efren Guadarrama Vilchis, Victor and Klose, Immo and Demian, Christoph and Anantheswaran, Ujjwala and Zweiger, Adam and Albani, Guglielmo and Li, Jeffery and Daans, Nicolas and Radionov, Maksim and Rozhoň, Václav and Ma, Ziqiao and Stump, Christian and Berkani, Mohammed and Platnick, Jacob and Nevirkovets, Volodymyr and Basler, Luke and Piccardo, Marco and Jeanplong, Ferenc and Cohen, Niv and Gangal, Varun and Tkadlec, Josef and Rosu, Paul and Padlewski, Piotr and Stanislaw Barzowski, and Montgomery, Kyle and Menezes, Aline and Patel, Arkil and Wang, Zixuan and Tucker-Foltz, Jamie and Stade, Jack and Goertzen, Tom and Kazemi, Fereshteh and Milbauer, Jeremiah and Arnold Ambay, John and Shukla, Abhishek and Carlos Leyva Labrador, Yan and Givré, Alan and Wolff, Hew and Rossbach , Vivien and Fayez Aziz, Muhammad and Kaddar, Younesse and Chen, Yanxu and Zhang, Robin and Pan, Jiayi and Terpin, Antonio and Muennighoff, Niklas and Schoelkopf, Hailey and Zheng, Eric and Carmi, Avishy and Jones, Adam and Shah, Jainam and D. L. 
Brown, Ethan and Zhu, Kelin and Bartolo, Max and Wheeler, Richard and Ho, Andrew and Barkan, Shaul and Wang, Jiaqi and Stehberger, Martin and Kretov, Egor and Sridhar, Kaustubh and EL-Wasif, Zienab and Zhang, Anji and Pyda, Daniel and Tam, Joanna and M. Cunningham, David and Patramanis, Demosthenes and Krause, Michael and Redenti, Andrew and Bugas, Daniel and Aldous, David and Lai, Jesyin and Coleman, Shannon and Bahaloo, Mohsen and Xu, Jiangnan and Lee, Sangwon and Zhao, Sandy and Tang, Ning and K. Cohen, Michael and Carroll, Micah and Paradise, Orr and Hendrik Kirchner, Jan and Steinerberger, Stefan and Ovchynnikov, Maksym and O. Matos, Jason and Shenoy, Adithya and Alves de Oliveira Junior, Benedito and Wang, Michael and Nie, Yuzhou and Giordano, Paolo and Petersen, Philipp and Sztyber-Betley, Anna and Shukla, Priti and Crozier, Jonathan and Pinto, Antonella and Verma, Shreyas and Joshi, Prashant and Yong, Zheng-Xin and Tee, Allison and Andréoletti, Jérémy and Weller, Orion and Singhal, Raghav and Zhang, Gang and Ivanov, Alexander and Khoury, Seri and Mostaghimi, Hamid and Thaman, Kunvar and Chen, Qijia and Quốc Khánh, Trần and Loader, Jacob and Cavalleri, Stefano and Szlyk, Hannah and Brown, Zachary and Roberts, Jonathan and Alley, William and Sun, Kunyang and Stendall, Ryan and Lamparth, Max and Reuel, Anka and Wang, Ting and Xu, Hanmeng and Goud Raparthi, Sreenivas and Hernández-Cámara, Pablo and Martin, Freddie and Malishev, Dmitry and Preu, Thomas and Korbak, Tomek and Abramovitch, Marcus and Williamson, Dominic and Chen, Ziye and Bálint, Biró and Saiful Bari, M and Kassani, Peyman and Wang, Zihao and Ansarinejad, Behzad and Prasad Goswami, Laxman and Sun, Yewen and Elgnainy, Hossam and Tordera, Daniel and Balabanian, George and Anderson, Earth and Kvistad, Lynna and José Moyano, Alejandro and Maheshwari , Rajat and Sakor, Ahmad and Eron, Murat and C. 
McAlister, Isaac and Gimenez, Javier and Enyekwe, Innocent and Favre D.O., Andrew and Shah, Shailesh and Zhou, Xiaoxiang and Kamalov, Firuz and Clark, Ronald and Abdoli, Sherwin and Meer, Khalida and K Wang, Harrison and Chen, Evan and Tomasiello, Alessandro and Looi, Shi-Zhuo and Le, Vinh-Kha and Kolt, Noam and Mündler, Niels and Semler, Avi and Rodman, Emma and Drori, Jacob and J Fossum, Carl and Jagota, Milind and Pradeep, Ronak and Fan, Honglu and Shah, Tej and Shah, Tej and Eicher , Jonathan and Chen, Michael and Thaman, Kushal and Merrill, William and Harris, Carter and Gross, Jason and Gusev, Ilya and Sharma, Asankhaya and Agnihotri, Shashank and Zhelnov, Pavel and Usawasutsakorn, Siranut and Mofayezi, Mohammadreza and Bogdanov, Sergei and Piperski, Alexander and Carauleanu, Marc and K. Zhang, David and Ler, Dylan and Leventov, Roman and Soroko, Ignat and Jansen, Thorben and Lauer, Pascal and Duersch, Joshua and Taamazyan, Vage and Morak, Wiktor and Ma, Wenjie and Held, William and Đuc Huy, Tran and Xian, Ruicheng and Randy Zebaze, Armel and Mohamed, Mohanad and Noah Leser, Julian and X Yuan, Michelle and Yacar, Laila and Lengler, Johannes and Shahrtash, Hossein and Oliveira, Edson and W. 
Jackson, Joseph and Espinosa Gonzalez, Daniel and Zou, Andy and Chidambaram, Muthu and Manik, Timothy and Haffenden, Hector and Stander, Dashiell and Dasouqi, Ali and Shen, Alexander and Duc, Emilien and Golshani, Bita and Stap, David and Uzhou, Mikalai and Alina Borisovna Zhidkovskaya, and Lewark, Lukas and Vincze, Mátyás and Wehr, Dustin and Tang, Colin and Hossain, Zaki and Phillips, Shaun and Muzhen, Jiang and Ekström, Fredrik and Hammon, Angela and Patel, Oam and Remy, Nicolas and Farhidi, Faraz and Medley , George and Mohammadzadeh, Forough and Peñaflor, Madellene and Kassahun, Haile and Friedrich, Alena and Sparrow, Claire and Sakal, Taom and Dhamane, Omkar and Khajegili Mirabadi, Ali and Hallman, Eric and Battaglia, Mike and Maghsoudimehrabani, Mohammad and Hoang, Hieu and Amit, Alon and Hulbert, Dave and Pereira, Roberto and Weber, Simon and Mensah, Stephen and Andre, Nathan and Peristyy, Anton and Harjadi, Chris and Gupta , Himanshu and Malina, Stephen and Albanie, Samuel and Cai, Will and Mehkary , Mustafa and Reidegeld, Frank and Dick, Anna-Katharina and Friday, Cary and Sidhu, Jasdeep and Kim, Wanyoung and Costa, Mariana and Gurdogan, Hubeyb and Weber, Brian and Kumar , Harsh and Jiang, Tong and Agarwal, Arunim and Ceconello, Chiara and S. Vaz, Warren and Zhuang, Chao and Park, Haon and R. Tawfeek, Andrew and Aggarwal, Daattavya and Kirchhof, Michael and Dai, Linjie and Kim, Evan and Ferret, Johan and Wang, Yuzhou and Yan, Minghao and Burdzy, Krzysztof and Zhang, Lixin and Franca, Antonio and T. Pham, Diana and Yong Loh, Kang and Robinson, Joshua and Gul, Shreen and Chhablani, Gunjan and Du, Zhehang and Cosma, Adrian and White, Colin and Riblet, Robin and Saxena, Prajvi and Votava, Jacob and Vinnikov, Vladimir and Halasyamani, Shiv and M. 
Shahid, Syed and Mourrat, Jean-Christophe and Vetoshkin, Lavr and Bacho, Renas and Ginis, Vincent and Maksapetyan, Aleksandr and de la Rosa, Florencia and Li, Xiuyu and Malod, Guillaume and Lang, Leon and Laurendeau, Julien and Adesanya , Fatimah and Portier, Julien and Hollom, Lawrence and Souza, Victor and Anna Zhou, Yuchen and Yalın, Yiğit and Daniel Obikoya, Gbenga and Arnaboldi, Luca and (Michael Pokorny), Rai and Bigi, Filippo and Bacho, Kaniuar and Clavier, Pierre and Recchia, Gabriel and Popescu, Mara and Shulga, Nikita and Mildred Tanwie , Ngefor and C.H. Lux, Thomas and Rank, Ben and Ni, Colin and Yakimchyk, Alesia and (Quinn) Liu , Huanxu and Häggström, Olle and Verkama, Emil and Narayan , Himanshu and Gundlach, Hans and Brito-Santana, Leonor and Amaro, Brian and Vajipey, Vivek and Grover, Rynaa and Fan, Yiyang and Poesia Reis e Silva, Gabriel and Xin, Linwei and Kratish, Yosi and Łucki, Jakub and Li, Wen-Ding and Xu, Justin and Joseph Scaria, Kevin and Vargus, Freddie and Habibi, Farzad and (Tony) Lian, Long and Rodolà, Emanuele and Robins, Jules and Cheng, Vincent and Grabb, Declan and Bosio, Ida and Fruhauff, Tony and Akov, Ido and J. Y. Lo, Eve and Qi, Hao and Jiang, Xi and Segev, Ben and Fan, Jingxuan and Martinson, Sarah and Y. Wang, Erik and Hausknecht, Kaylie and P. Brenner, Michael and Mao, Mao and Jiang, Yibo and Zhang, Xinyu and Avagian, David and Jessica Scipio, Eshawn and Rehan Siddiqi, Muhammad and Ragoler, Alon and Tan, Justin and Patil, Deepakkumar and Plecnik, Rebeka and Kirtland, Aaron and Grace Montecillo, Roselynn and Durand, Stephane and Faruk Bodur, Omer and Adoul, Zahra and Zekry , Mohamed and Douville, Guillaume and Karakoc, Ali and C. B. 
Santos, Tania and Shamseldeen, Samir and Karim, Loukmane and Liakhovitskaia, Anna and Resman , Nate and Farina, Nicholas and Carlos Gonzalez, Juan and Maayan, Gabe and Hoback, Sarah and De Oliveira Pena, Rodrigo and Sherman, Glen and Mariji, Hodjat and Pouriamanesh, Rasoul and Wu, Wentao and Demir, Gözdenur and Mendoza, Sandra and Alarab, Ismail and Cole, Joshua and Ferreira, Danyelle and Johnson , Bryan and Milliron, Hsiaoyun and Safdari, Mohammad and Dai, Liangti and Arthornthurasuk, Siriphan and Pronin, Alexey and Ramirez-Trinidad, Angel and Cartwright, Ashley and Pottmaier, Daphiny and Taheri, Omid and Outevsky, David and Stepanic, Stanley and Perry, Samuel and Askew, Luke and Adrián Huerta Rodríguez , Raúl and Dendane, Abdelkader and Lorena, Ricardo and Iyer, Krishnamurthy and Md Salauddin, Sk and Islam, Murat and Gonzalez, Juan and Ducey, Josh and Campbell, Russell and Somrak, Maja and Mavroudis, Vasilios and Vergo, Eric and Qin, Juehang and Borbás, Benjámin and Chu, Eric and Lindsey, Jack and Radhakrishnan, Anil and Jallon, Antoine and McInnis, I.M.J. and Hoover, Alex and Möller, Sören and Patwardhan, Tejal and Yue, Summer and Wang, Alexandr and Hendrycks, Dan}, journal={arXiv}, year={2025} }