Humanity's Last Exam is still accepting questions from late contributors and submissions for the dataset and co-authorship, but new submissions are not eligible for the prize pool.

Humanity's Last Exam

Hugging Face Dataset: load_dataset("cais/hle")


Long Phan*1, Alice Gatti*1, Ziwen Han*2, Nathaniel Li*1

Josephina Hu2, Hugh Zhang, Sean Shi2, Michael Choi2, Anish Agrawal2, Arnav Chopra2

Adam Khoja1, Ryan Kim, Richard Ren1, Jason Hausenloy1, Oliver Zhang1, Mantas Mazeika1

Summer Yue**2, Alexandr Wang**2, Dan Hendrycks**1

1Center for AI Safety, 2Scale AI

Authors

Tung Nguyen, Daron Anderson, Imad Ali Shah, Mikhail Doroshenko, Alun Cennyth Stokes, Mobeen Mahmood, Jaeho Lee, Oleksandr Pokutnyi, Oleg Iskra, Jessica P. Wang, Robert Gerbicz, John-Clark Levin, Serguei Popov, Fiona Feng, Steven Y. Feng, Haoran Zhao, Michael Yu, Varun Gangal, Chelsea Zou, Zihan Wang, Mstyslav Kazakov, Geoff Galgon, Johannes Schmitt, Alvaro Sanchez, Yongki Lee, Will Yeadon, Scott Sauers, Marc Roth, Chidozie Agu, Søren Riis, Fabian Giska, Saiteja Utpala, Antrell Cheatom, Zachary Giboney, Gashaw M. Goshu, Sarah-Jane Crowson, Mohinder Maheshbhai Naiya, Noah Burns, Lennart Finke, Zerui Cheng, Hyunwoo Park, Francesco Fournier-Facio, Jennifer Zampese, John Wydallis, John B. Wydallis, Ryan G. Hoerr, Mark Nandor, Tim Gehrunger, Jiaqi Cai, Ben McCarty, Jungbae Nam, Edwin Taylor, Jun Jin, Gautier Abou Loume, Hangrui Cao, Alexis C Garretson, Damien Sileo, Qiuyu Ren, Doru Cojoc, Pavel Arkhipov, Usman Qazi, Aras Bacho, Lianghui Li, Sumeet Motwani, Christian Schroeder de Witt, Alexei Kopylov, Johannes Veith, Eric Singer, Paolo Rissone, Jaehyeok Jin, Jack Wei Lun Shi, Chris G. Willcocks, Ameya Prabhu, Longke Tang, Kevin Zhou, Emily de Oliveira Santos, Andrey Pupasov Maksimov, Edward Vendrow, Kengo Zenitani, Joshua Robinson, Aleksandar Mikov, Julien Guillod, Yuqi Li, Ben Pageler, Joshua Vendrow, Vladyslav Kuchkin, Pierre Marion, Denis Efremov, Jayson Lynch, Kaiqu Liang, Andrew Gritsevskiy, Dakotah Martinez, Nick Crispino, Dimitri Zvonkine, Natanael Wildner Fraga, Saeed Soori, Ori Press, Henry Tang, Julian Salazar, Sean R. Green, Lina Brüssel, Moon Twayana, Aymeric Dieuleveut, T. Ryan Rogers,
Wenjin Zhang, Ross Finocchio, Bikun Li, Jinzhou Yang, Arun Rao, Gabriel Loiseau, Mikhail Kalinin, Marco Lukas, Ciprian Manolescu, Nate Stambaugh, Subrata Mishra, Ariel Ghislain Kemogne Kamdoum, Tad Hogg, Alvin Jin, Carlo Bosio, Gongbo Sun, Brian P Coppola, Haline Heidinger, Rafael Sayous, Stefan Ivanov, Joseph M Cavanagh, Jiawei Shen, Joseph Marvin Imperial, Philippe Schwaller, Shaipranesh Senthilkuma, Andres M Bran, Andres Algaba, Brecht Verbeken, Kelsey Van den Houte, Lynn Van Der Sypt, David Noever, Lisa Schut, Ilia Sucholutsky, Evgenii Zheltonozhskii, Qiaochu Yuan, Derek Lim, Richard Stanley, Shankar Sivarajan, Tong Yang, John Maar, Julian Wykowski, Martí Oller, Jennifer Sandlin, Anmol Sahu, Cesare Giulio Ardito, Yuzheng Hu, Felipe Meneguitti Dias, Tobias Kreiman, Kaivalya Rawal, Tobias Garcia Vilchis, Yuexuan Zu, Martin Lackner, James Koppel, Jeremy Nguyen, Daniil S. Antonenko, Steffi Chern, Bingchen Zhao, Pierrot Arsene, Sergey Ivanov, Rafał Poświata, Chenguang Wang, Daofeng Li, Donato Crisostomi, Ali Dehghan, Andrea Achilleos, John Arnold Ambay, Benjamin Myklebust, Archan Sen, David Perrella, Nurdin Kaparov, Mark H Inlow, Allen Zang, Kalyan Ramakrishnan, Daniil Orel, Vladislav Poritski, Shalev Ben-David, Zachary Berger, Parker Whitfill, Michael Foster, Daniel Munro, Linh Ho, Dan Bar Hava, Aleksey Kuchkin, Robert Lauff, David Holmes, Frank Sommerhage, Anji Zhang, Richard Moat, Keith Schneider, Daniel Pyda, Zakayo Kazibwe, Mukhwinder Singh, Don Clarke, Dae Hyun Kim, Sara Fish, Veit Elser, Victor Efren Guadarrama Vilchis, Immo Klose, Christoph Demian, Ujjwala Anantheswaran, Adam Zweiger, Guglielmo Albani, Jeffery Li, Nicolas Daans, Maksim Radionov, Václav Rozhoň, Vincent Ginis, Ziqiao Ma, Christian Stump, Jacob Platnick, Volodymyr Nevirkovets, Luke Basler, Marco Piccardo, Niv Cohen, Virendra Singh, Josef Tkadlec, Paul Rosu, Alan Goldfarb, Piotr Padlewski, Stanislaw Barzowski, Kyle Montgomery, Aline Menezes, Arkil Patel, Zixuan Wang, Jamie Tucker-Foltz, Jack Stade, Declan Grabb, Tom Goertzen, Fereshteh Kazemi, Jeremiah Milbauer, Abhishek Shukla, Hossam Elgnainy, Yan Carlos Leyva Labrador, Hao He, Ling Zhang, Alan Givré, Hew Wolff, Gözdenur Demir, Muhammad Fayez Aziz, Younesse Kaddar, Ivar Ängquist, Yanxu Chen, Elliott Thornley,
Robin Zhang, Jiayi Pan, Antonio Terpin, Niklas Muennighoff, Hailey Schoelkopf, Eric Zheng, Avishy Carmi, Jainam Shah, Ethan D. L. Brown, Kelin Zhu, Max Bartolo, Richard Wheeler, Andrew Ho, Shaul Barkan, Jiaqi Wang, Martin Stehberger, Egor Kretov, Peter Bradshaw, JP Heimonen, Kaustubh Sridhar, Zaki Hossain, Ido Akov, Yury Makarychev, Joanna Tam, Hieu Hoang, David M. Cunningham, Vladimir Goryachev, Demosthenes Patramanis, Michael Krause, Andrew Redenti, David Aldous, Jesyin Lai, Shannon Coleman, Jiangnan Xu, Sangwon Lee, Ilias Magoulas, Sandy Zhao, Ning Tang, Michael K. Cohen, Micah Carroll, Orr Paradise, Jan Hendrik Kirchner, Stefan Steinerberger, Maksym Ovchynnikov, Jason O. Matos, Adithya Shenoy, Michael Wang, Yuzhou Nie, Paolo Giordano, Philipp Petersen, Anna Sztyber-Betley, Paolo Faraboschi, Robin Riblet, Jonathan Crozier, Shiv Halasyamani, Antonella Pinto, Shreyas Verma, Prashant Joshi, Eli Meril, Zheng-Xin Yong, Allison Tee, Jérémy Andréoletti, Orion Weller, Raghav Singhal, Gang Zhang, Alexander Ivanov, Seri Khoury, Nils Gustafsson, Hamid Mostaghimi, Kunvar Thaman, Qijia Chen, Trần Quốc Khánh, Jacob Loader, Stefano Cavalleri, Hannah Szlyk, Zachary Brown, Himanshu Narayan, Jonathan Roberts, William Alley, Kunyang Sun, Ryan Stendall, Max Lamparth, Anka Reuel, Ting Wang, Hanmeng Xu, Pablo Hernández-Cámara, Freddie Martin, Thomas Preu, Tomek Korbak, Marcus Abramovitch, Dominic Williamson, Ida Bosio, Ziye Chen, Biró Bálint, Eve J. Y. Lo, Maria Inês S. Nunes, Yibo Jiang, M Saiful Bari, Peyman Kassani, Zihao Wang, Behzad Ansarinejad, Yewen Sun, Stephane Durand, Guillaume Douville, Daniel Tordera, George Balabanian, Earth Anderson, Lynna Kvistad, Alejandro José Moyano, Hsiaoyun Milliron, Ahmad Sakor, Murat Eron, Isaac C. McAlister, Andrew Favre D.O., Shailesh Shah, Xiaoxiang Zhou, Firuz Kamalov, Ronald Clark, Sherwin Abdoli, Tim Santens, Harrison K Wang, Evan Chen, Alessandro Tomasiello, G. Bruno De Luca,
Shi-Zhuo Looi, Vinh-Kha Le, Noam Kolt, Niels Mündler, Avi Semler, Emma Rodman, Jacob Drori, Carl J Fossum, Luk Gloor, Milind Jagota, Ronak Pradeep, Honglu Fan, Tej Shah, Jonathan Eicher, Michael Chen, Kushal Thaman, William Merrill, Moritz Firsching, Carter Harris, Ștefan Ciobâcă, Jason Gross, Rohan Pandey, Ilya Gusev, Adam Jones, Shashank Agnihotri, Pavel Zhelnov, Siranut Usawasutsakorn, Mohammadreza Mofayezi, Alexander Piperski, Marc Carauleanu, David K. Zhang, Kostiantyn Dobarskyi, Dylan Ler, Roman Leventov, Ignat Soroko, Thorben Jansen, Scott Creighton, Pascal Lauer, Joshua Duersch, Vage Taamazyan, Dario Bezzi, Wiktor Morak, Wenjie Ma, William Held, Tran Đuc Huy, Ruicheng Xian, Armel Randy Zebaze, Mohanad Mohamed, Julian Noah Leser, Michelle X Yuan, Laila Yacar, Johannes Lengler, Katarzyna Olszewska, Hossein Shahrtash, Edson Oliveira, Joseph W. Jackson, Daniel Espinosa Gonzalez, Andy Zou, Muthu Chidambaram, Timothy Manik, Hector Haffenden, Dashiell Stander, Ali Dasouqi, Alexander Shen, Emilien Duc, Bita Golshani, David Stap, Mikalai Uzhou, Alina Borisovna Zhidkovskaya, Lukas Lewark, Miguel Orbegozo Rodriguez, Mátyás Vincze, Dustin Wehr, Colin Tang, Shaun Phillips, Fortuna Samuele, Jiang Muzhen, Fredrik Ekström, Angela Hammon, Oam Patel, Faraz Farhidi, George Medley, Forough Mohammadzadeh, Madellene Peñaflor, Haile Kassahun, Alena Friedrich, Claire Sparrow, Rayner Hernandez Perez, Taom Sakal, Omkar Dhamane, Ali Khajegili Mirabadi, Eric Hallman, Kenchi Okutsu, Mike Battaglia, Mohammad Maghsoudimehrabani, Alon Amit, Dave Hulbert, Roberto Pereira, Simon Weber, Handoko, Anton Peristyy, Stephen Malina, Samuel Albanie, Will Cai, Mustafa Mehkary, Rami Aly, Frank Reidegeld, Anna-Katharina Dick, Cary Friday, Jasdeep Sidhu, Hassan Shapourian, Wanyoung Kim, Mariana Costa, Hubeyb Gurdogan, Brian Weber, Harsh Kumar, Tong Jiang, Arunim Agarwal, Chiara Ceconello, Warren S. Vaz, Chao Zhuang, Haon Park, Andrew R. Tawfeek, Daattavya Aggarwal, Michael Kirchhof, Linjie Dai, Evan Kim, Johan Ferret, Yuzhou Wang, Minghao Yan, Krzysztof Burdzy, Lixin Zhang, Antonio Franca, Diana T. Pham,
Kang Yong Loh, Joshua Robinson, Abram Jackson, Shreen Gul, Gunjan Chhablani, Zhehang Du, Adrian Cosma, Jesus Colino, Colin White, Jacob Votava, Vladimir Vinnikov, Ethan Delaney, Petr Spelda, Vit Stritecky, Syed M. Shahid, Jean-Christophe Mourrat, Lavr Vetoshkin, Koen Sponselee, Renas Bacho, Florencia de la Rosa, Xiuyu Li, Guillaume Malod, Leon Lang, Julien Laurendeau, Dmitry Kazakov, Fatimah Adesanya, Julien Portier, Lawrence Hollom, Victor Souza, Yuchen Anna Zhou, Julien Degorre, Yiğit Yalın, Gbenga Daniel Obikoya, Luca Arnaboldi, Rai (Michael Pokorny), Filippo Bigi, M.C. Boscá, Oleg Shumar, Kaniuar Bacho, Pierre Clavier, Gabriel Recchia, Mara Popescu, Nikita Shulga, Ngefor Mildred Tanwie, Denis Peskoff, Thomas C.H. Lux, Ben Rank, Colin Ni, Matthew Brooks, Alesia Yakimchyk, Huanxu (Quinn) Liu, Olle Häggström, Emil Verkama, Hans Gundlach, Leonor Brito-Santana, Brian Amaro, Vivek Vajipey, Rynaa Grover, Yiyang Fan, Gabriel Poesia Reis e Silva, Linwei Xin, Yosi Kratish, Jakub Łucki, Wen-Ding Li, Sivakanth Gopi, Andrea Caciolai, Justin Xu, Kevin Joseph Scaria, Freddie Vargus, Farzad Habibi, Long (Tony) Lian, Emanuele Rodolà, Jules Robins, Vincent Cheng, Tony Fruhauff, Brad Raynor, Hao Qi, Xi Jiang, Ben Segev, Jingxuan Fan, Sarah Martinson, Erik Y. Wang, Kaylie Hausknecht, Michael P. Brenner, Mao Mao, Xinyu Zhang, David Avagian, Eshawn Jessica Scipio, Alon Ragoler, Justin Tan, Blake Sims, Rebeka Plecnik, Aaron Kirtland, Omer Faruk Bodur, D.P. Shinde, Zahra Adoul, Mohamed Zekry, Ali Karakoc, Tania C. B. Santos, Samir Shamseldeen, Loukmane Karim, Anna Liakhovitskaia, Nate Resman, Nicholas Farina, Juan Carlos Gonzalez, Gabe Maayan, Sarah Hoback, Rodrigo De Oliveira Pena, Glen Sherman, Elizabeth Kelley, Hodjat Mariji, Rasoul Pouriamanesh, Wentao Wu, Sandra Mendoza, Ismail Alarab, Joshua Cole, Danyelle Ferreira, Bryan Johnson, Mohammad Safdari, Liangti Dai, Siriphan Arthornthurasuk, Alexey Pronin, Jing Fan, Angel Ramirez-Trinidad, Ashley Cartwright, Daphiny Pottmaier, Omid Taheri, David Outevsky, Stanley Stepanic, Samuel Perry, Luke Askew, Raúl Adrián Huerta Rodríguez, Ali M. R. Minissi,
Sam Ali, Ricardo Lorena, Krishnamurthy Iyer, Arshad Anil Fasiludeen, Sk Md Salauddin, Murat Islam, Juan Gonzalez, Josh Ducey, Maja Somrak, Vasilios Mavroudis, Eric Vergo, Juehang Qin, Benjámin Borbás, Eric Chu, Jack Lindsey, Anil Radhakrishnan, Antoine Jallon, I.M.J. McInnis, Pawan Kumar, Laxman Prasad Goswami, Daniel Bugas, Nasser Heydari, Ferenc Jeanplong, Archimedes Apronti, Abdallah Galal, Ng Ze-An, Ankit Singh, Joan of Arc Xavier, Kanu Priya Agarwal, Mohammed Berkani, Benedito Alves de Oliveira Junior, Dmitry Malishev, Nicolas Remy, Taylor D. Hartman, Tim Tarver, Stephen Mensah, Javier Gimenez, Roselynn Grace Montecillo, Russell Campbell, Asankhaya Sharma, Khalida Meer, Xavier Alapont, Deepakkumar Patil, Rajat Maheshwari, Abdelkader Dendane, Priti Shukla, Sergei Bogdanov, Sören Möller, Muhammad Rehan Siddiqi, Prajvi Saxena, Himanshu Gupta, Innocent Enyekwe, Ragavendran P V, Zienab EL-Wasif, Aleksandr Maksapetyan, Vivien Rossbach, Chris Harjadi, Mohsen Bahaloohoreh, Song Bian, John Lai, Justine Leon Uro, Greg Bateman, Mohamed Sayed, Ahmed Menshawy, Darling Duclosel, Yashaswini Jain, Ashley Aaron, Murat Tiryakioglu, Sheeshram Siddh, Keith Krenek, Alex Hoover, Joseph McGowan, Tejal Patwardhan

Introduction

Benchmarks are important tools for tracking the rapid advancements in large language model (LLM) capabilities. However, benchmarks are not keeping pace in difficulty: LLMs now achieve over 90% accuracy on popular benchmarks like MMLU, limiting informed measurement of state-of-the-art LLM capabilities. In response, we introduce Humanity's Last Exam, a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind with broad subject coverage. The dataset consists of 2,700 challenging questions across over a hundred subjects. We publicly release these questions, while maintaining a private test set of held-out questions to assess model overfitting.
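
As a rough illustration of working with the public release, the sketch below loads the questions through the Hugging Face `datasets` library using the dataset identifier `cais/hle` given on this page. The split name `test`, the `category` field, and the helper names `load_hle` and `count_by_field` are assumptions made for illustration and are not confirmed here. The download also assumes network access and any access approval the dataset may require.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_hle(split: str = "test"):
    """Load the public HLE questions from the Hugging Face Hub.

    Assumptions: the `datasets` package is installed, the split is
    named "test", and access to the dataset has been granted.
    """
    # Imported lazily so the helper below works without `datasets`.
    from datasets import load_dataset

    return load_dataset("cais/hle", split=split)


def count_by_field(rows: Iterable[Mapping], field: str = "category") -> Counter:
    """Count rows per value of `field`; the field name is an assumption."""
    return Counter(row.get(field, "unknown") for row in rows)


if __name__ == "__main__":
    # Tiny in-memory stand-in so the helper can be tried offline.
    sample = [
        {"category": "Math"},
        {"category": "Math"},
        {"category": "Physics"},
    ]
    print(count_by_field(sample))
```

Keeping the download inside `load_hle` means the counting helper can be exercised on any list of dictionaries before touching the real data.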

Difficulty comparison across benchmarks

Compared against the saturation of some existing benchmarks, Humanity's Last Exam accuracy remains low across several frontier models, demonstrating its effectiveness for measuring advanced, closed-ended, academic capabilities.

Dataset

Humanity's Last Exam (HLE) is a global collaborative effort, with questions from nearly 1,000 subject-expert contributors affiliated with over 500 institutions across 50 countries. Most contributors are professors, researchers, and graduate degree holders.

Classics

Question:

Question image

Here is a representation of a Roman inscription, originally found on a tombstone. Provide a translation for the Palmyrene script.
A transliteration of the text is provided: RGYNᵓ BT ḤRY BR ᶜTᵓ ḤBL

Henry T

Merton College, Oxford

Ecology

Question:

Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by this sesamoid bone? Answer with a number.

Edward V

Massachusetts Institute of Technology

Samples of the diverse and challenging questions submitted to Humanity's Last Exam.

Quantitative Results

Accuracy. All frontier models achieve low accuracy on Humanity's Last Exam, highlighting significant room for improvement in narrowing the gap between current LLMs and expert-level academic capabilities on closed-ended questions.

Calibration Error. Given low performance on Humanity's Last Exam, models should be calibrated, recognizing their uncertainty rather than confidently providing incorrect answers, which is indicative of confabulation/hallucination. To measure calibration, we prompt models to provide both an answer and their confidence from 0% to 100%.
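
To make the two metrics concrete, here is a minimal sketch of how accuracy and a calibration error could be computed from graded responses. It assumes a root-mean-square calibration error over equal-width confidence bins; the exact formula and binning used for the reported numbers are not stated here, so treat this as one plausible variant. The function names and the bin count are illustrative.

```python
import math
from typing import Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Percentage of answers graded as correct."""
    if not correct:
        raise ValueError("no graded answers")
    return 100.0 * sum(correct) / len(correct)


def rms_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: equal-width bins
) -> float:
    """RMS gap between stated confidence and observed accuracy, in percent.

    `confidences` are the model's stated confidences on a 0-100 scale.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and the same length")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        p = min(max(conf / 100.0, 0.0), 1.0)    # clamp to [0, 1]
        idx = min(int(p * n_bins), n_bins - 1)  # 100% goes in the last bin
        bins[idx].append((p, 1.0 if ok else 0.0))

    total = len(confidences)
    weighted_sq_gap = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        mean_acc = sum(a for _, a in members) / len(members)
        weighted_sq_gap += (len(members) / total) * (mean_conf - mean_acc) ** 2
    return 100.0 * math.sqrt(weighted_sq_gap)


if __name__ == "__main__":
    stated = [90, 90, 90, 90]
    graded = [False, False, False, False]
    print(accuracy(graded), rms_calibration_error(stated, graded))
```

In this toy run a model that states 90% confidence and is always wrong gets 0% accuracy and a calibration error of about 90%, the same pattern of low accuracy with high calibration error described above.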

Judge Model: o3-mini-2025-01-31 | Last Updated: 02/11/2025

Model                 Accuracy (%) ↑    Calibration Error (%) ↓
GPT-4o                3.1               92.3
Grok-2                3.9               90.8
Claude 3.5 Sonnet     4.8               88.5
Gemini Thinking       7.2               90.6
o1                    8.8               92.8
DeepSeek-R1*          8.6               81.4
o3-mini (medium)*     11.1              91.5
o3-mini (high)*       14.0              92.8

*Model is not multi-modal, evaluated on text-only subset.
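
For models that cannot take images, the text-only subset mentioned in the footnote can be approximated by dropping questions that carry an image. The sketch below assumes each question is a dictionary with an `image` field that is empty or missing for text-only questions; that field name and convention are assumptions about the data layout, and the helper names are illustrative.

```python
from typing import Iterable, List, Mapping


def is_text_only(row: Mapping, image_field: str = "image") -> bool:
    """True when the row has no image attached (empty or missing field)."""
    return not row.get(image_field)


def text_only_subset(rows: Iterable[Mapping], image_field: str = "image") -> List[Mapping]:
    """Keep only questions that a text-only model could be evaluated on."""
    return [row for row in rows if is_text_only(row, image_field)]


if __name__ == "__main__":
    # Hypothetical rows standing in for real questions.
    questions = [
        {"id": "q1", "question": "text question", "image": ""},
        {"id": "q2", "question": "question with a figure", "image": "data:image/png;base64,..."},
        {"id": "q3", "question": "another text question"},
    ]
    print([q["id"] for q in text_only_subset(questions)])
```

Accuracy on such a subset is not directly comparable to accuracy on the full multi-modal set, which is why the starred rows are flagged.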

Discussion

Future Model Performance

While current LLMs achieve very low accuracy on Humanity's Last Exam, recent history shows that benchmarks are quickly saturated, with models progressing dramatically from near-zero to near-perfect performance in a short timeframe. Given the rapid pace of AI development, it is plausible that models could exceed 50% accuracy on HLE by the end of 2025. High accuracy on HLE would demonstrate expert-level performance on closed-ended, verifiable questions and cutting-edge scientific knowledge, but it would not alone suggest autonomous research capabilities or "artificial general intelligence." HLE tests structured academic problems rather than open-ended research or creative problem-solving abilities, making it a focused measure of technical knowledge and reasoning. HLE may be the last academic exam we need to give to models, but it is far from the last benchmark for AI.

Impact

By providing a clear measure of AI progress, Humanity's Last Exam creates a common reference point for scientists and policymakers to assess AI capabilities. This enables more informed discussions about development trajectories, potential risks, and necessary governance measures.

For any inquiries or feedback, please contact us at agibenchmark@safe.ai.
Submit feedback on questions in the dataset via this form.