OpenAI’s GPT-4 is Here. It’s Passing More Exams

OpenAI's new model, GPT-4, scored in the 90th percentile on the Uniform Bar Exam.
By Evan Castillo, Editor & Writer
Published on March 16, 2023
Edited by Alex Pasquariello and Raneem Taleb-Agha
Image Credit: CFOTO / Future Publishing / Getty Images

  • The new model of OpenAI's ChatGPT can pass more exams with higher scores, accurately process images, and adopt different personalities.
  • The model still makes simple factual and logical reasoning errors.
  • The model is available to some developers and researchers. Users must sign up for the waitlist for access.

OpenAI's GPT-3.5, the artificial intelligence model behind ChatGPT, has proved to be a passable student on law, medical, and college-level exams. But OpenAI's new model, GPT-4, looks like it wants to be the top student in the class.

On March 14, OpenAI released a technical report on GPT-4, the newest iteration of its artificial intelligence model, showcasing the model's capabilities and limitations. College students, professors, and administrators, take note: This version of the AI chatbot improves academic performance, supports tunable AI personalities, and can even assess images.

However, the AI can still make simple logical and factual mistakes.

Here's a first look at how GPT-4 performed on college- and graduate-level exams and other benchmarks.

What Exams Can GPT-4 Pass?

One of GPT-4's most striking results is its performance on the exam aspiring lawyers must pass to practice law.

GPT-4 shot beyond GPT-3.5's performance on the Uniform Bar Exam with a 298/400, landing in the 90th percentile of test-takers. GPT-3.5 scored 213/400, in the 10th percentile.



GPT-3.5 vs. GPT-4 Exam Performance


Uniform Bar Exam (MBE+MEE+MPT)

  • GPT-4 Score: 298/400 (90th percentile)
  • GPT-3.5 Score: 213/400 (10th percentile)

Law School Admission Test (LSAT)

  • GPT-4 Score: 163/180 (88th percentile)
  • GPT-3.5 Score: 149/180 (40th percentile)

Scholastic Assessment Test (SAT) Evidence-Based Reading & Writing

  • GPT-4 Score: 710/800 (93rd percentile)
  • GPT-3.5 Score: 670/800 (87th percentile)

Scholastic Assessment Test (SAT) Math

  • GPT-4 Score: 700/800 (89th percentile)
  • GPT-3.5 Score: 590/800 (70th percentile)

Graduate Record Examination (GRE) Quantitative

  • GPT-4 Score: 163/170 (80th percentile)
  • GPT-3.5 Score: 147/170 (25th percentile)

Graduate Record Examination (GRE) Verbal

  • GPT-4 Score: 169/170 (99th percentile)
  • GPT-3.5 Score: 154/170 (63rd percentile)

Graduate Record Examination (GRE) Writing

  • GPT-4 Score: 4/6 (54th percentile)
  • GPT-3.5 Score: 4/6 (54th percentile)

The USA Biology Olympiad (USABO) Semifinal Exam 2020

  • GPT-4 Score: 87/150 (99th-100th percentile)
  • GPT-3.5 Score: 43/150 (31st-33rd percentile)

The U.S. National Chemistry Olympiad (USNCO) Local Section Exam 2022

  • GPT-4 Score: 36/60
  • GPT-3.5 Score: 24/60

Medical Knowledge Self-Assessment Program

  • GPT-4 Score: 75%
  • GPT-3.5 Score: 53%

Codeforces Rating

  • GPT-4 Score: 392 (below 5th percentile)
  • GPT-3.5 Score: 260 (below 5th percentile)

AP Art History

  • GPT-4 Score: 5 (86th-100th percentile)
  • GPT-3.5 Score: 5 (86th-100th percentile)

AP Biology

  • GPT-4 Score: 5 (85th-100th percentile)
  • GPT-3.5 Score: 4 (62nd-85th percentile)

AP Calculus BC

  • GPT-4 Score: 4 (43rd-59th percentile)
  • GPT-3.5 Score: 1 (0th-7th percentile)

AP Chemistry

  • GPT-4 Score: 4 (71st-88th percentile)
  • GPT-3.5 Score: 2 (22nd-46th percentile)

AP English Language and Composition

  • GPT-4 Score: 2 (14th-44th percentile)
  • GPT-3.5 Score: 2 (14th-44th percentile)

AP English Literature and Composition

  • GPT-4 Score: 2 (8th-22nd percentile)
  • GPT-3.5 Score: 2 (8th-22nd percentile)

AP Environmental Science

  • GPT-4 Score: 5 (91st-100th percentile)
  • GPT-3.5 Score: 5 (91st-100th percentile)

AP Macroeconomics

  • GPT-4 Score: 5 (84th-100th percentile)
  • GPT-3.5 Score: 2 (33rd-48th percentile)

AP Microeconomics

  • GPT-4 Score: 5 (82nd-100th percentile)
  • GPT-3.5 Score: 4 (60th-82nd percentile)

AP Physics 2

  • GPT-4 Score: 4 (66th-84th percentile)
  • GPT-3.5 Score: 3 (30th-66th percentile)

AP Psychology

  • GPT-4 Score: 5 (83rd-100th percentile)
  • GPT-3.5 Score: 5 (83rd-100th percentile)

AP Statistics

  • GPT-4 Score: 5 (85th-100th percentile)
  • GPT-3.5 Score: 3 (40th-63rd percentile)

AP U.S. Government

  • GPT-4 Score: 5 (88th-100th percentile)
  • GPT-3.5 Score: 4 (77th-88th percentile)

AP U.S. History

  • GPT-4 Score: 5 (89th-100th percentile)
  • GPT-3.5 Score: 4 (74th-89th percentile)

AP World History

  • GPT-4 Score: 5 (89th-100th percentile)
  • GPT-3.5 Score: 4 (74th-89th percentile)

Image Processing

One of the biggest differences between GPT-3.5 and GPT-4 is the AI's ability to accurately see and assess images. Previously, a study testing GPT-3.5 on the United States Medical Licensing Examination removed all questions containing visual assets due to the model's inability to determine what was in an image.

OpenAI submitted a combination of text and images, asking the AI: "What's funny about this image? Describe it panel by panel."

From OpenAI's GPT-4 Technical Report.

Steerability

Developers, and later users, can change the AI's "character" so it differs from ChatGPT's usual style. For example, students can now turn GPT-4 into a Socratic tutor that never gives them the answer outright but instead guides them through problem-solving.

Or they can turn the AI into a Shakespearean pirate.
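In practice, this kind of steering is done by supplying a "system" message alongside the user's prompt in OpenAI's chat API. The sketch below only builds the request payload rather than calling the network; the model identifier and the persona wording are illustrative assumptions, not taken from the article.

```python
def build_socratic_request(question: str) -> dict:
    """Build a chat-completion payload that steers the model into a
    Socratic-tutor persona: guiding questions, never direct answers."""
    return {
        "model": "gpt-4",  # assumed model identifier
        "messages": [
            {
                # The system message sets the persona for the whole exchange.
                "role": "system",
                "content": (
                    "You are a Socratic tutor. Never give the student the "
                    "answer directly; respond only with guiding questions."
                ),
            },
            # The user's actual question follows the persona instruction.
            {"role": "user", "content": question},
        ],
    }

payload = build_socratic_request("How do I solve 3x + 5 = 20?")
print(payload["messages"][0]["role"])  # the persona rides in the system slot
```

Swapping the system message for, say, a Shakespearean-pirate instruction changes the character without touching any other part of the request.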

Limitations

GPT-4, like its predecessors, can still "hallucinate" facts and make reasoning errors. According to OpenAI, the base model is only slightly better than GPT-3.5 at avoiding these mistakes, though the gap widens after reinforcement learning from human feedback (RLHF) training.

Like GPT-3.5, GPT-4's brain is stuck in the past. It generally lacks knowledge of any event after September 2021.

GPT-4 is currently available only to some developers and researchers, but anyone can join OpenAI's waitlist for access. Text-only requests are supported for now, with pricing at $0.03 per 1,000 prompt tokens and $0.06 per 1,000 completion tokens.
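At those rates, estimating the cost of a request is simple arithmetic. The helper below is a hypothetical back-of-the-envelope calculator; the example token counts are made up for illustration.

```python
# Launch prices quoted above: $0.03 per 1,000 prompt tokens,
# $0.06 per 1,000 completion tokens.
PROMPT_RATE = 0.03 / 1000      # dollars per prompt token
COMPLETION_RATE = 0.06 / 1000  # dollars per completion token

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Return the estimated cost of one request, in dollars."""
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

# Example: a 500-token prompt that draws a 1,000-token reply
print(f"${estimate_cost(500, 1000):.3f}")  # $0.075
```

Completion tokens cost twice as much as prompt tokens, so long model replies dominate the bill even when prompts are short.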

"We look forward to GPT-4 becoming a valuable tool in improving people's lives by powering many applications," OpenAI said. "There's still a lot of work to do, and we look forward to improving this model through the collective efforts of the community building on top of, exploring, and contributing to the model."