OpenAI's cleanup of the model selection dropdown

With the launch of GPT-5, it’s interesting to see how OpenAI cleaned up the “model selection dropdown,” which is something I and many others have written about.

It went from this:

To this:

Most importantly, they took a big step towards abstracting away model selection entirely. Now, GPT-5 decides how much reasoning to use and, with some guidance from the user, which other tools to access.

It is interesting that they left the dropdown there at all, as it’s hard to imagine many people will proactively select “GPT-5 Thinking” given this is already embedded in the flagship model. My guess is there’s some contingent that wants to hold on to this “container feature,” perhaps as a way to launch new model versions and features in beta. Or potentially, it’s a temporary test to see whether people still select the Thinking version of the model now that the Flagship model can access this automatically.

In my initial experiments with the Flagship model, it does a pretty good job of gauging how much reasoning is needed based on the prompt, so I suspect this design element will disappear entirely in the next few months as very few people actively select the Thinking model.

Build Products Like You Play Wordle

No, I don’t mean texting your college roommates and your aunt daily PRDs with a string of green and yellow boxes.

But I’ve found that playing Wordle is a metaphor for effective product leadership that holds up in a shocking number of dimensions. Wordle isn’t about guessing blindly, it’s about using limited information to make increasingly informed decisions. Product development works the same way. Each release, each experiment, each customer conversation gives you a little more signal. And like Wordle, you’re rarely told outright when you’re on the right track, you have to interpret the clues and adjust accordingly.

One of the core tensions in product leadership is the need to provide vision and certainty in an environment that’s fundamentally uncertain. Great product leaders project confidence, but behind the scenes they’re constantly adapting. They’re running experiments, shifting priorities, and testing hypotheses, all while steering the ship forward. Playing Wordle while giving a TED talk, basically.

And just like in Wordle, there are ways to play smarter.

Use past attempts to inform your next move. In Wordle, you don’t waste guesses repeating letters that have already proven wrong. In product, the feedback loop is messier, but no less valuable. Maybe a feature flopped. Maybe a launch underperformed. That doesn’t always mean the core idea was bad. Timing, positioning, implementation, or the user segment could have been off. Great teams don’t just record what failed, they interrogate why it failed.

Recognize patterns. You don’t need a linguistics degree to win at Wordle, but understanding prefixes, suffixes, and common word structures helps a lot. In product, it’s the same. Over time, you develop an intuition for what tends to work: conversion patterns, pricing models, engagement hooks. Pattern recognition isn’t a shortcut to certainty, but it tilts the odds in your favor.

Run high-signal experiments. In Wordle, “PUPPY” is not a good first guess, and generally (especially early on) it’s best not to waste guesses on words with repeating letters. Instead, you should try combinations that reveal the most. “PUPPY” only has three different letters, while “TRASH” has five, including the commonly used R, S, and T. The same principle applies to product: the best teams prioritize experiments that will teach them the most, not just the ones that seem most likely to succeed. You’re not just shipping features, you’re generating information. As Brian Armstrong put it, “Action produces information.” You can’t learn from an idea that stays in a doc, and you won’t definitively know if you’re right or wrong by having another meeting.
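
To make the arithmetic concrete, here’s a toy Python sketch (not a real Wordle solver; the five-word pool and the scoring rule are mine, purely for illustration) that ranks first guesses by distinct-letter coverage, weighted by how common each letter is in the pool:

```python
# Toy sketch: rank candidate first guesses by how much signal they buy.
# The tiny word pool and scoring heuristic are illustrative only.
from collections import Counter

POOL = ["puppy", "trash", "crane", "slate", "audio"]

# How many pool words contain each letter stands in for "commonly used letters."
freq = Counter(letter for word in POOL for letter in set(word))

def signal(guess: str) -> int:
    # Count each distinct letter once: repeated letters add no new information.
    return sum(freq[letter] for letter in set(guess))

for word in sorted(POOL, key=signal, reverse=True):
    print(f"{word}: {len(set(word))} distinct letters, signal {signal(word)}")
```

Repeated letters drop out of the score entirely, which is exactly why “PUPPY” ranks last.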

Take real action. Which brings us to possibly the most salient point. You only get feedback in Wordle when you make an actual guess. All the theorizing in the world won’t tell you if “CRANE” is the right word until you submit it. In product, the only way to learn is by shipping: putting something in front of real users and seeing how they respond. You don’t find product-market fit by holding meetings. You find it by making moves.

Now, this metaphor isn’t perfect, but the differences can make the metaphor even more instructive. In Wordle, you get six tries. In product, there’s no set number of turns, but there is a clock. You’re racing against competitors, market shifts, and resource constraints. That’s why speed matters. The more “words” (aka real-world product attempts) you put out, the more chances you have to converge on the right answer. Waiting for perfect information just means someone else gets there first.

Each iteration gives you a bit more clarity, but never the full picture. So your job as a product leader is to make the smartest possible move with the data you have, then move again. Use common letters. Don’t repeat obvious mistakes. Don’t guess blindly, but don’t freeze either.

And whatever you do, don’t start a group chat to endlessly debate what word to guess next. Make your guess based on the information you have, explain why you made it, learn, and repeat. That’s product leadership.

Product Thinking: ChatGPT’s Model Selection Dropdown

OpenAI’s predicament as “the accidental consumer tech company” makes them fertile ground for product critique. Their now-flagship product, ChatGPT, was launched 7 years after the company’s founding, which means a lot was built (product and operations) before the company really knew what it was all about.

According to accounts, ChatGPT was initially a proof of concept that caught lightning in a bottle, and it’s now setting land-speed records for user adoption. That leaves various bits and pieces of the product that haven’t caught up with its position in the market.

One of my favorite examples is the Model Selection Dropdown or MSD (hopefully they have a better internal name). It’s perhaps the ChatGPT feature that best illustrates OpenAI’s split personality between research lab and consumer company.

As a thought experiment, what if OpenAI only ever built models and public APIs? In this world, some other company was the first to figure out the chat interface and gain the word-of-mouth virality of ChatGPT. I think it’s fair to say that in this scenario, the company that cracked consumer adoption would have abstracted away the model(s) underneath. And of course, at its launch, ChatGPT didn’t have model selection either, because there was only one model.

Tangent 1

Incidentally, this is a realistic hypothetical. OpenAI made versions of the GPT model that powered the initial version of ChatGPT available through its API in March 2022, a full 9 months before the launch of ChatGPT itself.

Just think of all the new AI stuff that comes out in 9 months these days, and consider that a breakthrough LLM was just sitting there waiting to be productized.

OpenAI did keep some additional improvements for itself for the ChatGPT launch, namely a fine-tuned version of GPT-3.5, but there was certainly enough available between March and November 2022 for someone else to have figured it out.

I can say this confidently because that was when my startup decided to go all-in on AI-powered features. As an example, we built FeedbackAI, which helped managers write thoughtful employee feedback based on just a few adjectives. Others, like Jasper.ai, had more success and went from X to Y insanely fast. But no one figured out the generalized chatbot interface, or at least no one who found distribution. It’s fair to say that at this point in the company’s history, OpenAI didn’t have a huge leg up on the distribution front, so that doesn’t fully explain the oversight by everyone else.

Let’s look more closely at the feature.

What is it?

The Model Selection Dropdown, or MSD, is a highly prominent feature in the ChatGPT desktop and web apps that allows the user to select which of OpenAI’s proprietary models to use in a chat.

Notably, it doesn’t appear in the logged-out version of the webapp:


Product thinking

OpenAI has famously struggled with naming and Product Marketing in general. Sam Altman has said that at OpenAI “product decisions are downstream of research decisions,” so at least historically there haven’t been the usual feedback loops from the customer through product to the engineering org. I’ve heard Kevin Weil, the new-ish CPO at OpenAI, indicate in recent interviews that this is starting to change, but it seems like a new muscle, which isn’t surprising given OpenAI’s split personality between research lab and incidental consumer product company.

After the success of ChatGPT, the research function of OpenAI continued to crank out new model versions. Given the center of gravity of the company and the primordial state of other functions at the time, my assumption is that the MSD was viewed as a “container feature.” That is, it would allow the company to launch new models behind the dropdown as they came out, without disrupting the existing user experience or risking its intense product-market fit with unintended consequences.

There’s an interesting clue in how o3 is positioned in the MSD. The label in the app says “uses advanced reasoning,” which is an engineering-centric view of the feature. Increasing test-time compute, i.e. “reasoning,” was certainly a huge technical innovation, but I’m not sure how an average user would know when to use “advanced reasoning” vs 4o, which is labeled “great for most tasks.”

The MSD also provides an easy way to handle feature gating for pricing purposes. Of course, not all feature gating is done behind the MSD, but users who want every new model iteration, and who can appreciate the finer technical points such as “advanced reasoning,” can pay more to see these features appear in their interface.

Going back to our thought experiment: while a consumer-first company would likely have abstracted away the model(s) used behind the scenes, the MSD is a more tangible way to handle price discrimination than just saying “if you pay more, you get a smarter chatbot, trust us.”

Tangent 2

My guess is there was (and perhaps still is) quite a bit of internal debate about how to package model improvements. I could see engineering caring most about breakthroughs that are technical in nature, while viewing smaller improvements that might be more “marketable” as trivial. I imagine this would put other functions with less organizational clout in a tough position, not wanting to be perceived as slowing things down, but also wanting to provide some structure in terms of packaging and pricing. 

Certainly the hire of Kevin Weil as CPO, and the even more recent hire of Fidji Simo as CEO of Applications, will help balance this out. I also wonder if a yet-to-be-hired marketing-oriented executive is required as well. I can only imagine the challenge that person would have building the conviction (and internal credibility) needed to wrangle the hot potato of product packaging.

Opportunities and Ideas

  1. User-centric labeling of the models. The people who care about the technical details will consume that content elsewhere.

  2. Add a “let ChatGPT choose” option to the MSD. A good way to experiment with abstracting away the selection. When users select this, do they ever switch back? (A rough sketch of this routing follows the list.)

    • There could still be visual cues on the product that show which model is being used

    • Similar to “searching the web,” it could say “using o3 advanced reasoning”

  3. Make “let ChatGPT choose” the default in the free version with caps on power features. This would give free users an opportunity to experience the benefits of upgrading, while still maintaining price discrimination between tiers.

    • Have an option to turn premium features on or off so free users could ration their usage for more important queries. Ultimately, they won’t want to do this, which nudges them toward upgrading.

    • Ask mid-query if the user wants to use some of their premium credits, i.e. “this question could benefit from deeper reasoning, use X credits?”
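
To make idea 2 a bit more concrete, here’s a minimal sketch of what a “let ChatGPT choose” router could look like, using the OpenAI Python SDK. The keyword heuristic and the model pairing are my own illustrative assumptions, not how OpenAI actually routes:

```python
# Hypothetical "let ChatGPT choose" router. The heuristic and model
# choices are illustrative assumptions, not OpenAI's actual design.
from openai import OpenAI

client = OpenAI()

FAST_MODEL = "gpt-4o"  # "great for most tasks"
DEEP_MODEL = "o3"      # "uses advanced reasoning"

def pick_model(prompt: str) -> str:
    # Crude stand-in for a learned router: long or analysis-heavy
    # prompts go to the reasoning model, everything else stays fast.
    hard_signals = ("prove", "step by step", "debug", "tradeoff", "plan")
    if len(prompt) > 500 or any(s in prompt.lower() for s in hard_signals):
        return DEEP_MODEL
    return FAST_MODEL

def ask(prompt: str) -> str:
    model = pick_model(prompt)
    print(f"using {model}")  # the visual cue, a la "searching the web"
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In practice the router would presumably be a learned classifier rather than a keyword check, but even this crude version shows how the choice can be surfaced as a status cue instead of a dropdown.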

Final Tangent

Incidentally, Cursor uses a paradigm similar to my recommendation.

Cursor is a highly intentional product company, so while I think a similar approach makes sense for OpenAI, Cursor certainly has additional considerations. A few that come to mind:

  1. Cursor’s highly technical users would demand flexibility. Engineers famously debate the tradeoffs of technical decisions like Ruby vs Python or PostgreSQL vs NoSQL, so of course similar tribalism will emerge around models. Having a highly visible choice mechanism allows Cursor to satisfy all tribes.

  2. They need a mechanism for enterprise customers to control model usage and allow for the use of in-house private models.

  3. They want to play nice with all model companies. Cursor is better off in a world where there are lots of competitive models, and they get to be the gatekeeper to users.