Humanity's Last Exam: Can AI Pass the Ultimate Test? 📝 (Pt 1)

OpenAI Operator, Perplexity Assistant, and AI takes a 3,000-question exam

Hey there, AI enthusiasts!

Remember those sci-fi movies where people just tell their computers what to do, and it magically happens? Well, we're getting closer than ever with the latest AI assistants. These aren't just your basic chatbots; they can actually do things, navigating the web and taking actions on your behalf.

Important News 🗞️

We at Findr are nominated for No. 1 Productivity Tool of the Year! Your support means the world to us, so please vote for us!

Please vote for us here: https://www.producthunt.com/golden-kitty-awards/personal-productivity-2

- Nishkarsh (CEO, Findr)

OpenAI's Operator: The Agent That Does It All 📶

OpenAI, the company behind ChatGPT, just dropped a bombshell with "Operator." This AI agent uses a web browser just like a human, clicking buttons, filling out forms, and scrolling through pages. Need to book a dinner reservation? Groceries running low? Need to find an in-network dentist (because adulting is hard)? Operator can do that!

You can even watch Operator work in real time: it shares its screen as it completes tasks. It's like magic, except it's real, and it's powered by some seriously impressive AI.

Operator already integrates with major platforms like DoorDash, Instacart, OpenTable, Uber, and StubHub, making it a one-stop shop for all your everyday needs. And the best part? It's personalized. You can give Operator custom instructions for different websites, save prompts for recurring tasks, and even take back control of the screen whenever you want for sensitive stuff like payments or logins.

Perplexity Assistant: The Free Agent for Android 🤯

OpenAI isn't the only one making waves in the AI assistant space. Perplexity just launched its own Perplexity Assistant, and it's taking aim at the mobile world. This free Android app integrates directly with your phone, allowing you to control apps and perform complex tasks with voice commands or gestures.

Need an Uber? Just ask Perplexity Assistant. Want to play a YouTube video? It can handle that too. It can even use your phone's camera to answer questions about your surroundings. And since it's powered by both ChatGPT and Claude's tech, you know it's got some serious brains behind it.

The key difference between Perplexity Assistant and OpenAI's Operator? Price. Perplexity's offering is completely free for all Android users, while Operator is currently only available to ChatGPT Pro users for a cool $200/month.

Humanity's Last Exam: Can AI Pass the Ultimate Test? 📝

While AI assistants are busy taking over our to-do lists, another group of researchers is focused on pushing the boundaries of AI intelligence. The Center for AI Safety and Scale AI just introduced "Humanity's Last Exam," a new benchmark designed to be the ultimate test of an AI's academic knowledge.

This isn't your average AI test. It consists of 3,000 expert-crafted questions spanning more than 100 subjects, from analytic philosophy to rocket engineering. Even the most advanced AI models are struggling to pass, with current leading systems scoring under 10% accuracy.

But why is this test so important? As AI systems become increasingly powerful, we need new ways to measure their capabilities. Humanity's Last Exam is designed to be the final frontier, a test that will truly challenge AI and help us understand its limits.

Other AI News: From Data Centers to Deepfakes

The AI world is always buzzing with activity, and this week was no exception. Here are a few other noteworthy developments:

  • Reliance's Massive Data Center: Reliance, the Indian conglomerate, announced plans to build the "world's biggest" data center in India. This massive project is expected to cost upwards of $20 billion and will further solidify India's position as a major player in the AI landscape.

  • Samsung's AI Innovations: Samsung unveiled its latest AI features at Galaxy Unpacked 2025, including Content Credentials, which allows users to identify AI-generated images. This is a crucial step towards combating deepfakes and ensuring transparency in digital content.

  • ByteDance's Agent-R: ByteDance, the company behind TikTok, introduced Agent-R, a new framework that teaches AI agents to self-reflect and correct mistakes in real time. This could lead to more reliable and efficient AI systems that can learn and adapt on their own.

  • Open-Source Framework for AI Reasoning: A new open-source framework is making AI reasoning more efficient, reducing the number of tokens needed to solve complex problems. This could lead to faster and more cost-effective AI systems.

  • Microsoft and OpenAI's Evolving Partnership: Microsoft and OpenAI reaffirmed their strategic partnership, with Microsoft emphasizing its exclusive rights to OpenAI's IP and APIs. However, Microsoft is no longer OpenAI's exclusive cloud provider, opening the door for OpenAI to work with other companies like Oracle.

Once again, your support means a lot. Please vote for us to become the best productivity tool 🙏: https://www.producthunt.com/golden-kitty-awards/personal-productivity-2

- Nishkarsh (CEO, Findr)

The AI Digest: Your Weekly Dose of Latest AI Updates

Want to stay ahead of the curve in this rapidly evolving AI landscape? We've got you covered. "The AI Digest by Findr" is your weekly dose of AI awesomeness, bringing you the latest breakthroughs, trends, and controversies. So buckle up and get ready for the ride: the future of AI is here!

Stay curious,
The AI Digest by Findr