Late-Night Robotics: Why We’re Building the Real World Web
A Busy Week on Robotics Twitter
We ran this update in a Sunday-midnight slot in Hong Kong because the usual Friday session slipped. The community bullied Nils into wearing glasses on camera (“for your protection and mine”), and we kicked off with a quick tour of what happened online this week.
All of that framed the main question of the update: what are we actually building, and why?
Why Physical AI Is the “Final Frontier”
We walked through a familiar slide that borrows the shape of Jensen Huang’s CES graph: from generative AI to agentic AI to the next and “final” frontier, physical AI – AI that understands space and physics and acts in the real world.
The key number we keep coming back to: “70% of the world’s GDP is still tied to physical locations and labor.”
So moving from agentic to physical AI isn’t a small step. If roughly 70% of GDP sits in the physical economy, an AI industry that today mostly addresses the remaining ~30% at least triples its TAM by crossing over, and probably grows it further once robots make entirely new things economically viable. If you want to build something bigger than OpenAI, you have to play here. That’s why we try to make the physical world accessible to AI and robots.
Our answer to that is the real world web.
The Real World Web in One Paragraph
We used to call it a “decentralized machine perception network.” Same thing, new name. The real world web:
Makes the physical world navigable, searchable, and accessible to AI.
Is a collaboratively editable collection of 3D domains, not one central map.
Runs on a network of nodes you can operate, not just our own servers.
An analogy we like: “The internet took all our digital information and made it searchable and browsable to humans. Now we need to turn the internet upside down and make the physical world browsable to robots.”
We don’t try to do everything. In Nils’s words: “We don’t do any serious work on locomotion and manipulation in our lab… The Auki network is collaborative perception, collaborative mapping, and collaborative positioning.”
We also showed Deep Robotics’ DeepVLA demo: a quadruped doing end-to-end navigation using an internal map. Today that map is hard-coded inside the robot. The real world web is about making those maps fetchable on demand when the robot shows up in a new place.
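To make that concrete, here’s a minimal sketch of what “fetchable on demand” could look like from the robot’s side. Every name in it (RealWorldWebClient, lookup_domains, fetch_map) is a hypothetical stand-in for illustration, not our actual API:

```python
# A minimal sketch of "maps on demand" from a robot's perspective.
# All names (RealWorldWebClient, lookup_domains, fetch_map) are hypothetical.
from dataclasses import dataclass

@dataclass
class Domain:
    domain_id: str
    name: str

class RealWorldWebClient:
    """Stand-in for a client that resolves nearby spatial domains and their maps."""

    def __init__(self, registry: dict[str, Domain], maps: dict[str, bytes]):
        self._registry = registry
        self._maps = maps

    def lookup_domains(self, lat: float, lon: float) -> list[Domain]:
        # A real lookup would be geospatial; here we simply return what is registered.
        return list(self._registry.values())

    def fetch_map(self, domain_id: str) -> bytes:
        # In practice this would stream an occupancy grid, mesh, or point cloud.
        return self._maps[domain_id]

# A robot arriving somewhere new asks the network instead of relying on a baked-in map.
client = RealWorldWebClient(
    registry={"store-42": Domain("store-42", "Demo supermarket")},
    maps={"store-42": b"<serialized occupancy grid>"},
)
for domain in client.lookup_domains(lat=22.28, lon=114.16):
    local_map = client.fetch_map(domain.domain_id)
    print(f"Loaded map for {domain.name}: {len(local_map)} bytes")
```

The point is that a robot like the one in the DeepVLA demo wouldn’t need its map baked in; it would resolve it on arrival, the way a browser resolves a page.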
From Phones to Glasses to Robots
Our core insight back in 2021: “An iPhone is actually just a robot with no arms and legs. A pair of AR glasses is really a robot with no arms and legs.”
So we attacked the stack in this order:
1. Handheld copilots: Cactus for retail
We launched Cactus, our retail copilot that runs on the phone already in your pocket:
Build a hyper-accurate digital twin of the store.
Combine it with sales data to generate heat maps of shelf performance (sketched below).
Provide AR navigation for staff and shoppers.
Feed that same spatial data to robots later.
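As a rough illustration of the digital-twin-plus-sales-data step above, here’s a toy sketch that joins shelf positions with revenue to produce a normalized heat value per shelf. The schema (shelf IDs, coordinates, weekly revenue) is made up for the example:

```python
# Toy join of a store's digital twin with sales data, using a hypothetical schema.
shelves = {
    "A1": (1.0, 0.5),   # shelf ID -> (x, y) position in the store's local frame, metres
    "A2": (1.0, 2.5),
    "B1": (4.0, 0.5),
}
weekly_sales = {"A1": 1240.0, "A2": 310.0, "B1": 870.0}  # revenue per shelf

peak = max(weekly_sales.values())
heat_map = [
    {"shelf": shelf_id, "x": x, "y": y, "heat": weekly_sales.get(shelf_id, 0.0) / peak}
    for shelf_id, (x, y) in shelves.items()
]
for cell in sorted(heat_map, key=lambda c: -c["heat"]):
    print(f"{cell['shelf']} at ({cell['x']}, {cell['y']}): heat {cell['heat']:.2f}")
```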
Cactus is already:
Live or rolling out in 1,000+ locations
Doing millions in pilot revenue
Sitting on an open pipeline of $150m+
2. Glasses: Mentra + OneShot
Next we plugged Mentra’s open-source smart glasses into the network.
Two examples:
OneShot (Mika’s grant project)
Glasses + checklist + AI copilot for physical work.
Helps a new worker assemble food correctly, step by step.
Built on our on-prem, privacy-friendly perception stack so enterprises don’t have to ship video to some random cloud.
Empty-shelf workflow
Glasses spot an empty shelf.
The network captures the precise 3D location.
We can guide a first-day colleague (or a robot) to that exact spot to fix it (sketched below).
Nils added, "The real world web makes it possible for humans and robots to collaborate in very interesting ways… to our knowledge, we’re the only company that can do these kinds of cross-device spatial understanding today.”
3. Third-party apps: Zappar for blind navigation
We also showed Zappar’s work:
They invented accessible QR codes, already on billions of packages.
Those codes let blind users get product info (ingredients, allergens, etc.).
With our network, they can now add app-free indoor navigation:
Connect a store domain to the real world web.
Let Zappar guide blind users to the products, not just describe them.
Once a venue is on the network (for example via Cactus), apps like Zappar can plug in and upsell new capabilities without re-mapping the world from scratch.
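To illustrate that upsell, here’s a hypothetical sketch of a scan handler that returns guidance instead of just a description once the venue has a domain on the network. The product and location lookups are stand-ins, not Zappar’s or our actual interfaces:

```python
# Sketch: upgrade "describe the product" to "guide me to it" when a domain exists.
product_locations = {            # product ID -> 3D position in the store domain (assumed)
    "gtin:0123456789012": (6.5, 1.2, 0.9),
}

def handle_scan(gtin: str, domain_id: str | None, user_position: tuple[float, float, float]):
    description = f"Product {gtin}: ingredients and allergens read aloud."
    if domain_id is None:
        return description                      # venue not on the network: info only
    target = product_locations.get(gtin)
    if target is None:
        return description
    dx, dy = target[0] - user_position[0], target[1] - user_position[1]
    distance = (dx**2 + dy**2) ** 0.5
    return description + f" It is about {distance:.1f} m away; guidance available."

print(handle_scan("gtin:0123456789012", "store-42", (1.0, 1.0, 1.5)))
```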
Why We Use a Token at All
We ended by revisiting the token design and why we grudgingly embraced Web3.
Nils started out firmly not wanting this to be a crypto project. But we hit two hard problems:
SLAs without owning the hardware
We want to give developers uptime guarantees for a network mostly running on other people’s machines.
The early idea: cash deposits from node operators we could slash if they misbehaved.
The lawyers came back with: enjoy a global licensing nightmare (in some places, holding those deposits basically requires a banking license).
Counter-intuitively, it’s more compliant in many jurisdictions to hold a crypto deposit than a fiat one.
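A toy ledger shows why a slashable deposit is enough to back an SLA on hardware we don’t own. The staking and slashing rules below are purely illustrative, not the network’s actual parameters:

```python
# Toy model of SLA enforcement via slashable deposits (illustrative numbers only).
class OperatorLedger:
    def __init__(self):
        self.deposits: dict[str, float] = {}

    def stake(self, operator: str, amount: float) -> None:
        self.deposits[operator] = self.deposits.get(operator, 0.0) + amount

    def slash(self, operator: str, fraction: float) -> float:
        """Confiscate part of the deposit when an uptime commitment is missed."""
        penalty = self.deposits.get(operator, 0.0) * fraction
        self.deposits[operator] -= penalty
        return penalty

ledger = OperatorLedger()
ledger.stake("node-operator-7", 1_000.0)
penalty = ledger.slash("node-operator-7", fraction=0.10)  # e.g. missed an uptime target
print(f"Slashed {penalty:.2f}; remaining deposit {ledger.deposits['node-operator-7']:.2f}")
```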
Cross-border micro-payments
We need to bill fractions of a cent for data usage across borders.
Stripe/PayPal-style rails are simply not built for that.
So we ended up with a burn-credit-mint model:
Developers:
Buy tokens on an exchange when they need to send data.
Burn a dollar-denominated amount of tokens (e.g., $100 worth) to receive credits.
Spend credits for traffic on the network. They don’t have to care about token volatility.
Node operators:
When a burn happens, we mint fewer tokens than were burned into a deflationary pool.
That pool is distributed pro rata based on where credits were spent.
Node operators get paid in tokens and do care about token price, which incentivizes them to run reliable infrastructure.
We started at 10 billion $AUKI tokens. As the supply shrinks towards 5 billion, the system gradually shifts from deflationary to a 1:1 burn/mint model.
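For intuition, here’s a back-of-the-envelope sketch of that loop. The mint-ratio schedule below (starting at 0.5 tokens minted per token burned and drifting to 1:1 as supply approaches 5 billion) is an assumption for illustration only; the real schedule is defined by the token design, not by this sketch:

```python
# Back-of-the-envelope model of the burn-credit-mint loop described above.
# The mint ratio formula is an illustrative assumption, not the actual tokenomics.
def mint_ratio(current_supply: float, start: float = 10e9, floor: float = 5e9) -> float:
    """Deflationary early on, drifting toward 1:1 as supply approaches the floor."""
    if current_supply <= floor:
        return 1.0
    # Assumed linear interpolation: 0.5 minted per token burned at 10B supply, 1.0 at 5B.
    progress = (start - current_supply) / (start - floor)
    return 0.5 + 0.5 * progress

def burn_for_credits(usd_amount: float, token_price: float, supply: float):
    tokens_burned = usd_amount / token_price           # developer burns $-denominated tokens
    credits = usd_amount                                # credits are dollar-denominated
    tokens_minted = tokens_burned * mint_ratio(supply)  # goes to the pool, paid out pro rata
    return credits, tokens_burned, tokens_minted, supply - tokens_burned + tokens_minted

credits, burned, minted, new_supply = burn_for_credits(100.0, token_price=0.02, supply=10e9)
print(f"$100 burn -> {credits} credits, {burned:,.0f} burned, {minted:,.0f} minted")
print(f"Net supply change: {minted - burned:,.0f} tokens")
```

The shape matters more than the numbers: while minting lags burning, supply shrinks; once it reaches the floor, burn and mint balance at 1:1.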
The goal is to let the network self-provision: if there’s profit to be made running a certain kind of node, more people spin them up; if not, capacity shrinks.
As Nils said: “We believe in the fullness of time that Auki will become the largest decentralized infrastructure network, becoming the decentralized nervous system for physical AI in the world.”