
As physical AI and its embodiments, robots, move from labs into real-world environments, a new generation of applications is emerging: real world web apps that understand the physical world and synthesize physical and digital realities. To kick off the new year, we want to highlight projects that are building on the Auki Network to bring spatially aware AI into the physical world. Oneshot is the first project in this ecosystem spotlight series.
Oneshot gives places memory. It stores procedural knowledge locally so people can learn, execute, and improve work over time. Built on Auki’s distributed compute and spatial infrastructure, Oneshot turns locations like pizzerias or ships into self-improving systems where knowledge compounds instead of disappears.
We sat down with Mika Haak, founder of Oneshot, to learn what storing the memory of a place entails and why it’s important.
Oneshot is the AI copilot for physical work that follows your procedures. It combines smart glasses with computer vision to detect deviations instantly and gives real-time feedback. Instead of treating AI memory as something abstract or cloud-based, Oneshot anchors knowledge, like how to perform a task or follow a workflow, to a specific location.
At its core, Oneshot captures procedures, not simply raw visual feeds. Using the visual data provided by users wearing smart glasses, Oneshot acts as an AI copilot in the real world. For example, in a pizzeria, it can store the step-by-step knowledge of how to make a specific pizza and what that should look like.
Technically, Oneshot runs fully locally. It combines local storage, a local vision-language model (VLM), and a local AI node into a single application. This allows knowledge to persist in a place without relying on the cloud, making it suitable for environments where latency, privacy, or connectivity matter.
In short, Oneshot is about giving places memory: practical, reusable knowledge that stays where the work happens.
Or as Mika puts it: “It’s about storing memory to a place. Basically, Oneshot is the engine that collects live streams from clients in a location. It then uses logic to determine which part of a procedure someone is in. So it's like the orchestrator you can tell ‘okay I want to do this procedure’ to and Oneshot will fetch from the location’s memory and guide you through it.”
The inspiration for Oneshot comes directly from Mika’s experience as a maritime engineer working at sea.
“I was working on a ship for a year and, to be honest, I was a really bad worker. I made a lot of mistakes, lots of tiny mistakes. Partly because I just didn't know enough but also maybe I was stressed about making mistakes.”
Mika repeatedly ran into the problem that many complex, high-stakes environments share: critical procedures live in manuals and people’s heads, and are not always readily available. Under stress and without immediate guidance, even small mistakes can compound into costly failures.
One incident in particular became formative. After forgetting to close a valve, an undetected oil leak ran overnight, spilling roughly six tons of oil inside the ship. The cleanup took two weeks and was very costly. During that time, Mika kept coming back to the same question: “how could this be better?”
What if the ship itself could guide you? What if procedures, maintenance steps, and operational knowledge were visually present in the environment, exactly when and where they were needed?
This line of thinking led him to imagine ships as digital twins: places where data, procedures, and context-aware guidance could be overlaid directly onto physical equipment.
“I went home. I realized sailing is not for me. I wanted to do something different. And this is when I learned XR, developing for augmented reality and VR on the Meta Quest. That was the beginning.”
As smart glasses and spatial computing matured, the concept finally became practical.
“Now that smart glasses are finally good enough, I realized that people will wear glasses in workplaces like ships and beyond that in all workplaces.”
Oneshot emerged as a way to make environments themselves self-sufficient holders of knowledge. By using local compute, low-latency computer vision, and on-site storage, a place can retain and improve its operational memory without relying on constant connectivity. In Mika’s vision, locations don’t just house work; they remember how work should be done, and they keep getting better over time.
“The vision is to have everything run locally. I think we will get there. I think there will be a massive shift to edge computing. And then at the same time you don't want to rely on connectivity to be able to work.”
Rather than acting as a single-purpose assistant or a static digital manual, Oneshot coordinates live data, stored knowledge, and procedural logic to guide people through real-world tasks in real time.
Oneshot sits one level above individual procedures. It continuously ingests live streams from devices deployed in a location and combines this input with user intent. A user might explicitly start a task (“clean the oven”), or Oneshot may infer what’s happening from the visual context.
From there, the engine identifies which procedure is relevant, determines the current step, and fetches the required information from the local memory of the building. In this sense, Oneshot behaves like an AI agent: it interprets intent, gathers the right data, and orchestrates what should happen next.
“It's not one manual. It's like a collection of manuals and different things. I think everything is possible inside the same engine. You need to make it broad.”
While Oneshot has agent-like behavior, procedures remain the core abstraction. Each workflow, making a pizza, performing maintenance, checking quality, or answering operational questions, is defined as a sequence of steps.
During execution, Oneshot:
Tracks progress through the procedure using live visual input
Verifies actions with a vision-language model (VLM)
Allows users to pause, ask questions, or jump between steps
This transforms traditional step-by-step manuals into interactive, state-aware workflows that respond to what’s actually happening in the environment.
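The execution loop described above can be sketched in a few lines. This is an illustrative model, not Oneshot's actual implementation: the class names, the step fields, and the stand-in VLM function are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    check_prompt: str  # question posed to the VLM to verify this step

@dataclass
class ProcedureRun:
    """Tracks a user's position in a procedure as live frames arrive."""
    steps: list
    current: int = 0

    def verify(self, frame, vlm) -> bool:
        """Ask the VLM whether the current step looks complete in this frame."""
        return vlm(frame, self.steps[self.current].check_prompt)

    def advance(self, frame, vlm) -> int:
        """Move to the next step only when the VLM confirms the current one."""
        if self.current < len(self.steps) and self.verify(frame, vlm):
            self.current += 1
        return self.current

    def jump_to(self, index: int):
        """Users can pause or jump between steps on request."""
        self.current = max(0, min(index, len(self.steps)))

# A stand-in "VLM" for illustration: pretends every check passes.
def fake_vlm(frame, prompt) -> bool:
    return True

run = ProcedureRun(steps=[
    Step("spread sauce", "Is sauce spread evenly across the whole base?"),
    Step("add cheese", "Is the cheese covering the sauce layer?"),
])
run.advance(frame=None, vlm=fake_vlm)
print(run.current)  # → 1
```

The key design point is that the state machine only moves forward when the live visual input confirms the step, which is what makes the workflow state-aware rather than a static checklist.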
“The process is still the basis. You can launch Oneshot and ask it to do a specific procedure. During the process you're walking through the steps and it will check what you're doing. You can also ask questions about it and then it will fetch the agentic memory. But the procedure is the basis.”
To support this, Oneshot uses a dual-manual architecture:
VLM manuals are machine-facing and highly structured. Procedures are encoded (for example, as JSON) with prompts and rules that tell the vision model what to check at each step. These rules include not only the basic requirements, but also quality-related edge cases, such as whether ingredients are evenly distributed on a pizza.
LLM manuals are human-facing and derived from existing text documentation. They contain descriptive knowledge like tips, constraints, timings, temperatures, and best practices. When a user asks, “Did I forget something?” or “What temperature should the oven be?”, Oneshot queries this manual to provide contextual answers.
Together, these two manuals allow Oneshot to both verify actions and explain them.
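A minimal sketch of what one entry in each manual could look like, assuming a JSON-style encoding as the interview suggests. The field names, prompts, and the oven temperature are all hypothetical, chosen only to illustrate the split between machine-facing checks and human-facing knowledge.

```python
import json

# Hypothetical VLM-manual entry for one step (field names are illustrative).
vlm_manual_step = {
    "step": "add_toppings",
    "verify": "Are toppings present on the pizza?",   # positive check
    "reject_if": [                                    # quality edge cases
        "Are the toppings clustered in one area?",
        "Is any part of the base still uncovered by sauce?",
    ],
}

# The human-facing LLM manual carries descriptive knowledge for Q&A,
# e.g. answering "What temperature should the oven be?".
llm_manual = {
    "add_toppings": "Distribute toppings evenly and leave a rim around the "
                    "edge. Preheat the oven before baking (e.g. 300 °C).",
}

print(json.dumps(vlm_manual_step, indent=2))
```

The same step thus exists twice: once as checkable prompts the vision model can verify against live frames, and once as descriptive text an LLM can retrieve when the user asks a question.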
“I think the core thing that we are solving is the evaporation of knowledge.”
Today, critical know-how lives mostly in people’s heads. Procedures, shortcuts, edge cases, and hard-won lessons are learned on the job, passed informally to the next person, and often lost entirely when someone leaves. Each workplace becomes an isolated island of experience, where mistakes are repeated, best practices spread slowly, if at all, and institutional knowledge quietly disappears over time.
“They talk about the silver tsunami, right? The old legends that collected all the knowledge are about to retire in the next ten years. This is across all industries. All the institutional knowledge will go with them. At one point we might forget how to do things. That's a big issue.”
This problem is becoming more urgent as experienced workers retire and fewer new workers replace them. Years or decades of accumulated expertise risk vanishing, not because it isn’t valuable, but because there is no effective way to anchor it anywhere permanent.
Oneshot addresses this by storing knowledge in the place itself. Instead of relying on human memory and handovers, lessons learned, best practices, and procedures are captured, refined, and made available to everyone who works there. A mistake corrected once can improve the system for everyone. A best-performing engineer or employee can effectively teach every location at once. This is particularly useful for businesses with many locations, like retail or restaurant chains.
“I think one of the biggest issues here is each location might have a lot of knowledge but they are all isolated islands so they can’t really spread knowledge across locations fast.”
In this way, Oneshot turns disconnected locations into a shared hive mind, where knowledge compounds instead of evaporating, spreads instantly across sites, and persists even as people come and go. By giving buildings memory, Oneshot fundamentally changes how organizations retain, share, and scale what they know.
The biggest challenge Oneshot faces is making procedures legible to vision-language models with enough precision to reliably reflect what’s actually happening in the real world.
“The hardest part is getting a procedure and describing it in a way that the VLM understands well enough.”
In practice, this means translating human workflows, things people understand intuitively, into language that a VLM interprets correctly. Small ambiguities in wording can lead to incorrect conclusions. For example, a step described as “the burger is on top of the pan” may be true in a two-dimensional visual sense, even if the burger is not actually in the pan yet. To a human, the difference is obvious. To a model, it depends entirely on how the instruction is phrased.
This makes procedure authoring far more difficult than simply writing a checklist. Each step has to be described with careful, model-aware language that accounts for spatial relationships, timing, and context. On top of that, every step has edge cases, states that look correct at a glance but should not allow the procedure to continue.
“I think that will be the hardest issue, making the VLM more accurate.”
To address this, Oneshot relies on layered verification rather than single checks. Instead of asking the model one question per step, it asks multiple questions in parallel: a primary, positive confirmation and several negative checks designed to eliminate edge cases. Even if the main condition is satisfied, a single failing negative check can block progression.
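The layered-verification idea can be expressed as a small gating function. This is a sketch under assumed names, reusing the burger-and-pan ambiguity from earlier as the example; the prompts and the fake VLM are invented for illustration.

```python
def step_passes(frame, positive: str, negatives: list, vlm) -> bool:
    """Layered check: the positive prompt must hold, AND every negative
    (edge-case) prompt must come back false, before the step can complete."""
    if not vlm(frame, positive):
        return False
    # A single failing negative check blocks progression.
    return not any(vlm(frame, q) for q in negatives)

# Stand-in VLM backed by canned answers, for illustration only.
answers = {
    "Is the burger in the pan?": True,
    "Is the burger only hovering above the pan?": False,
    "Is the pan still cold?": True,
}
fake_vlm = lambda frame, prompt: answers[prompt]

ok = step_passes(
    None,
    "Is the burger in the pan?",
    ["Is the burger only hovering above the pan?", "Is the pan still cold?"],
    fake_vlm,
)
print(ok)  # → False: the cold-pan edge case blocks progression
```

Even though the main condition is satisfied, one positive answer to a negative check is enough to hold the procedure at the current step, which is exactly the behavior described above.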
Oneshot uses the Auki network as its underlying spatial and computational backbone, enabling both scalable image processing and place-based intelligence.
“I’m using the network to process the images, on a locally hosted VLM node. One reason why Auki is such a good architecture is that it's distributed. So if there's a building with high demand in compute, it can ask the network to fill in which is amazing. That makes everything really scalable.”
On the compute side, Oneshot integrates with Auki’s distributed VLM node architecture. Each location can host its own VLM node to process visual streams locally, but when demand spikes the system can offload work to other nodes on the Auki network.
Beyond compute, Auki plays a key role in spatial reasoning. Oneshot plans to leverage Auki’s positioning services to attach knowledge directly to physical locations and objects. Instead of treating procedures as abstract instructions, the system can associate them with spatial context.
This enables interaction patterns that go beyond step-by-step guidance. A user can look at an object, point at it, and ask questions such as “What temperature should this oven be at?” or “Is this the correct valve for this system?” Because Oneshot knows the user’s position, pose, and what they are looking at, it can reason about the surrounding environment and surface the right knowledge at the right moment.
Spatial awareness also enables safety and risk detection. In complex environments like ships, Oneshot can combine live vision with spatial context to warn about nearby hazards:
“If you're working on the electrical systems of a ship you need to know what's around you to get the full picture. For instance, Oneshot could say, ‘Hey, I see that the pump on top of you is still powered.’ I think that's the future. This is also how humans remember. They map memories to a location in their head.”
In short, the real world web allows Oneshot to move from procedural guidance to spatially grounded intelligence. It provides distributed compute for scalable vision processing, and the positioning and spatial memory needed to turn buildings into environments that understand what you’re looking at, where you are, and what you should do next.
Oneshot is being built incrementally, with a strong focus on real-world validation rather than trying to solve everything at once.
“The roadmap is to first create a system that has the most important procedures that a person can use while learning things. So when there's a new hire, put on the glasses and he will be able to do things already. That will be the first major milestone.”
The approach is to start with a narrow, high-value use case, prove it works with real users, and then expand from there.
The immediate focus is on building the next prototype of Oneshot, one that allows users to ask questions during a procedure, not just follow steps. This lowers the barrier for learning and reduces mistakes, especially for new employees who may hesitate to ask for help.
Oneshot offers a glimpse of what becomes possible when AI is spatially grounded. By combining procedural intelligence with spatial context on the real world web, Oneshot allows workplaces to retain knowledge, guide new hires, and improve over time. As more projects like Oneshot come online, the Auki ecosystem is beginning to demonstrate the potential of the real world web. To stay up to date on developments and get announcements from Oneshot, follow @augmentedcamel and @oneshot_ar on X.
Auki is making the physical world accessible to AI by building the real world web: a way for robots and digital devices like smart glasses and phones to browse, navigate, and search physical locations.
70% of the world economy is still tied to physical locations and labor, so making the physical world accessible to AI represents a 3X increase in the TAM of AI in general. Auki's goal is to become the decentralized nervous system of AI in the physical world, providing collaborative spatial reasoning for the next 100bn devices on Earth and beyond.
X | Discord | LinkedIn | YouTube | Whitepaper | auki.com