Isaiah N. Granet, Co-Founder and CEO of Bland – Interview Series
Isaiah N. Granet, Co-Founder and CEO of Bland, is a startup founder and engineer whose background blends technical execution with early entrepreneurial experience and long-standing social impact work. Before launching his current venture, he participated in Z Fellows and Y Combinator, built engineering experience at Lantern, and founded San Diego Chill, a nonprofit that has raised over $2.5 million to help children with developmental disabilities access sports. The organization has earned national recognition and continues today with his involvement at the board level.
Bland is focused on building infrastructure for AI-powered phone calls, enabling businesses to deploy voice agents that can handle customer support, sales, and operational workflows at scale. The platform is designed to replace or augment traditional call centers by offering programmable voice interactions, real-time responsiveness, and deep integrations with business systems, positioning itself as a core layer in how companies automate communication with customers.
You founded San Diego Chill as a teenager to create inclusive access to sports for children with developmental disabilities, long before entering Y Combinator or launching Bland. How did that early experience building a real-world organization influence the way you approached founding a voice-first AI company that now sits between enterprises and their customers?
A lot of my life and work has focused on building. From a young age, I had a constant desire to bring things to life; once an idea or a belief about the world pops into my head, it becomes impossible for me to ignore. Building the San Diego Chill not only taught me how to create and run an organization, it taught me about the impact our actions can have on others. Being able to give back by creating an organization that otherwise would never have existed is deeply rewarding, and the lessons and values I learned from the Chill stay with me every day.
After going through YC in 2023, what convinced you that enterprise voice infrastructure was still fundamentally broken enough to justify building an end-to-end system rather than layering LLMs on top of legacy IVR tools?
Think about the last time you used a bank chatbot. You probably waited longer than you should have, got an answer that didn’t address what you actually asked, and ended up calling anyway. Then a robotic voice walked you through a menu of options you didn’t want, and pressing 0 did nothing useful.
Banks have spent billions making that experience possible, and chatbots still rank dead last in customer satisfaction at 29%. Lower than email. Lower than call centers, which everyone already complains about.
That’s been the dynamic for two decades: businesses keep trying to steer customers away from their staff, and customers keep trying to reach a person. Neither side is winning.
The issue isn’t that companies don’t want to fix it. They just can’t staff their way to a good experience at scale. A call center that handles a million calls a month is an expensive, difficult operation, and the quality is inconsistent almost by definition.
What changed is that AI finally makes it possible to resolve calls instead of just routing or deflecting them. Not phone trees. Not hold music. An agent that understands what the customer is asking and handles it.
But that only works if the system is built for real-time voice from the ground up. When you layer LLMs on top of legacy IVR tools or stitch together third-party services, latency creeps in and reliability drops. Conversations break down.
That’s why we focused on building the infrastructure end to end. Voice only works if it feels immediate and natural. If it doesn’t, the customer hangs up.
Bland has taken the unusual step of building and hosting its own TTS, inference, and transcription stack internally. What tradeoffs did you see in relying on third-party APIs that ultimately pushed you toward owning the full voice infrastructure layer?
Every layer you outsource adds latency and adds risk.
Most voice AI platforms are resellers. They take third-party transcription, add a third-party model, route it through third-party TTS, and hand you the result. That can work in a controlled demo. It rarely holds up when call volume spikes or something in the chain goes wrong.
There’s also a data problem. Foundation model providers, OpenAI being the obvious example, have used customer data to train models. They say enterprise licenses are different. Maybe they are. But that uncertainty is enough to make a lot of security and compliance teams uncomfortable.
When you self-host the entire stack — transcription, inference, TTS, orchestration — you control every millisecond and every model update. Customer data stays inside the customer’s ecosystem. It doesn’t touch a third-party training pipeline, doesn’t pass through infrastructure you can’t audit, and doesn’t move unless the customer decides it should.
You can give each enterprise customer dedicated infrastructure so a spike from another company never touches their performance. And when something breaks, you can actually fix it rather than waiting on a vendor’s vendor.
For regulated industries, some customers need the full stack in their own VPC or on-premises. That’s only possible if the vendor actually owns what they’re deploying.
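To make that concrete, here is a minimal sketch of a single conversational turn in a fully self-hosted pipeline, where transcription, inference, and synthesis all run in-process. The `Transcriber`, `LanguageModel`, and `Synthesizer` interfaces are hypothetical stand-ins for illustration, not Bland's actual internals.

```python
from dataclasses import dataclass
from typing import Protocol

class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class LanguageModel(Protocol):
    def respond(self, transcript: str) -> str: ...

class Synthesizer(Protocol):
    def speak(self, text: str) -> bytes: ...

@dataclass
class VoicePipeline:
    """One conversational turn: caller audio in, agent audio out.

    Because every stage runs on infrastructure the vendor owns, customer
    audio and transcripts never leave this process; there is no
    third-party training pipeline anywhere in the path.
    """
    stt: Transcriber
    llm: LanguageModel
    tts: Synthesizer

    def handle_turn(self, caller_audio: bytes) -> bytes:
        transcript = self.stt.transcribe(caller_audio)  # local model, auditable
        reply_text = self.llm.respond(transcript)       # self-hosted inference
        return self.tts.speak(reply_text)               # local synthesis
```

Because all three stages sit behind interfaces owned by one deployment, the same pipeline can be shipped into a customer's VPC or on-premises environment without renegotiating anything with an upstream vendor.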
Traditional contact center automation has focused heavily on deflecting simple support calls. Why did you decide to prioritize long-tail, complex customer interactions instead of optimizing for volume-based automation first?
We took the opposite approach. If we can reliably handle the most complex and sensitive calls, everything else becomes straightforward. The goal isn’t to build demos, it’s to deliver full agentic call resolution at scale. That requires low-latency, high-reliability systems that can manage the edge cases that actually define real customer conversations.
Your agents are increasingly being integrated into CRMs and operational databases to resolve calls end-to-end. How does voice-native automation change the architecture of enterprise workflows compared to chat-based copilots?
Legacy systems often don’t talk to each other. CRMs, scheduling tools, and billing platforms are siloed. Without access to those systems, a voice agent can answer generic questions and not much else.
It can’t look up an account, update a record, or book an appointment. It collects information and hands it off. Meanwhile, human reps spend time on work that shouldn’t touch a person: logging call notes, manually scheduling appointments, pulling reports to figure out who needs a follow-up.
Deep integration is what makes end-to-end resolution possible. Without it, you’ve automated the greeting, not the call.
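As a hedged illustration of the difference this makes, here is a sketch of an agent resolving a call by writing to the systems of record directly instead of handing off. The `crm_lookup` and `crm_book_appointment` functions are made-up placeholders for whatever APIs an enterprise actually exposes.

```python
# Hypothetical tool surface a voice agent might be given. Without these,
# the agent can only collect information and hand it to a human.
def crm_lookup(phone_number: str) -> dict:
    """Placeholder: fetch the caller's account from the CRM."""
    return {"account_id": "A-1001", "name": "Jane Doe", "plan": "Pro"}

def crm_book_appointment(account_id: str, slot: str) -> str:
    """Placeholder: write an appointment back to the scheduling system."""
    return f"confirmed-{account_id}-{slot}"

def resolve_call(phone_number: str, requested_slot: str) -> str:
    account = crm_lookup(phone_number)  # read from the system of record
    confirmation = crm_book_appointment(account["account_id"], requested_slot)
    # The call ends resolved: record updated, nothing queued for a human.
    return f"Booked {requested_slot} for {account['name']} ({confirmation})"

print(resolve_call("+1-555-0100", "2025-07-01T10:00"))
```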
The recent Soulja Boy voice clone demo highlighted how conversational agents can extend beyond internal operations into brand-facing experiences. Do you see enterprise voice agents evolving into customer-facing digital representatives that operate continuously across sales, support, and marketing channels?
Absolutely. We see a world in which every customer has a personal relationship with the businesses they rely on and care about most. What’s important is that AI is not just “fun” but capable of truly resolving your most complex issues.
Real-time voice introduces latency, hallucination, and identity challenges that don’t exist in text-based AI deployments. What were the hardest technical constraints you encountered when building agents that need to respond in under a second while maintaining conversational accuracy?
Latency. That’s where most demos die.
If a chatbot takes three seconds to respond, the user waits. If a voice agent pauses awkwardly after you finish speaking, the conversation is already broken. Responses need to come back in under 400 milliseconds. Most platforms can’t get there because they’re stitching together multiple third-party services, each adding their own delay.
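The arithmetic is easy to sketch. With illustrative per-stage numbers (assumptions, not measurements of any real vendor), a stitched-together pipeline blows through the 400 ms budget on network hops alone, while a co-located stack stays under it:

```python
TURN_BUDGET_MS = 400  # the response threshold cited above

# Illustrative per-stage latencies; every extra vendor adds a network hop.
stitched = {"hop to STT": 60, "STT": 90, "hop to LLM": 60,
            "LLM first token": 150, "hop to TTS": 60, "TTS first audio": 80}
colocated = {"STT": 90, "LLM first token": 150, "TTS first audio": 80}

for name, stages in [("stitched third-party", stitched),
                     ("co-located stack", colocated)]:
    total = sum(stages.values())
    verdict = "within" if total <= TURN_BUDGET_MS else "over"
    print(f"{name}: {total} ms ({verdict} the {TURN_BUDGET_MS} ms budget)")
```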
But latency is only part of it. Real customer calls are messy in ways demos never capture. People interrupt mid-sentence. Background noise cuts in. Callers switch languages. Requests are vague. The voice AI that holds up in production handles interruptions without losing context, adapts when conversations go off-script, and does it without sounding like it’s buffering.
Customers don’t compare voice AI to other bots. They compare it to talking to a person. That’s the bar.
There’s growing scrutiny around how human-sounding AI systems represent themselves during interactions. How should enterprises think about transparency when deploying conversational agents that may be indistinguishable from human staff?
We firmly believe in honesty and transparency for the end user. While some regulation is burdensome and stifling, no form of deception is acceptable. We work with enterprises to develop seamless experiences built on trust with the customer.
As AI agents begin handling millions of simultaneous customer interactions, what operational challenges tend to surface first when companies move from pilot deployments to production-scale rollouts?
A few things matter in practice. The first is modular prompt architecture. Monolithic prompts are almost impossible to debug. When a call goes wrong, you need to isolate exactly where and why it happened, not stare at a wall of instructions trying to figure out which line caused the problem.
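A minimal sketch of the idea, assuming a hypothetical agent for a made-up company: the prompt is assembled from named, independently testable sections, so a bad behavior can be traced to one section rather than a monolithic wall of text.

```python
# Hypothetical modular prompt assembly. Each section is named and
# versioned on its own, so debugging means inspecting one module,
# not rereading the entire prompt.
PROMPT_MODULES = {
    "persona":    "You are a support agent for Acme Utilities.",
    "policy":     "Never quote a refund above $50 without escalation.",
    "escalation": "If the caller asks for a human twice, transfer the call.",
    "formatting": "Keep responses under two sentences; this is a phone call.",
}

def build_prompt(modules: dict[str, str]) -> str:
    # Tag each section so call logs show which module shaped the behavior.
    return "\n\n".join(f"[{name}]\n{text}" for name, text in modules.items())

print(build_prompt(PROMPT_MODULES))
```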
Full observability matters just as much. Post-call summaries aren’t enough. You need real-time visibility into what the agent is doing at every point in every interaction.
Guardrails are also essential, especially in regulated industries. The agent has to stay within policy. That isn’t optional. And if it doesn’t, there needs to be a graceful fallback.
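A sketch of that pattern, with a made-up policy list: the agent's draft reply is checked before it is spoken, and a violation triggers the graceful fallback rather than the raw output.

```python
BLOCKED_TOPICS = ("rate guarantee", "legal advice")  # made-up policy list

def check_policy(draft_reply: str) -> bool:
    """Return True if the draft stays within policy."""
    return not any(topic in draft_reply.lower() for topic in BLOCKED_TOPICS)

def guarded_reply(draft_reply: str) -> str:
    if check_policy(draft_reply):
        return draft_reply
    # Graceful fallback: never speak the violating draft; hand off instead.
    return "Let me connect you with a specialist who can help with that."

print(guarded_reply("I can offer legal advice on your contract."))
# -> falls back to the specialist handoff
```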
Finally, there’s knowledge management. The agent needs access to proprietary data like products, policies, and procedures. The platform should also surface knowledge gaps automatically as they appear in real calls, not weeks later after a customer complains.
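One plausible mechanism for surfacing gaps, offered as an assumption rather than a description of Bland's product: treat low-confidence retrievals on live calls as gap reports the moment they happen.

```python
# Toy knowledge base and confidence score; a real system would use
# embeddings, but the gap-surfacing logic has the same shape.
KNOWLEDGE_BASE = {
    "return policy": "Returns accepted within 30 days with receipt.",
    "store hours": "Open 9am-6pm Monday through Saturday.",
}
CONFIDENCE_FLOOR = 0.5
knowledge_gaps: list[str] = []  # reviewed as calls happen, not weeks later

def retrieve(question: str) -> str:
    # Crude word-overlap score standing in for retrieval confidence.
    words = set(question.lower().split())
    scores = {k: len(words & set(k.split())) / len(k.split())
              for k in KNOWLEDGE_BASE}
    best, score = max(scores.items(), key=lambda kv: kv[1])
    if score < CONFIDENCE_FLOOR:
        knowledge_gaps.append(question)  # surfaced from the live call
        return "I'll have someone follow up on that."
    return KNOWLEDGE_BASE[best]

print(retrieve("what is your return policy"))      # answered from the KB
print(retrieve("do you price match competitors"))  # logged as a gap
print(knowledge_gaps)
```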
Looking ahead, do you believe enterprise voice agents will remain task-specific tools, or will they evolve into generalized AI agents capable of autonomously managing entire business processes initiated through conversation?
If only I had the answer! I think voice agents will evolve across the entire business stack, but it’s unlikely we’ll see an entire business run by a voice agent. That said, I do believe humans will be able to get instant, accurate, and more comprehensive service from AI agents than they get today. In fact, we believe more phone calls will happen when this occurs, not fewer.
Thank you for the great interview; readers who wish to learn more should visit Bland.
