Two Voice Devs: Recent Episodes

Mark and Allen

Mark and Allen talk about the latest news in the VoiceFirst world from a developer point of view.

OpenAI's ChatGPT 4o and GPT 4o announcements have sent shockwaves through the developer community! In this episode of Two Voice Devs, Mark and Allen dive into the implications of these new models, comparing them to Google's Gemini.

We discuss:

[00:00:10] Initial takeaways from the OpenAI presentations.

[00:02:29] The impressive voice capabilities of ChatGPT 4o.

[00:04:49] Concerns about OpenAI's ambitions for conversational AI.

[00:07:30] The difference between "doing" and "knowing" AI systems.

[00:14:15] A detailed breakdown of GPT 4o, including its strengths and weaknesses.

[00:17:43] Comparison with Gemini and implications for developers.

[00:19:41] The importance of competition in driving innovation and lowering prices.

[00:21:48] The future of AI assistants and the role of developers.

Let us know what you think about GPT 4o and Gemini! Have you used them? Share your experiences and thoughts in the comments below.

Allen Firstenberg chats with fellow Google Developer Expert (GDE) Mike Wolfson about his career, the evolution of Android, and his new interest in generative AI. Mike shares his thoughts on the future of AI with agents, Large Action Models (LAMs), and the potential of the "Rabbit," a new AI-powered device. Does the Rabbit live up to its promise? If not - what could?

Timestamps:

00:00:00 - Introduction

00:01:32 - Mike's career journey

00:04:15 - Transition from enterprise Java to Android development

00:05:04 - Creating "Droid of the Day" app

00:06:49 - Becoming an Android developer and Google Developer Expert

00:09:23 - Shift in focus from Android to generative AI

00:10:57 - Generative AI as a platform

00:11:47 - The Rabbit and its potential

00:14:59 - Mike's take on the Rabbit as a developer

00:17:31 - Current integrations with the Rabbit

00:19:52 - The future of AI and the Rabbit

00:24:46 - Edge AI and its potential

00:27:16 - The capabilities of the Rabbit and its future

00:32:17 - The Rabbit vs. other devices like Meta glasses

00:34:28 - Conclusion and call to action

Join Allen and Roya as they dissect the major AI announcements from Google I/O 2024. From Gemini updates and new models to responsible AI and groundbreaking projects like ASTRA, this episode dives into the future of AI development.

Timestamps:

[00:00:00] Introduction and Google I/O Overview

[00:02:00] Gemini 1.5 Flash & Gemini 1.5 Pro: New Models and Features

[00:04:30] AI Studio Access Expansion for Europe, UK & Switzerland

[00:06:20] Choosing the Right AI Model for Your Project

[00:06:50] Gemini Nano in Google Chrome: Bringing AI to the Browser

[00:08:00] PaliGemma: Open Source Model with Image & Text Input

[00:08:50] AI Red Teaming & Model Safety Tools

[00:09:50] Parallel Function Calling for Developers

[00:10:30] Video Frame Extraction: Easier Multimodal Development

[00:11:20] Genkit: Firebase's Generative AI Integration

[00:12:00] Gems: Customizable Gemini for Developers

[00:12:50] Semantic Embeddings: Understanding & Creating Images

[00:13:50] Imagen 3: API Access for Image Generation

[00:14:20] Veo: Video Generation with Lumiere Architecture

[00:14:50] SynthID: Watermarking & Identifying Generated Content

[00:16:30] Responsible AI & Inclusivity

[00:18:00] Gemini Developer Competition: Win a DeLorean & Cash Prizes!

[00:19:30] Project ASTRA: Multimodal AI with Contextual Memory

[00:21:00] Google Glasses & Project ASTRA Integration

[00:22:00] Closing Thoughts: AI for Everyone

Join Allen and Mark as they delve into Voiceflow's groundbreaking new feature: intent classification using a hybrid of LLMs and classic NLU models. Discover how this innovative approach leverages the strengths of both technologies to achieve greater accuracy and flexibility in understanding user intent. How they're doing it just may blow your mind! 🤯

Timestamps:

0:00:00 - Introduction

0:00:33 - Exploring the concept of intents and slots in conversational UI

0:05:11 - Understanding Natural Language Understanding (NLU) and its role in intent classification

0:06:02 - Voiceflow's hybrid approach: Combining classic NLU with LLMs

0:08:36 - Deep dive into Voiceflow's documentation on intent classification using LLMs

0:13:43 - Understanding the hybrid approach and its components: intent descriptions, prompt wrappers, and training data

0:24:31 - How the classic NLU model pre-filters intents for the LLM, improving efficiency and accuracy

0:27:27 - Exploring the user experience and the flow of intent classification with the hybrid model

0:32:53 - Voiceflow's commitment to open research and sharing knowledge with the developer community

0:35:52 - The value of benchmarking and analyzing different LLM models for intent classification

0:39:12 - Call to action: Share your thoughts and experiences with Voiceflow's hybrid approach
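
As a rough illustration of the pre-filtering idea from the timestamps above, here is a minimal Python sketch. It is not Voiceflow's implementation: the intent names, descriptions, example utterances, the word-overlap scorer standing in for a classic NLU model, and the prompt wording are all hypothetical, and the LLM is passed in as a plain function so that nothing here depends on a real API.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical intents: name -> (description, example utterances).
INTENTS: Dict[str, Tuple[str, List[str]]] = {
    "order_pizza": ("The user wants to order food.",
                    ["I want a pizza", "order a large pizza"]),
    "track_order": ("The user asks where an existing order is.",
                    ["where is my order", "track my pizza order"]),
    "cancel_order": ("The user wants to cancel an order.",
                     ["cancel my order", "I no longer want it"]),
    "store_hours": ("The user asks about opening hours.",
                    ["when do you open", "what are your hours"]),
}


def nlu_score(utterance: str, examples: List[str]) -> float:
    """Toy stand-in for a classic NLU model: best Jaccard word overlap."""
    words = set(utterance.lower().split())
    best = 0.0
    for example in examples:
        example_words = set(example.lower().split())
        union = words | example_words
        if union:
            best = max(best, len(words & example_words) / len(union))
    return best


def prefilter(utterance: str, k: int = 2) -> List[str]:
    """Keep only the k intents the cheap scorer ranks highest."""
    ranked = sorted(
        INTENTS,
        key=lambda name: nlu_score(utterance, INTENTS[name][1]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(utterance: str, candidates: List[str]) -> str:
    """Wrap the candidate intents and their descriptions in a prompt."""
    lines = ["Pick the single best intent for the user message.", "Intents:"]
    for name in candidates:
        lines.append(f"- {name}: {INTENTS[name][0]}")
    lines.append(f'User message: "{utterance}"')
    lines.append("Answer with the intent name only, or None.")
    return "\n".join(lines)


def classify(utterance: str, llm: Callable[[str], str], k: int = 2) -> str:
    """Pre-filter with the cheap scorer, then let the LLM choose."""
    candidates = prefilter(utterance, k)
    answer = llm(build_prompt(utterance, candidates)).strip()
    # Reject anything outside the short list the LLM was shown.
    return answer if answer in candidates else "None"
```

Calling `classify("where is my pizza order", llm=my_model)` sends only the two highest-scoring intents to `my_model` (any function that takes a prompt string and returns text). The prompt stays short, and an answer that was not on the short list is treated as no match.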

Join Allen Firstenberg and guest host Stefania Pecore on Two Voice Devs as they delve into the exciting announcements and highlights from Google Cloud Next 2024! This episode focuses on the latest advancements in AI and their impact on the healthcare industry, providing valuable insights for developers and tech enthusiasts.

Learn more:

  • https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2024-wrap-up

Timestamps:

00:00:00: Introduction

00:01:02: Stefania's background and journey into AI

00:07:20: Stefania's overall experience at Google Cloud Next

00:11:59: Focus on Healthcare and AI applications, including Mayo Clinic's Solution Studio

00:15:38: Exploring the new Gemini product suite and its features like code assistance and data analysis

00:20:44: Discussing Gemini API updates, including the 1.5 public preview with 1M token context window and grounding tools

00:26:06: Vertex AI Agent Builder and its no-code approach to chatbot development

00:33:02: Hardware announcements, including the A3 VM with NVIDIA H100 GPUs

00:35:24: Stefania's reflections on Cloud Next and the value of attending

Tune in to discover the future of AI and its transformative potential, especially in the healthcare sector. Share your thoughts on the Google Cloud Next announcements in the comments below!

This episode of Two Voice Devs takes a closer look at BERT, a powerful language model with applications beyond the typical hype surrounding large language models (LLMs). We delve into the specifics of BERT, its strengths in understanding and classifying text, and how developers can utilize it for tasks like sentiment analysis, entity recognition, and more.

Timestamps:

0:00:00: Introduction

0:01:04: What is BERT and how does it differ from LLMs?

0:02:16: Exploring Hugging Face and the BERT base uncased model.

0:04:17: BERT's pre-training process and tasks: Masked Language Modeling and Next Sentence Prediction.

0:11:11: Understanding the concept of masked language modeling and next sentence prediction.

0:19:45: Diving into the original BERT research paper.

0:27:55: Fine-tuning BERT for specific tasks: Sentiment Analysis example.

0:32:11: Building upon BERT: Exploring the RoBERTa model and its applications.

0:39:27: Discussion on BERT's limitations and its role in the NLP landscape.

Join us as we explore the practical side of BERT and discover how this model can be a valuable tool for developers working with text-based data. We'll discuss its capabilities, limitations, and potential use cases to provide a comprehensive understanding of this foundational NLP model.

Embark on a wild race with Gemma as we explore the exciting (and sometimes slow) world of running Google's open-source large language model! We'll test drive different methods, from the leisurely pace of Ollama on a local machine to the speedier Groq platform. Join us as we compare these approaches, analyzing performance, costs, and ease of use for developers working with LLMs. Will the tortoise or the hare win this race?

Learn more:

  • Model card: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/335

  • Ollama: https://ollama.com/

  • LangChain.js with Ollama: https://js.langchain.com/docs/integrations/llms/ollama

  • Groq: https://groq.com/

Timestamps:

0:00:00 - Introduction

0:03:05 - Getting to Know Gemma: Exploring the Model Card

0:05:30 - Vertex AI Endpoint: Fast Deployment, But at What Cost?

0:13:40 - Ollama: The Tortoise of Local LLM Hosting

0:17:40 - LangChain Integration: Adding Functionality to Ollama

0:21:44 - Groq: The Hare of LLM Hardware

0:26:06 - Comparing Approaches: Speed vs. Cost vs. Control

0:27:35 - Future of Open LLMs and Google Cloud Next

#GemmaSprint

This project was supported, in part, by Cloud Credits from Google

The Alexa Developer Rewards Program (ADR) is shutting down, leaving many developers wondering about the future of Alexa skills. Mark and Allen discuss the implications of this change, explore alternative monetization options, and share their thoughts on the future of skill development.

Timestamps:

0:00 - Intro and announcement of the ADR program ending

1:45 - History of the ADR program and its impact on skill development

7:13 - Discussion of the Skill Developer Accelerator Program (SDAP) and Skill Coach

14:04 - Status of AWS credits for skill developers

15:10 - Incentives for building skills in the absence of the ADR program

21:30 - Cost-benefit analysis and the future of skill development

25:48 - Call to action: Share your thoughts on the ADR program ending and the future of skills

Join the conversation and let us know what you think!

As large language models (LLMs) become increasingly powerful, ensuring their responsible use is crucial. In this episode of Two Voice Devs, Allen and Mark delve into Google's Gemini LLM, specifically its built-in safety features designed to prevent harmful outputs like harassment, hate speech, sexually explicit content, and dangerous information.

Join them as they discuss:

(00:01:55) The importance of safety features in LLMs and Google's approach to responsible AI.

(00:03:08) A walkthrough of Gemini's safety settings in AI Studio, including the four categories of evaluation and developer control options.

(00:06:51) Examples of how Gemini flags potentially harmful prompts and responses, and how developers can adjust settings to control output.

(00:08:55) A deep dive into the API, exploring the parameters and responses related to safety features.

(00:19:38) The challenges of handling incomplete responses due to safety violations and the need for better recovery strategies.

(00:26:47) The importance of industry standards and finer-grained control for responsible AI development.

(00:29:00) A call to action for developers and conversation designers to discuss and collaborate on best practices for handling safety issues in LLMs.

This episode offers valuable insights for developers working with LLMs and anyone interested in the future of responsible AI. Tune in and share your thoughts on how we can build safer and more ethical AI systems!

In this episode of Two Voice Devs, Mark and Allen discuss how developers can leverage AI tools like ChatGPT to improve their workflow. Mark shares his experience using ChatGPT to generate an OpenAPI specification from TypeScript types, saving him significant time and effort. They discuss the benefits and limitations of using AI for code generation, emphasizing the importance of understanding the generated code and maintaining healthy skepticism.

Timestamps:

00:00:00 Introduction

00:00:49 Using AI as a developer tool

00:01:17 Generating OpenAPI specifications with ChatGPT

00:04:02 Mark's prompt and TypeScript types

00:05:37 Reviewing the generated OpenAPI specification

00:07:12 Adding request examples with ChatGPT

00:10:11 Benefits and limitations of AI code generation

00:13:43 Using AI tools for learning and understanding code

00:17:39 Trusting AI-generated code and potential for bias

00:19:04 Integrating AI tools into the development workflow

00:22:38 The future of AI in software development

00:23:17 Programmers as problem solvers, not just code writers

00:25:41 AI as a tool in the developer's toolbox

00:26:07 Call to action: Share your experiences with AI tools

This episode offers valuable insights for developers interested in exploring the potential of AI to enhance their productivity and efficiency.

Join us on Two Voice Devs as we chat with Xavi, Head of Cloud Infrastructure at Voiceflow, about the exciting new Voiceflow Functions feature and the future of conversational AI development. Xavi shares his journey into the world of bots and assistants, dives into the technology behind Voiceflow's infrastructure, and explains how functions empower developers to create custom, reusable components for their conversational experiences.

Timestamps:

  • 00:00:00 Introduction
  • 00:00:49 Xavi's journey into conversational AI
  • 00:06:08 Voiceflow's infrastructure and technology
  • 00:09:29 Voiceflow's evolution and direction
  • 00:13:28 Introducing Voiceflow Functions
  • 00:16:05 Capabilities and limitations of functions
  • 00:20:35 Future of Voiceflow Functions
  • 00:21:02 Sharing and contributing functions
  • 00:24:02 Technical limitations of functions
  • 00:25:35 Closing remarks and call to action

Whether you're a seasoned developer or just getting started with conversational AI, this episode offers valuable insights into the evolving landscape of bot development and the powerful capabilities of Voiceflow.

In this episode of Two Voice Devs, Allen Firstenberg and Roger Kibbe explore the rising trend of local LLMs, smaller language models designed to run on personal devices instead of relying on cloud-based APIs. They discuss the advantages and disadvantages of this approach, focusing on data privacy, control, cost efficiency, and the unique opportunities it presents for developers. They also delve into the importance of fine-tuning these smaller models for specific tasks, enabling them to excel in areas like legal contract analysis and mobile app development.

The conversation dives into various popular local LLM models, including:

  • Mistral: Roger's favorite, lauded for its capabilities and ability to run efficiently on smaller machines.
  • Phi-2: A tiny model from Microsoft ideal for on-device applications.
  • Llama: Meta's influential model, with Llama 2 currently leading the pack and Llama 3 anticipated to be comparable to ChatGPT 4.
  • Gemma: Google's new open-source model with potential, but still under evaluation.

Learn more:

  • Ollama: https://ollama.com/
  • Ollama source: https://github.com/ollama/ollama
  • LM Studio: https://lmstudio.ai/

Timestamps:

00:00:00: Introduction and welcome back to Roger Kibbe.

00:01:31: Roger discusses his career path and his passion for voice and AI.

00:06:33: The discussion turns to larger vs. smaller LLMs.

00:13:52: Understanding key terminology like quantization and fine-tuning.

00:20:58: Roger shares his favorite local LLM models.

00:25:14: Discussing the strengths and weaknesses of smaller models like Gemma.

00:30:32: Exploring the benefits and challenges of running LLMs locally.

00:39:15: The value of local LLMs for developers and individual learning.

00:40:29: The impact of local LLMs on mobile devices and app development.

00:49:27: Closing thoughts and call for audience feedback.

Join Allen and Roger as they explore the exciting potential of local LLMs and how they might revolutionize the development landscape!

Join Allen and Mark on Two Voice Devs as they dive into the world of Large Action Models (LAMs) and explore their potential to revolutionize how we build chatbots and voice assistants.

Inspired by Braden Ream's article "How Large Action Models Work and Change the Way We Build Chatbots and Agents," the discussion dissects the core functions of conversational AI - understand, decide, and respond - and examines how LAMs might fit into this framework.

Allen and Mark also compare and contrast LAMs with Large Language Models (LLMs) and Natural Language Understanding (NLU), highlighting the strengths and limitations of each approach.

Tune in to hear their insights on:

  • The evolution of Voiceflow and its shift towards LLMs (03:20)
  • Understanding the core functions of conversational AI (05:40)
  • Clippy as an example of a deterministic agent (06:15)
  • The differences between deterministic and probabilistic models (07:50)
  • NLU vs. LLMs for understanding user input (09:20)
  • How LAMs might fit into the "decide" stage of conversational AI (18:50)
  • The challenges of training LAMs and avoiding hallucinations (20:00)
  • The potential of LAMs to improve response generation (29:30)
  • Cost considerations of using LLMs vs. NLUs (37:00)

Whether you're a seasoned developer or just curious about the future of conversational AI, this episode offers a thought-provoking discussion on the potential of LAMs and the challenges that lie ahead.

Be sure to share your thoughts in the comments below!

Additional Info:

  • https://www.voiceflow.com/blog/large-action-models-change-the-way-we-build-chatbots-again

Google's Gemini 1.5 is here, boasting a mind-blowing 1 million token context window! 🤯 Join Allen and Linda as they dive deep into this experimental AI, exploring its capabilities, limitations, and potential use cases. 🤔

They share their experiences testing Gemini 1.5 with original content, including Two Voice Devs transcripts and synthetic videos, and discuss the challenges of finding data that hasn't already been used to train the AI. 🧐

Get ready for a lively discussion on hallucinations, the future of content creation, and the ethical questions surrounding these powerful language models. 🤖

More info:

  • https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/

  • https://developers.googleblog.com/2024/02/gemini-15-available-for-private-preview-in-google-ai-studio.html

  • https://openai.com/sora

Timestamps:

00:00:00 Introduction

00:01:05 Notable features of Gemini 1.5

00:02:57 What is a token?

00:06:39 Linda's test with Danish citizenship PDF

00:09:33 Allen's test with Les Miserables and needle in a haystack

00:12:27 Testing with Data Portability API data

00:14:28 Linda's test with YouTube search history and Netflix recommendations

00:17:44 Allen's test with Two Voice Devs transcripts

00:21:32 Issues with counting and hallucinations

00:24:21 Testing with OpenAI's Sora AI synthetic videos

00:30:05 Ethical questions and the future of content creation

00:31:50 Potential use cases for large context windows

00:36:34 API limitations and challenges

00:37:39 Performance and cost considerations

00:41:34 Comparison with retrieval augmented generation and vector databases

00:44:21 Generating summaries and markers from this transcript

Leave your thoughts and questions in the comments below!

In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss Gemini, Google's latest name for its Generative AI... stuff. Originally known as separate products including Bard and Duet AI, Gemini encompasses a suite of AI tools, including chatbots, product-specific assistants, models, and APIs that developers can use for various tasks. The discussion covers how Gemini compares with offerings from other companies such as OpenAI and Microsoft, including visible similarities and differences. The show concludes by answering the question about why developers should care about this rename with a call to explore possibilities with AI tools like Gemini to let us create more natural and user-friendly interfaces.

Learn more:

  • https://blog.google/technology/ai/google-gemini-update-sundar-pichai-2024/
  • https://blog.google/products/gemini/bard-gemini-advanced-app/

00:04 Introduction and Catching Up

00:55 Exploring the Gemini Model

04:09 Gemini vs OpenAI: A Comparison

10:20 Understanding the Gemini Branding

12:00 The Developer's Perspective on Gemini

17:46 Closing Thoughts and Future Discussions

In this episode of Two Voice Devs, hosts Allen Firstenberg and Mark Tucker discuss the CSS Speech Module Level 1 Candidate Recommendation Draft, a standard that enables webpages to talk, developed in collaboration with the voice browser activity. They explore its features, including the 'aural' box model concept, voice families, earcons and more, drawing parallels with SSML and highlighting its innovative approach to web accessibility, which complements screen readers. While acknowledging its potential, they address some of its key omissions, such as phonemes and a background audio feature.

00:04 Introduction and Welcome

01:14 Exploring the Concept of Webpages Talking

03:00 Deep Dive into CSS Speech Module

03:48 Understanding the Scope of CSS Speech Module

04:27 The Evolution of Voice Interaction

05:22 Comparing CSS Speech with SSML

07:13 The Power of CSS in Voice Development

22:49 The Impact of Voice Balance Property

29:20 The Limitations of CSS Speech

39:37 The Future of CSS Speech

42:50 Conclusion and Final Thoughts

Forget Apps! Talking to this Orange Cube Could Change Everything

Is the app model broken? The creators of Rabbit R1, a new voice-first device, certainly think so. In this episode of Two Voice Devs, Mark and Allen break down this innovative device and its potential to change how we interact with technology. What do developers think about the technology underlying RabbitOS? You may be surprised!

Key topics:

  • 00:02:00 - What is the Rabbit R1? Rabbit R1 is a new type of device that prioritizes voice input and output. It aims to shift users away from apps and toward a more conversational way of interacting with technology.
  • 00:05:17 - AI models: Rabbit uses a unique "large action model" to understand and complete tasks. It claims to do this faster and more intuitively than existing voice assistants.
  • 00:14:14 - Teach Me mode: See how Rabbit can be trained to interact with new websites and applications. What implications does this have for the future?
  • 00:18:41 - Can it replace apps? While that's a bold claim, Rabbit's conversational approach and innovative features show promise. Could this be the first step towards a new era in human-computer interaction?

Additional thoughts:

  • 00:25:06 - Hybrid approach: Rabbit smartly combines intent-based and language-based AI models, potentially offering speed and accuracy.
  • 00:32:56 - Asynchronous interactions: It breaks away from the traditional request-response model, offering a more natural conversational experience that aligns with the Star Trek computer vision.
  • 00:07:48 - Price: At just $199, many people are willing to check it out, and this could accelerate interest in voice-driven interfaces.

Is Rabbit R1 a game-changer or just a gimmick? Let us know your thoughts in the comments!

In this episode of 'Two Voice Devs', hosts Allen Firstenberg and Mark Tucker discuss updates made to Alexa Presentation Language (APL) version 2023.3. They highlight conditional imports, updates made for animations, and more, including APL support for different devices and how to "handle" backward compatibility.

Learn More:

https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apl-latest-version.html

00:08 Introduction and Welcome

00:17 Alexa Presentation Language (APL) Overview

01:02 Understanding APL and its Components

03:23 Exploring APL's Functionality and Usage

05:22 APL's Versioning Strategy and Device Compatibility

09:23 New Features in APL 2023.3: Conditional Imports

15:22 New Features in APL 2023.3: Item Insertion and Removal Commands

18:05 New Features in APL 2023.3: Control Over Scrolling and Paging

19:43 New Features in APL 2023.3: Accessibility Improvements

20:36 New Features in APL 2023.3: Frame Component Deprecation

22:23 New Features in APL 2023.3: Data Property for Sequential and Parallel Commands

25:07 New Features in APL 2023.3: Support for Variable Sized Viewports

26:47 New Features in APL 2023.3: Support for Lottie Files

28:33 New Features in APL 2023.3: String Functions and Vector Graphic Improvements

30:11 New Features in APL 2023.3: Extensions and APL Cheat Sheets

37:26 Strategies for Backwards Compatibility in APL

38:40 Conclusion and Farewell

In their New Year's discussion, Mark and Allen explore their hopes and predictions for technological advancements in 2024. They discuss the future of Large Language Models (and if that's the right name for them now), expressing anticipation for improvements in latency issues and the potential for models to be hosted on devices rather than cloud-based platforms. The conversation also ventures into the world of AI agents, function calling, and the importance of developers in ensuring safety measures are integrated in AI systems. Finally, they exude excitement about the possibility of AI in multimedia formats, where tools can generate differing output forms like text, video, images, and possibly even audio directly. They explore potential developer opportunities and challenges, emphasizing the importance of understanding regulations and ensuring user privacy and safety.

00:04 Introduction and New Year Reflections

02:05 Looking Forward: Predictions for 2024

02:14 The Future of Large Language Models (LLMs)

03:08 The Impact of LLMs on Voice Assistants

07:44 The Potential of On-Device AI Models

10:14 The Role of Developers in the AI Landscape

20:11 The Future of Multimodal AI Models

26:35 The Importance of Regulations in AI

29:22 Conclusion: Exciting Times Ahead

Allen Firstenberg and Mark Tucker, hosts of Two Voice Devs, reflect on the year 2023, discussing significant changes and trends in the #VoiceFirst and #GenerativeAI industry and where their predictions from last year were accurate... or fell short. They discuss the transformation and challenges Amazon faced, gleaning predictions from hints at large language models (LLMs) from Google, Amazon, Microsoft, and Apple. They also mention the shift of Voiceflow towards LLMs and recall the notion of retrieval augmented generation.

00:04 Introduction and Welcome

00:12 Reflecting on the Past Year

01:13 Amazon's Progress and Challenges

01:59 Exploring Amazon's Monetization and Widgets

08:45 Google's Journey and the End of Conversational Actions

11:53 The Rise of Large Language Models (LLMs)

17:04 The Impact of Voiceflow and Dialogflow

20:48 Closing Remarks and New Year Wishes

Mark and Allen get into the Tech-mas spirit, with a little help from Bard.

Hoping you all have the happiest of holiday seasons.

#GenerativeAI #VoiceFirst #ConversationalAI #HappyHolidays

In this in-depth chat, Allen Firstenberg and Linda Lawton dive into the functionalities and potential of Google's newly released Gemini model. From their initial experiences to exciting possibilities for the future, they discuss the Gemini Pro and Gemini Pro Vision models, how to #BuildWithGemini, its focus on both text and images, and its speedier and more cohesive responses compared to older models. They also delve into its potential for multi-modal support, unique reasoning capabilities, and the challenges they've encountered. The conversation draws interesting insights and sparks exciting ideas on how Gemini could evolve in the future.

00:04 Introduction and Welcome

00:23 Discussing the New Gemini Model

01:33 Comparing Gemini and Bison Models

02:07 Exploring Gemini's Vision Model

03:03 Gemini's Response Quality and Speed

03:53 Gemini's Token Length and Context Window

05:05 Gemini's Pricing and Google AI Studio

05:33 Upcoming Projects and Previews

06:16 Gemini's Role in Code Generation

07:54 Gemini's Model Variants and Limitations

12:01 Creating a Python Desktop App with Gemini

14:07 Gemini's Potential for Assisting the Visually Impaired

18:35 Gemini's Ability to Reason and Count

20:15 Gemini's Multi-Step Reasoning

20:33 Testing Gemini with Multiple Images

21:52 Exploring Image Recognition Capabilities

22:13 Discussing the Limitations of 3D Object Recognition

23:53 Testing Image Recognition with Personal Photos

24:52 Potential Applications of Image Recognition

25:45 Exploring the Multimodal Capabilities of the AI

26:41 Discussing the Challenges of Using the AI in Europe

27:26 Exploring the AQA Model and Its Potential

33:37 Discussing the Future of AI and Image Recognition

37:12 Wishlist for Future AI Capabilities

40:11 Wrapping Up and Looking Forward

Join Allen Firstenberg and guest host Noble Ackerson at the Voice and AI 2023 conference. They discuss the growth of AI and how large language models (LLMs) are affecting the tech world, and delve deep into topics like LangChain, generative AI, and how to optimize AI operations to tackle network latency. There are also plenty of audience questions, exploring the current challenges in AI and potential solutions.

00:03 Introduction and Background of Two Voice Devs

00:31 The Evolution of Voice Technology and AI

01:50 Interactive Q&A Session Begins

01:58 Discussion on Open Source Software and Generative AI

02:59 Deep Dive into LangChain

05:43 Audience Participation and Questions

06:00 Challenges with LangChain and Overhead

08:14 Exploring the Intersection of Voice Technology and Generative AI

12:51 Addressing Network Latency in Voice Technology

19:49 The Future of AI and Voice Technology

26:53 Addressing the Challenges of Network Latency

37:13 Closing Remarks and Future Engagements

Join Mark Tucker and Allen Firstenberg on Thanksgiving Day for a sincere heart-to-heart on the highs and lows of their tech industry journey. Expressing their gratitude for their family, friends, and colleagues in the tech industry and beyond, they acknowledge the challenging times faced by many. They call on their viewers to remember how unique and important they are and invite them to express their thoughts and emotions openly by reaching out to them.

00:04 Introduction and Thanksgiving Greetings

00:28 Reflecting on the Past Year

02:19 Gratitude for Personal Relationships

03:54 Acknowledging Industry Challenges and Layoffs

05:59 Importance of Community and Support

07:59 Encouragement and Closing Remarks

Mark Tucker and Allen Firstenberg delve into the recent changes made by Voiceflow. We explore how Voiceflow, originally a design resource for Alexa Skills and Google Assistant Actions, has evolved and shifted to include chatbot roles and generative AI responses. We also highlight the implications of Voiceflow's decoupling and transition to 'bot logic as a service'. We look at the technical adjustments and solutions required in the aftermath of these changes, and Mark shares how he created a Jovo plugin as a hassle-free 'integration layer' for handling multiple platforms, taking advantage of Jovo's generic input and output.

More info:

  • https://github.com/jovo-community/jovo4-voiceflowdialog-app

00:04 Introduction

00:54 Introducing Voiceflow

01:44 Exploring Voiceflow's Evolution

03:13 Understanding Voiceflow's Changes

05:39 Explaining the Voiceflow Integration

14:39 Discussing the Voiceflow Dialog API

25:42 Conclusion

On this episode, Mark Tucker and Allen Firstenberg dive deep into the latest announcements by OpenAI. They discuss various developments including the launch of GPTs (collections of prompts and documents with configuration settings), the new text-to-speech model, the upcoming GPT-4 Turbo, reproducible outputs, and the introduction of the Assistant API. While they express excitement for what these developments could mean for #VoiceFirst, #ConversationalAI, and #GenerativeAI, they also voice concerns about discovery solutions, monetization, and the reliance on platform-based infrastructure. Tune in and join the conversation.

More info:

  • https://openai.com/blog/new-models-and-developer-products-announced-at-devday

00:04 Introduction and OpenAI Announcements Edition

00:52 Discussion on OpenAI's New Text to Speech Model

02:15 Exploring the Pricing and Quality of OpenAI's Text to Speech Model

02:52 Concerns and Limitations of OpenAI's Text to Speech Model

06:24 Introduction to GPT 4 Turbo

06:48 Benefits and Limitations of GPT 4 Turbo

09:27 Exploring the Features of GPT 4 Turbo

18:52 Introduction to GPTs and Their Potential

22:22 Concerns and Questions About GPTs

32:14 Discussion on the Assistant API

37:32 Final Thoughts and Wrap Up

View Details

Allen and Mark discuss the practical uses and advantages offered by MakerSuite, an API currently available for Google's PaLM #GenerativeAI model. We look at its unique feature that treats prompts like templates, allowing for versatile manipulation of these templates for varying results. We further delve into how it saves these prompts in Google Drive and how this can be linked to LangChain's new hub concept, leading to an effective 'MakerSuite hub.' Finally, we explore if prompts are more like code or content, and how that fits into the development process. What do you think?

More info:

  • MakerSuite: https://makersuite.google.com/
  • MakerSuite Hub in LangChain JS: https://js.langchain.com/docs/ecosystem/integrations/makersuite
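The idea of treating a prompt like a template can be sketched in a few lines of Python. This is only an illustration using the standard library, not MakerSuite's or LangChain's actual format; the prompt wording and the variable names `language` and `phrase` are made up for the example.

```python
from string import Template

# Hypothetical prompt template; the wording and variable names are assumptions.
prompt_template = Template(
    "Translate the following $language phrase into English: $phrase"
)

def render(language, phrase):
    """Fill in the template's variables to produce a concrete prompt."""
    return prompt_template.substitute(language=language, phrase=phrase)

# The same template can be reused with different values.
prompt = render("French", "bonjour")
```

Storing the template separately from the values that fill it is part of what makes the "code or content?" question interesting: the template can be versioned and shared, while the values change per request.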

View Details

Mark and Allen explore TypeChat - a new library from Microsoft that makes prompt engineering for function-like operations in #ConversationalAI easier and more robust. Is this a replacement for Intents? Does it go beyond what we could do with Intent-based systems? Is it lacking something? Let's explore!

Learn more:

  • https://github.com/microsoft/TypeChat

View Details

What started as a casual conversation between Mark and Allen turned into a brief exploration of what Retrieval Augmented Generation (RAG) means in the #GenerativeAI and #ConversationalAI world. Toss in some discussion about VoiceFlow and Google's Vertex AI Search and Conversation and we have another dive into the current hot method to bridge the Fuzzy Human / Digital Computer divide.

View Details

Last week, before Google's annual hardware event, Allen teased part of his prediction about Google Assistant and Bard. This week, we'll show the full clip of Allen's prediction and see just how close he was. Then Mark and Allen discuss how recent announcements from OpenAI, Amazon Alexa, and Google compare to each other and, more importantly, what they each mean for developers in a #GenerativeAI, #ConversationalAI, and perhaps even a #VoiceFirst world, and perhaps make a few more predictions about what we'll hear next.

More info:

  • Blog post about Assistant With Bard: https://blog.google/products/assistant/google-assistant-bard-generative-ai/
  • Announcement at the Made By Google event: https://www.youtube.com/live/pxlaUCJZ27E?si=I1noN-l3LQHgBktp&t=2941

View Details

The Google Cloud Next conference is a massive display of the latest technologies and products available from Google Cloud - from AI to Zero-Trust solutions. Unsurprisingly, #MachineLearning was prominent in this year's show, so Mark and Allen take a look at some of the biggest #GenerativeAI and #ConversationalAI announcements this year.

More info:

  • https://cloud.google.com/blog/topics/google-cloud-next/next-2023-wrap-up

View Details

Mark shares the exciting news that Amazon Alexa will soon have a #VoiceFirst #ConversationalAI LLM chat mode! While Allen agrees that this is very exciting news, he still has quite a few questions about how #GenerativeAI technology will fit into Alexa skills. We ask the difficult questions and see what answers are currently out there.

What do you think about this announcement from Alexa?

More info:

  • LLM feature description: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2023/09/alexa-llm-fall-devices-services-sep-2023
  • Event video: https://youtu.be/_JcP7N0QPOk

View Details

Noble and Allen take a look back at our experiences at this year's VOICE + AI conference. What were the big topics being discussed? The amusing moments? And what do we want to see next year?

#GenerativeAI #ConversationalAI #VoiceFirst

View Details

Allen and guest host Linda have a wide-ranging conversation, from Linda's career path and her experiences as a Google Developer Expert for Google Analytics, to how she leveraged that knowledge while trying out something new with Google's #GenerativeAI tool, MakerSuite and the PaLM API. We take a close look at how developers can use prompts (more than one!) to help turn a user's request into actionable data structures that feed into an API and get results.

More from Linda:

  • https://LindaLawton.DK
  • https://daimto.com

#MakerSuiteSprint #LargeLanguageModel

View Details

We're just days away from the annual VOICE+AI conference, hosted this year in Washington, DC. Both Allen and Noble will be speaking (and hosting a live and in person recording of a future episode!), so we'll give a little preview of what you can hear if you're attending.

View Details

Allen and Mark revisit a conversation from episode 146 where they discovered Google had a Vector Database. Now, several months later, Allen has done some work with the Google Cloud Vertex AI Matching Engine and incorporated it into LangChain JS. We discuss why this is important, and how it fits into the overall landscape of LLMs and MLs today. (And Allen has a little announcement towards the end.)

More info:

  • Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview

  • LangChain JS: https://js.langchain.com/docs/modules/data_connection/vectorstores/integrations/googlevertexai

View Details

This seems like an easy question, right? If you want to do #ConversationalAI or #GenerativeAI on your own machine with a model such as Llama 2, you can just download the model and... well... then what? This is the question posed to guest host Noble Ackerson - and the answer was both more complicated and simpler than Allen could imagine!

View Details

Amazon has made some changes to the Alexa Presentation Language, dubbing this version 2023.2, and Allen is a bit confused about what these updates bring. Mark, however, clarifies what's new, how it relates to what was previously available, and why some users can benefit from this latest APL release.

View Details

One of the neat features we've seen come out of the #GenerativeAI and #ConversationalAI explosion recently has been the attention being paid to text embeddings and how they can be used to radically change how we index and search for things. Allen, however, has recently been working with an image embedding model from Google, including incorporating it into LangChain JS. Mark asks about what that process was like, what this new model lets us do, and starts to explore some of the potential of this new tool that is available for everyone.

References:

  • LangChain JS module: https://js.langchain.com/docs/modules/data_connection/experimental/multimodal_embeddings/google_vertex_ai
  • Information from Google: https://cloud.google.com/vertex-ai/docs/generative-ai/embeddings/get-image-embeddings
  • Google Model Garden info: https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/5
  • XKCD: https://xkcd.com/1425/

View Details

Three years of Two Voice Devs! There's no doubt that the #VoiceFirst industry has changed over that time, with the rise of #GenerativeAI and #ConversationalAI taking the world by storm. Mark and Allen look back at how the show has evolved over this time, and why we hope you'll be joining us as we continue forward on our journey!

View Details

Guest Host Xavier Portilla returns to chat with Allen about some of the latest additions to Dialogflow CX. New system functions make some of the processing you can do on inputs easier and faster, while prebuilt flows and flow scoped parameters make it easier to have clearly defined, and reusable, components in your conversation design.

More info:

  • https://cloud.google.com/dialogflow/docs/release-notes#July_05_2023

View Details

Guest host Xavier Portilla joins Allen to take a look at a new slot type that the Alexa team has in public beta. How can this new type be used? How does it differ from previous slot types? And what is a slot type anyway?

View Details

Guest Host Leslie Pound joins Allen to discuss her perspective on software development and #GenerativeAI and how, rather than trying to translate our fuzzy side, developers should think about how it helps us be more aware of how users are seeking to be more inspired or creative.

View Details

Noble Ackerson returns to discuss a recent presentation that Allen made to the Google Developer Group NYC chapter, where he illustrates how #GenerativeAI can be used as a bridge between the discrete nature of computers and the "fuzzy" nature of humans. He and Noble discuss how Large Language Models, such as those from OpenAI and Google's PaLM 2, along with libraries like LangChain, become powerful tools in every developer's toolbox.

View Details

Allen is joined by Noble Ackerson to discuss the latest feature that OpenAI has included with its GPT models. Functions provide a well defined way for developers to turn unstructured human input into a more structured format that can be processed by your code or using a library such as LangChain. We take a look at both how they can be used and some of the open questions that remain about their use.

More info:

  • https://platform.openai.com/docs/guides/gpt/function-calling
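To make the "unstructured input to structured format" idea concrete, here is a minimal Python sketch of the receiving side. It makes no API call: `model_reply` is a hand-written stand-in for what a model might return, and the `set_timer` function, its parameters, and the `dispatch` helper are all hypothetical names chosen for illustration.

```python
import json

# A function description in JSON Schema style; the name and fields are hypothetical.
set_timer_function = {
    "name": "set_timer",
    "description": "Set a timer for a number of minutes",
    "parameters": {
        "type": "object",
        "properties": {
            "minutes": {"type": "integer"},
            "label": {"type": "string"},
        },
        "required": ["minutes"],
    },
}

# Hand-written stand-in for a model reply (assumption: arguments arrive as a JSON string).
model_reply = {"name": "set_timer", "arguments": '{"minutes": 10, "label": "tea"}'}

def set_timer(minutes, label="timer"):
    """The ordinary code that the structured request is routed to."""
    return f"Timer '{label}' set for {minutes} minutes"

handlers = {"set_timer": set_timer}

def dispatch(reply):
    """Parse the JSON arguments, check required fields, and call the handler."""
    args = json.loads(reply["arguments"])
    required = set_timer_function["parameters"]["required"]
    missing = [name for name in required if name not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return handlers[reply["name"]](**args)

result = dispatch(model_reply)
```

Checking the arguments before calling the handler matters because the structured output still comes from a language model and may be incomplete.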

View Details

This week, Google completed the "sunset" of Conversational Actions for the Google Assistant. Mark and Allen discuss the ups and downs of Actions on Google, how it fit into the #VoiceFirst landscape, and what may come next.

View Details

Another milestone episode! Mark and Allen take advantage of the event to look back at our predictions from episode 100, look back at how #VoiceFirst development has changed over the past 50 episodes (and several years), and look forward to what we'll be talking about in the next 50 episodes.

View Details

It's been a busy week! What have we been up to? Mark has released a new set of cards that summarize and illustrate different AI concepts. Called "AI Explorer Cards of Discovery", we chat about the objectives and the process to create this deck. (And there's a special offer for listeners!) Meanwhile, Allen has been working with Google's new PaLM model as part of Google Cloud's Vertex AI platform and has contributed changes to the popular LangChainJS package to make PaLM available through the open source library.

Resources:

  • AI Explorer Cards of Discovery: https://bit.ly/ai-cards

  • LangChainJS: https://github.com/hwchase17/langchainjs

  • Google PaLM: https://cloud.google.com/ai/generative-ai

View Details

SO MUCH packed into this episode!

Recently, Allen participated in a hackathon sponsored by VoiceFlow, and he used the opportunity to explore ways that LLMs could be used to build on his work talking with spreadsheets in Vodo Drive (see episode 116). He and Mark explore how he did it - from the prompts that were required to integration with VoiceFlow and Google App Script, to how tools like LangChain will help build similar things. We also explore what lessons are learned, how our experience in #VoiceFirst design helps us build good #ConversationalAI tools, how other APIs can (and should!) work alongside AI, and what "fuzzy" roles AI can fill in the modern app experience.

Resources:

  • Vodo Drive: https://vodo-drive.com/

  • PromptHacks Hackathon: https://prompthacks.devpost.com/

  • Vodo AI submission for PromptHacks: https://devpost.com/software/vodo-ai

  • VoiceFlow: https://www.voiceflow.com/

  • Google Apps Script: https://www.google.com/script/start/

  • LangChain: https://github.com/hwchase17/langchain and https://github.com/hwchase17/langchainjs

View Details

It's Google I/O time again! And although Allen couldn't attend in person, he and Mark review the latest announcements relevant to #VoiceFirst and #ConversationalAI developers. From new AI availability to AI workspace, with stops along the way to discuss AI powered hardware, there was lots to hear about. Also some subtle hints from what wasn't said. But did we mention the AI?

Learn more:

  • https://blog.google/technology/developers/google-io-2023-100-announcements/

View Details

We've touched on the use of vector databases as we've started to explore how LLMs and conversational AIs can be useful, but what are they and how do they work? How are they used for more than just LLMs? Mark and Allen explore some of the classic vector DBs, such as HNSW, and some of the newer fully managed ones, including Metal and Pinecone. We even start to ponder what a fully managed embedding and vector db system might look like from the likes of Google, Azure, or AWS, and are surprised that we're closer than we thought!

Resources:

  • HNSWlib: https://github.com/nmslib/hnswlib

  • Pinecone: https://pinecone.io/

  • Metal: https://getmetal.io/

  • Google Cloud Vertex AI Matching Engine: https://cloud.google.com/vertex-ai/docs/matching-engine/overview

  • Amazon AWS Bedrock: https://aws.amazon.com/blogs/machine-learning/announcing-new-tools-for-building-with-generative-ai-on-aws/
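As a rough illustration of what a vector database does at its core, here is a small Python sketch of nearest-neighbor search over embeddings. It uses exact brute-force cosine similarity rather than an approximate index such as HNSW, and the three-dimensional vectors and phrases are made-up toy values, not real model embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (assumed values for illustration only).
index = {
    "set a timer": [0.9, 0.1, 0.0],
    "play some music": [0.1, 0.9, 0.1],
    "start a countdown": [0.8, 0.2, 0.1],
}

def nearest(query_vector, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]

results = nearest([1.0, 0.0, 0.0])
```

Comparing the query against every stored vector like this gets slow as the collection grows, which is the problem that approximate indexes and managed services are meant to address.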

View Details

Long teased, the ability for developers to create Alexa Widgets is finally generally available! Mark, an Alexa Champion, has had access for a while now, so he and Allen discuss what it takes to make a Widget, what's new and different, and how it fits into the #VoiceFirst world of skills.

View Details

We're still exploring what LangChain can do, and this week we dive into a tutorial put out by the Voiceflow team that discusses some ways that Voiceflow can be integrated with ChatGPT using LangChain, bringing the #VoiceFirst and #ConversationalAI worlds closer together. It is also a great example of how we go about learning and understanding code that is new to us.

Resources:

  • The tutorial we were following: https://www.voiceflow.com/blog/voiceflow-assistant-openai-gpt

View Details

Over the past few weeks, Mark and Allen have been playing with LangChain and OpenAI, exploring where #ConversationalAI and #VoiceFirst design intersect, and we recorded some of our experiments. In this early one, we take a look at how LangChain with a memory chain can work and keep track of what's going on in the conversation. All in just a few lines of code. More significantly, we discuss the role that LangChain can play in putting together AI and other API components to create voice, web, and app-based agents that include AI as part of the NLU or response elements.

View Details

The latest update for the Alexa Presentation Language, APL 2023.1, has been out for a little bit, and Mark and Allen already discussed one of the biggest features - speech marks. But there was more to this release! Allen is, perhaps, even more excited about how selectors can enable some very dynamic APL interactions, and video subtitles and new graphic masks round out what's new.

More info:

  • https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apl-latest-version.html - What's New With APL

View Details

Guest host Noble Ackerson returns to tie all the pieces of LangChain together and help Allen understand what it does, why it is an increasingly valuable toolkit for many #ConversationalAI developers, and how we can use it to help build #VoiceFirst applications.

Resources:

  • LangChain: https://github.com/hwchase17/langchain

  • LangChainJS: https://github.com/hwchase17/langchainjs

View Details

With the announcement last week that ChatGPT will soon be supporting plugins, Mark and Allen explore what this means for developers, particularly developers who are used to #VoiceFirst development with Amazon Alexa, Google Assistant, and Samsung Bixby. OpenAI is launching it with some interesting features, including hints at how to monetize plugins, but there are still many questions that developers will need answers to. We explore what will be coming, how to prepare for it, what OpenAI still needs to address, and how all of this may play into the future of voice assistants.

Resources:

  • https://platform.openai.com/docs/plugins/ - ChatGPT Plugins documentation

View Details

Allen has been trying to get into building apps that include LLMs, and has been hearing a lot about the LangChain library. But trying to understand it can be... dizzying. Guest host Noble Ackerson joins to help answer some of the questions about LangChain and how it can bring #ConversationalAI to the #VoiceFirst world and how to use it with your existing APIs.

View Details

With Daylight Saving Time ending in the US last weekend, Mark and Allen figured it would be a good... time... to take a look at what date and time features are available to voice developers on Alexa, Alexa APL, Google Dialogflow, Google App Actions, and Jovo. Not to mention discussing the role of ISO-8601 date/time formats, JavaScript libraries such as Luxon, and why UTC isn't always the right time zone.

Resources:

  • Luxon: https://moment.github.io/luxon/
  • Alexa Date/Time slots: https://developer.amazon.com/en-US/docs/alexa/custom-skills/slot-type-reference.html#numbers-dates-times
  • APL localtime: https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apl-data-binding-evaluation.html#localtime
  • Dialogflow system entities: https://cloud.google.com/dialogflow/cx/docs/reference/system-entities
  • Jovo Time Zone plugin: https://www.jovo.tech/marketplace/plugin-timezone
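The point about UTC not always being the right time zone can be shown with a short Python sketch. It uses only the standard library (not Luxon or any of the platform features listed above), and the timestamp and the `America/New_York` zone are assumptions picked for the example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# An ISO-8601 timestamp in UTC (hypothetical request time).
utc_time = datetime.fromisoformat("2023-03-13T02:30:00+00:00")

# The same instant as seen on the clock of a user in New York (assumed zone).
local_time = utc_time.astimezone(ZoneInfo("America/New_York"))

# The calendar dates differ, so "today" depends on whose clock you use.
utc_date = utc_time.date().isoformat()
local_date = local_time.date().isoformat()

# Both values still describe the same moment in time.
same_instant = utc_time == local_time
```

If a skill answered "what's today's date?" from the UTC value here, it would be a day ahead of what this user sees on their own calendar.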

View Details

With Google's Universal Analytics sunsetting, everyone is looking at what other tools are available to record user activity in our skills and chatbots. At Allen's suggestion, Mark had checked out the BigQuery service offered by Google Cloud Platform, and created a Jovo plugin to make it available for everyone. We discuss BigQuery and how to use it and this new plugin.

Resources:

  • https://www.jovo.tech/marketplace/analytics-bigquery
  • https://cloud.google.com/bigquery

View Details

We've been exploring some new tools, concepts, and libraries - Mark has been looking at tools to do Named Entity Recognition, while Allen has been exploring the LangChain AI library. But this leads to the question - as you're just starting to learn something new, how do you do so? Do you start with example code? Or YouTube videos? Or what? We share what has worked well for us, along with some early discoveries about what we've been looking at.

View Details

If you're working on a #VoiceFirst application that requires Speech to Text in a controlled environment (for example, on an embedded device or in a medical environment), you don't want to have to rely on cloud services such as those available from Google or AWS. Even if you are willing to use the cloud, it may be cheaper to run your own STT service. Mark and Allen discuss one such way to do so - using the Leopard STT product from Picovoice.

Additional info:

  • Serverless Speech to Text article: https://medium.com/picovoice/serverless-speech-to-text-5258e05f7031
  • Code to go with the article: https://github.com/Picovoice/serverless-leopard/blob/main/serverless_leopard/lambda_function.py
  • Product page from Picovoice: https://picovoice.ai/platform/cat/

View Details

One of our favorite #VoiceFirst features (and not just because of the name) has come to Alexa! Allen and Mark (the real Mark) discuss the new onSpeechMark event that is now available in APL, how it compares to the feature in the Google Assistant Interactive Canvas, and some ways that you can use it to make more powerful and dynamic APL displays.

View Details

In our previous episode, Allen and Mark talked about the differences between Dialogflow ES and CX and how both had a notion of front-end integrations. This week, they go in a little deeper, discussing the various ways you can write your own integration if you need to - either by sending text or audio to the Dialogflow API.

View Details

Many #VoiceFirst developers know of Dialogflow as a Natural Language Understanding (NLU) system. But is there more to it? Mark and Allen discuss what's different between Dialogflow ES and CX, what's the same, and how both of them provide added value to voice developers (as well as designers).

View Details

Did you know Google Chat has an API? That you can build bots for? And that it can even integrate directly with Dialogflow, PubSub, or Google's App Script? Allen introduces Mark to some of the unique and powerful features that the Google Chat API has, explores the various ways you can use it, and ponders some things we can learn when developing for #VoiceFirst.

View Details

How is a chatbot different from what we usually think about in the #VoiceFirst world? There are parallels, so Mark discusses some of his recent explorations in what is going on with Microsoft and support for chatbots, and Allen compares it to some other technology available to developers. There are some interesting updates and changes in progress!

View Details

More third party widgets are becoming available for Amazon Alexa, so Mark takes the opportunity to share what he can about how widgets work, how they go beyond a #VoiceFirst design, and how you can get started designing them to accompany your skills. Allen... has some choice facial reactions in response.

View Details

As we kick off the new year, Allen and Mark look ahead at what's coming (or what we hope is coming) in the #VoiceFirst world this year. From Amazon Alexa and the Google Assistant to custom agents and AI, what do we think the year ahead will bring? What are you looking forward to?

View Details

As the end of 2022 looms, Mark and Allen look back at some of the highlights of the show, of #VoiceFirst development, of working with Amazon Alexa and the Google Assistant, and of talking to all of you. 

View Details

Wishing you the warmest of holiday seasons from all of us at Two Voice Devs and the #VoiceFirst Community Choir.

Featuring

Jeff Blankenberg
Maaike Coppens
Jessica Earley-Cha
Pete Erickson
Lisa Falkson
Nick Felker
Allen Firstenberg
Dana Gibson
Tom Hewitson
Toni Klopfenstein
Cathy Pearl
Noelle Russell
Nick Schwab
Jon Stine
Mark Tucker
Denis Valášek
Sarah Wilson
with Amazon Alexa
and Google Assistant

Script

Mark Tucker
with ChatGPT

Editing

Allen Firstenberg
with Descript

A Two Voice Devs Production

View Details

The biggest buzz the past few weeks has been about ChatGPT from OpenAI, with some folks in the #VoiceFirst community pondering how this is going to change the nature of conversation design and voice development. Mark and Allen talk about what works with ChatGPT, what doesn't, and how voice developers might be thinking about its role in building conversational apps and platforms in the future.

Learn more:

  • ChatGPT: https://chat.openai.com/chat

View Details

The tech industry isn't always full of better and better things happening. Sometimes... there are setbacks, reassignments, poor sales, and layoffs. The recent news out of Amazon, particularly in the hardware and Alexa division, and with the Google Assistant has indicated that the #VoiceFirst movement may have hit some hard times. Allen and Mark talk about what's going on and extend our best hopes for those impacted.

Some further reading:

  • Details from VoiceBot.ai: https://voicebot.ai/2022/11/15/the-latest-details-on-the-amazon-layoffs-and-the-impact-on-alexa/
  • Thoughts from Nick Schwab: https://twitter.com/nickschwab/status/1597980202243657728
  • Thoughts from James Poulter: https://twitter.com/jamespoulter/status/1597320322633830400

View Details

As Google has been sunsetting conversational actions, they've been ramping up support for the new App Actions for Google Assistant. Mark and Allen discuss what App Actions offer for both users and developers and how this compares and contrasts to skills for Amazon's Alexa and apps for Apple's Siri.

View Details

We'd like to extend our thanks to so many people in the #VoiceFirst community, our jobs, our family... and you, our listeners.

View Details

Mark sits down with Brett Adler, developer of the award winning VoicePT skill, to discuss his background in software development, low code and no code tools, and how his personal experiences brought him to developing a multi-modal #VoiceFirst assistant that helps physical therapy patients do their exercises.

More about VoicePT at https://devpost.com/software/voicept

View Details

When you're seeing each other for the first time in 3 years, you make it an event! That's what Mark and Allen recently did when they met up at the 2022 Voice Summit in Alexandria, VA. And what better way to commemorate the event than to record a session of Two Voice Devs live in front of an audience? Why, to welcome questions from that audience! Hear our thoughts on #VoiceFirst topics from what developers need to real estate and urban planning in the DC metro region.

Our thanks to the VOICE2022 organizers for helping us with the production, including recording the session, and to our fantastic audience and their questions.

View Details

Not every technology we deal with in Voice is a #VoiceFirst technology. Sometimes we need some "adjacent" skill. This week, Mark discusses some recent issues he had involving the validation signature that Alexa provides to skills that run outside AWS Lambda, and Allen provides his perspectives about how these same issues were addressed using Google's Dialogflow.

Resources:

  • https://developer.amazon.com/en-US/docs/alexa/custom-skills/host-a-custom-skill-as-a-web-service.html#check-request-signature
  • https://developer.amazon.com/en-US/docs/alexa/alexa-skills-kit-sdk-for-nodejs/host-web-service.html#usage
  • https://github.com/alexa/alexa-skills-kit-sdk-for-nodejs/commit/a1652383648e9e9da42b301aa033a4143f9cdf64
  • https://stackoverflow.com/a/62699910/1405634

View Details

Allen may be well known for coordinating his wardrobe with his Google Glass, but he's also as passionate about Glass as he is about Voice. With recent announcements coming from Google about Enterprise Glass v2, Mark asks Allen about what developers should be expecting for Glass' future. And where, exactly, does voice fit into that anyway?

More about Glass:

  • Glass Info: https://www.google.com/glass/start/
  • Developer Info: https://developers.google.com/glass-enterprise/
  • New updates: https://blog.google/products/google-ar-vr/bringing-more-of-googles-productivity-apps-to-glass-enterprise/
  • [Between] Advanced Wearables for the Enterprise: https://youtu.be/8g-GXFpYFgQ

View Details

Now that Mark and Allen have returned from #VOICE22, they take a look at what the summit was like and share a bit about their presentations. Allen talked about outents and how tools like Dialogflow and Multivocal assist with this concept, while Mark talked about developing a #VoiceFirst prescription system for Alexa.

View Details

What does it feel like to talk with a spreadsheet? That's the core question behind Vodo Drive (pronounced like "to-do"), one of Allen's big projects for the Google Assistant. After a demo, Allen and Mark discuss how Vodo Drive works, what it teaches us about building large #VoiceFirst software projects, and what the future holds as Vodo Drive needs to move to Amazon Alexa.

Learn more about Vodo Drive at VodoDrive.com

View Details

How do you learn new skills in the #VoiceFirst arena? How do you share what you've learned with others? A week before the Voice Summit, Mark and Allen share their experiences as authors, documenters, podcasters, and public speakers.

View Details

No matter if you're building an app for the iPhone or Android, a Google Action, or an Amazon Alexa Skill, there are guidelines that you need to follow to make sure your app is approved by the review team. Mark and Allen go over more rules that you should read before you start to develop your #VoiceFirst skill.

Check out the guidelines for Alexa: https://developer.amazon.com/en-US/docs/alexa/custom-skills/policy-testing-for-an-alexa-skill.html

View Details

No matter if you're building an app for the iPhone or Android, a Google Action, or an Amazon Alexa Skill, there are guidelines that you need to follow to make sure your app is approved by the review team. Mark and Allen go over some of the rules that you should read before you start to develop your #VoiceFirst skill.

Check out the guidelines for Alexa: https://developer.amazon.com/en-US/docs/alexa/custom-skills/policy-testing-for-an-alexa-skill.html

View Details

If you're building your own #VoiceFirst app outside the assistants, you'll also need to think about how you get input from people. Fortunately, there are a number of tools available from AWS and Google Cloud (and others) that will help you do this. Mark and Allen go over the raw technologies involved in Automatic Speech Recognition (ASR) and Natural Language Understanding / Processing (NLU / NLP), how they work (broadly speaking), and some thoughts on what needs to be done for the future.

Resources mentioned:

  • Google Cloud Speech-to-Text - https://cloud.google.com/speech-to-text
  • Google Dialogflow - https://cloud.google.com/dialogflow
  • Amazon Lex - https://aws.amazon.com/lex/
  • Jovo Keyword NLU plugin - https://www.jovo.tech/marketplace/plugin-keywordnlu

View Details

Not every assistant needs to be part of Amazon Alexa or the Google Assistant. What if you're developing your own voice assistant? How do you take care of some tasks like getting output to your users? In this episode, Allen and Mark give an overview of some of the technologies available to you to send audio exactly the way you want it to sound and some of the tools that are available to use.

Resources mentioned:

  • Speech Synthesis Markup Language (SSML) specification - https://www.w3.org/TR/speech-synthesis11/
  • Amazon Polly - https://aws.amazon.com/polly/
  • Google Cloud Text to Speech - https://cloud.google.com/text-to-speech
  • SSML Guru - ssml.guru
  • Speech Markdown - SpeechMarkdown.org
  • Jovo Marketplace TTS - https://www.jovo.tech/marketplace#tts

View Details

Mark's been on a roll recently, converting many of the utilities he's built as part of writing Amazon Alexa Skills and Google Assistant Actions into open source plugins for Jovo. This week, Mark and Allen discuss why these kinds of libraries are important, review Mark's latest plugin to generate random player names, and use this as an example for how to build a plugin for Jovo.

View Details

Making sure our #VoiceFirst applications are written securely and use secure components is important. And when one of those components has a security bug, it is important that we update it as soon as we can. Mark highlights a recent security vulnerability in the node-forge module, which is used by the alexa-verifier-middleware module. Mark and Allen then discuss what the verifier does and how we can be careful when it comes to using libraries.

Some references:

  • alexa-verifier-middleware: https://www.npmjs.com/package/alexa-verifier-middleware
  • Alexa verification: https://developer.amazon.com/en-US/docs/alexa/custom-skills/host-a-custom-skill-as-a-web-service.html#manually-verify-request-sent-by-alexa
  • Issues with node-forge: https://github.com/advisories/GHSA-x4jg-mjrx-434g

View Details

Allen and Mark chat about the Amazon Alexa Skills Challenge: Aging & Engaging event that is currently in progress that offers cash prizes to developers for creating #VoiceFirst skills that are targeted to the over 55 crowd. Mark is a judge, so has to be a little reserved in what he says, but there's lots of discussion about tools that teams can use to participate and tips for creating skills that have a chance of winning (and are good quality skills!).

Learn more: https://alexaskillsaging.devpost.com/

View Details

Guest Host Craig Walls joins Allen to discuss his latest book on #VoiceFirst development for Amazon Alexa, Build Talking Apps for Alexa, as well as what went into his skill to help Disney Theme Park explorers - Mouse Guests.

View Details

Mark goes over the latest open source package that he's developed with his company, RAIN, for the #VoiceFirst community. This allows developers to build a CMS using Sanity that provides content through Jovo. Allen and Mark discuss why these sorts of systems are important, how they improve voice-based systems, and some future changes to improve performance.

Links:

  • https://www.jovo.tech/marketplace/cms-sanity

View Details

Last week, Amazon Alexa ran their annual developer event, Alexa Live, where they showcased a number of new #VoiceFirst features for Alexa. Mark and Allen take a look at the list of announcements for Skill developers and give some first thoughts on what they mean.

Links:

  • Feature Roundup: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2022/07/Alexa-live-feature-roundup-july-2022
  • Skill Developer Announcements: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2022/07/Alexa-live-announcement-roundup-skill-developers-july-2022

View Details

It's our Second Anniversary show! In addition to a few kind words from our friends, Mark and Allen celebrate by sharing what goes on behind the scenes when creating an episode. How do we come up with #VoiceFirst topics? What is our recording setup like? How do we make transcriptions? Why does Allen always wear a blue shirt? How do you get an episode out every week?

Thank you, everyone, for your questions and your support as we enter our third year!

Mentioned:

  • Descript - descript.com
  • GIMP - gimp.org

View Details

Open Source components can be incredibly useful when building your #VoiceFirst Skill or app. Allen asks Mark about his latest Open Source project, Badgerific, which is aimed at rewarding users with tokens or badges under specific conditions.

Badgerific: https://github.com/rmtuckerphx/badgerific

View Details

Based on a question from StackOverflow, Allen and Mark explore the question about how to handle a #VoiceFirst scenario where the user needs to provide a list of several items one at a time (a food order, for example). If we're expecting the user to fill in a slot multiple times, or trigger an intent multiple times, before saying they're ready to move on - how do we make sure this intent gets handled over and over? Mark offers ways to handle this using Jovo and the Alexa Skills Kit, while Allen explores doing this with Dialogflow ES and CX, and ponders if Alexa Conversations will help.
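
One way to picture the "add items until the user is done" pattern is a small state machine kept in session data. The sketch below is only an illustration in TypeScript under stated assumptions: the `AddItem` and `Done` intent names, the `OrderSession` shape, and the `handleTurn` function are hypothetical, and it is not the Jovo, Alexa Skills Kit, or Dialogflow approach discussed in the episode.

```typescript
// Hypothetical sketch: accumulate order items across turns until the user is done.
// Intent names and types are invented for illustration, not taken from any platform SDK.
type Turn =
  | { intent: "AddItem"; item: string }
  | { intent: "Done" };

interface OrderSession {
  items: string[]; // everything the user has asked for so far
  complete: boolean; // true once the user says they are finished
}

function handleTurn(
  session: OrderSession,
  turn: Turn,
): { session: OrderSession; prompt: string } {
  if (turn.intent === "AddItem") {
    // Keep the conversation open so the same intent can fire again.
    const items = [...session.items, turn.item];
    return {
      session: { items, complete: false },
      prompt: `Got it, ${turn.item}. Anything else?`,
    };
  }
  if (session.items.length === 0) {
    // "Done" with nothing ordered: re-prompt instead of finishing.
    return {
      session,
      prompt: "You haven't ordered anything yet. What would you like?",
    };
  }
  return {
    session: { ...session, complete: true },
    prompt: `Okay, placing your order for ${session.items.length} item(s).`,
  };
}
```

Each call returns a new session object rather than mutating the old one, which makes it simple to store the result in whatever session storage the platform provides.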

View Details

With Mark and Allen's plans to explore some new tools and features for #VoiceFirst developers, we discuss what we've been playing with for the past week. Mark talks about the Dialog Management API in VoiceFlow and how it can be used by Amazon Alexa developers as a way to easily design and deploy skills, while Allen takes his first steps into re-learning Android development for App Actions.

More information:

  • https://developer.voiceflow.com/reference
  • https://developer.android.com/guide/app-actions/overview

View Details

On the occasion of our 100th episode, and with big changes in the #VoiceFirst industry from Google and beyond, Mark and Allen take a moment to look back on where voice has been, and what new developer tools and platforms we'll be looking at for the next 100 episodes.

View Details

Amazon has introduced an update to the Alexa Presentation Language, which allows #VoiceFirst developers to create more dynamic visuals. Mark and Allen discuss what these changes are and why they matter.

View Details

Based on a question from Will Rongholt, Mark and Allen discuss different approaches to implementing a #VoiceFirst game for Alexa skills or Actions on Google that provides a limited number of levels, after which players will need to make a purchase of some sort.

View Details

As we do #VoiceFirst development for platforms such as Amazon Alexa and Actions on the Google Assistant, we often find patterns in how we should interact with users between sessions, and the information we need to store to keep track of these interactions. Mark continues demonstrating to Allen the next two tools in the library: a streak counter and a recharge counter.

References: 

  • jovo.tech
  • https://github.com/jovo-community/jovo-community-plugin-tools
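
For readers who want a feel for what a streak counter involves, here is a minimal TypeScript sketch. It is not the code from the plugin library linked above; the `StreakState` shape and the `recordVisit` function are hypothetical, and days are counted in UTC as a simplification (a real skill would likely want the user's own time zone).

```typescript
// Hypothetical sketch of a daily streak counter; not the jovo-community plugin API.
interface StreakState {
  lastVisitDay: number | null; // whole days since the Unix epoch (UTC), or null if never visited
  streak: number; // consecutive days visited, ending on lastVisitDay
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

function toDayNumber(date: Date): number {
  return Math.floor(date.getTime() / MS_PER_DAY);
}

function recordVisit(state: StreakState, now: Date): StreakState {
  const today = toDayNumber(now);
  if (state.lastVisitDay === today) {
    return state; // already counted a visit today
  }
  const consecutive =
    state.lastVisitDay !== null && today - state.lastVisitDay === 1;
  // Extend the streak on a next-day visit; otherwise start over at 1.
  return { lastVisitDay: today, streak: consecutive ? state.streak + 1 : 1 };
}
```

The returned state is what would be saved to user storage between sessions, so the next visit can compare against it.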

View Details

As we do #VoiceFirst development for platforms such as Amazon Alexa and Actions on the Google Assistant, we often find patterns in how we should interact with users between sessions, and the information we need to store to keep track of these interactions. Mark shares with Allen a library he has begun for Jovo 4 to assist with some of these patterns and goes into detail about the first of these - a way to manage a randomized list of items so our visitors get a new item each time.

References:

  • jovo.tech
  • https://github.com/jovo-community/jovo-community-plugin-tools
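
The randomized-list idea (every visitor hears each item once before any repeats) can be sketched as a "shuffle bag". This TypeScript example is an illustration only, not the library's implementation; the `ShuffleBagState` type, the `nextItem` function, and the injectable `random` parameter are all assumptions made for the example.

```typescript
// Hypothetical sketch: hand out items in random order without repeats,
// reshuffling only after every item has been used once.
interface ShuffleBagState<T> {
  remaining: T[]; // items not yet handed out in the current cycle
}

function shuffle<T>(items: readonly T[], random: () => number): T[] {
  const result = [...items];
  // Fisher-Yates shuffle
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = result[i] as T;
    result[i] = result[j] as T;
    result[j] = tmp;
  }
  return result;
}

function nextItem<T>(
  allItems: readonly T[],
  state: ShuffleBagState<T>,
  random: () => number = Math.random,
): { item: T; state: ShuffleBagState<T> } {
  if (allItems.length === 0) {
    throw new Error("allItems must not be empty");
  }
  // Start a fresh shuffled cycle when the previous one is used up.
  const pool =
    state.remaining.length > 0 ? state.remaining : shuffle(allItems, random);
  const item = pool[0] as T;
  return { item, state: { remaining: pool.slice(1) } };
}
```

Storing only the `remaining` array per user keeps the saved data small, and passing in `random` makes the behavior easy to test deterministically.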

View Details

It's that time of year again! Google I/O! When Google rolls out the latest tools and products for developers and everyone else to use. While Allen couldn't be there in person this year, he answers Mark's questions about what Google is doing in the #VoiceFirst space in general and with the Google Assistant, Actions on Google, App Actions, and the Matter protocol specifically.

View Details

With a basic understanding of how multivocal helps you create a configuration-driven #VoiceFirst app, Allen and Mark discuss how a configuration-driven content management system for multivocal ties it all together for designers, developers, and content creators.

View Details

Allen has been talking about multivocal for years, but what is it, and what does it do for developers? He and Mark talk about some of the underlying concepts about the multivocal library, how #VoiceFirst developers use it with platforms such as Amazon Alexa and Actions on Google, and how it can use Firebase Firestore for configuration and what this means as part of a content management system.

View Details

Recently, AWS announced that their Lambda functions could now be invoked directly via HTTP, rather than having to go through another AWS service to access them. Mark and Allen compare this (and Lambda in general) to Google Cloud Functions, which offer similar features, and discuss why both are important tools in the #VoiceFirst toolbelt.

View Details

In the last episode, Mark demonstrated some of his work on Jovo that would lead towards a #VoiceFirst CMS integration, and Allen commented "now all you need is a UI to manage the content". This week - Mark delivers by exploring how to use Sanity.io, a "headless CMS", to build the UI that will feed into the configuration.

View Details

Ever have one of those weekends where you have a programming idea, and you just can't get it out of your head, so you decide to code it? We all have! Mark had some ideas for how to create a Content Management System (CMS) built on top of Jovo, and he had to call Allen to share the idea and some early experiments with it. Along the way, we discuss some of Jovo's core concepts and set the stage for a #VoiceFirst CMS.

View Details

Based on a question from Dana Gibson, Mark and Allen discuss some of the complexities when it comes to building Actions on Google for Assistant or Alexa Skills that access a database. From dealing with the asynchronous operations to making sure we complete the queries as quickly as possible, from SQL to tools like Firebase, DynamoDB, and Airtable, we explore some tips and traps about databases in a #VoiceFirst world.

View Details

An important feature in Account Linking with both Amazon Alexa and the Google Assistant is how they both integrate with existing authorization systems using the OAuth2 protocol. But OAuth2 can be tricky sometimes, as Mark and Allen relate with some tales of woe as they've tried to integrate OAuth into their #VoiceFirst skills and actions. Definitely some weird stuff happening!

View Details

With another week to ponder the implications (and implementation), Mark and Allen are both pretty excited to discuss some thoughts around what a #VoiceFirst CMS might look like and how it would work with the Actions for Google Assistant and Amazon Alexa. Specifically, we dig a little bit into what Multivocal and Jovo are currently doing to make things easier on developers and content creators and what more needs to be done.

View Details

Allen and Mark continue their conversation about Content Management Systems in a #VoiceFirst world and how developers could use a CMS to simplify their development, empower the content authors and conversation designers, and lower the cost of updating Google Assistant actions and Alexa skills.

Some products discussed:

  • sanity.io
  • Graph CMS
  • Leximic
  • multivocal
  • Jovo

View Details

When it comes to #VoiceFirst applications, we often say that "content is king". But creating great conversations is also part of the content, and requires close collaboration between the content author, the conversation designer, and the developer. Frequently updating the content can make for a better Alexa skill or Google Assistant action, but could be complicated. Mark and Allen discuss how (and why) to make this easier using a voice-oriented Content Management System (CMS).

Some products discussed:

  • sanity.io
  • Graph CMS
  • Leximic
  • multivocal
  • Jovo

View Details

What #VoiceFirst technology would we use to build an Alexa Skill or Google Assistant Action that allows us to chat with our friends? That's the question that Allen and Mark try to answer. Along the way, we learn about various technologies, tips, and tricks that we can use across an assortment of voice apps that we may build. (But should we even try to make this?)

View Details

Let's get this #VoiceFirst app going! Mark and Allen chat about some of the complexities when it comes to starting your Google Assistant Action or Alexa Skill - either as the start of a conversation, through a "deep link" invocation, or as a "one shot" question or command. While it seems like it should be straightforward, things aren't as easy as they seem! What tricks and tools do we have at our command?

View Details

Frequent listener JT asked us to explain what "modality" means. Good question, JT! Allen and Mark tackle the question, explaining what modality is, why it matters in #VoiceFirst development, and how it may shape our Google Assistant actions and Alexa skills in years to come.

View Details

Mark and Allen take a look at one of the foundational technologies for Alexa Skills and Google App Actions (and all of the modern web) - HTTP. Although the name stands for HyperText Transfer Protocol, these days it supports a wide range of uses, including REST APIs, which many #VoiceFirst apps use to make information available for users.

View Details

Mark is joined by guest host Eliza Camber, an Android developer and Google Developer Expert for the Google Assistant, to discuss what App Actions for the Google Assistant on Android are, how they bring the power of #VoiceFirst features to new and existing Android apps, how they compare to Alexa's mobile features, and how mobile devices can help future assistants with the thorny question about context.

View Details

Voice AI technology is popping up in all sorts of different places, but one place that has gotten lots of news coverage recently is #VoiceFirst processing at fast food drive-throughs. Allen and Mark ponder this from a number of different angles, both technological and legal, and explore where developers play a role as this becomes more widespread.

View Details

Ever have one of those days (or weeks) as a developer where things don't just go wrong, they go badly wrong? Allen and Mark talk about some of their worst experiences as developers, from annoying bugs to accidentally deleting databases, and how they recovered. Most importantly - they remind us to remember the successes - not just the failures.

View Details

Mark and Allen discuss what being a senior Software Developer looks like, for them, on a day-to-day basis, and compare how it may differ for #VoiceFirst and more conventional software projects.

View Details

A surprising amount of major #VoiceFirst development requires accessing other resource files your Alexa skill or Google Assistant action may need. From audio files to your privacy policy, there are all sorts of files that need to be available, and it can sometimes be confusing when they're not. Allen and Mark discuss some of the most likely scenarios and where to start looking for solutions.

View Details

Mark and Allen are joined by friends from around the world via Twitter Space in a live episode to discuss what 2021 looked like for Amazon Alexa and Actions on Google Assistant development, and what 2022 may look like in the #VoiceFirst arena.

View Details

Happy Holidays, everyone! Mark and Allen take a quick look back to see what gifts the #VoiceFirst developer community has gotten for Amazon Alexa and the Google Assistant.

View Details

While your first experiments with #VoiceFirst development may be a solo project, bigger and more elaborate projects will be done as part of a team. While we've often talked about the role of conversation designers as part of that team - what are the other team members? Allen and Mark share our experiences being part of different teams, both in how they work and how large they are, and what roles, from PM to Help Desk (and payroll!), are needed to make high quality and long lasting Amazon Alexa skills or Google Assistant actions.

View Details

Mark and Allen continue their dive into some of the words we use as we develop #VoiceFirst.

In this episode:

  • Jovo
  • Multivocal
  • ASR
  • NLP
  • NLU
  • Intent
  • Slot
  • Slot type / Entity / Entity Type / Custom Entity
  • Alexa Conversations
  • Action Builder scenes
  • Context
  • Fallback
  • No match
  • In Skill Purchases
  • Account Linking
  • APL
  • Display Templates
  • Cards
  • Web API for Games
  • Interactive Canvas

View Details

We use a lot of strange terms in the #VoiceFirst world, so Mark and Allen start diving into some of what they mean and what they mean to us as developers.

In this episode: 

  • Smart Speakers
  • Smart Displays
  • VoiceFirst
  • Ambient
  • Ubiquitous
  • Amazon Echo
  • Google Home
  • Amazon Alexa
  • Google Assistant
  • Far-field microphone
  • Wake word
  • Persona
  • first party (1P)
  • third party (3P)
  • Skills
  • Actions
  • Capsules
  • Voice apps
  • App Actions
  • Conversational actions
  • Dialogflow v1, v2, ES, CX, API.AI
  • Action Builder
  • gactions
  • Console
  • Webhook

View Details

We just want to take a moment to thank members of the #VoiceFirst community, those who make developing for Alexa and the Google Assistant better, and especially you, our listeners, for being here during the past year.

View Details

The concept of an Intent in the #VoiceFirst world seems straightforward - it is what a user is trying to express. But how Amazon Alexa has implemented the concept is slightly different than how the Google Assistant and Dialogflow have. Allen and Mark explore some of these differences as Allen works to prepare multivocal, a development library, for use with Alexa.

Find out more:

  • multivocal.info

View Details

It is difficult to believe that the Google Home launched just over 5 years ago, and Alexa just celebrated its 7th birthday. Allen reminisces about his first steps writing for the Google Assistant with info about how he created a #VoiceFirst presentation, where his voice changed the slides, and how things have evolved since.  

Learn more: 

  • http://ifttt.com
  • http://slides.com
  • http://PubNub.com
  • https://firebase.google.com/docs/database

View Details

Ever have an idea that you just can't shake? Mark had an idea for an APL processor based on the PostCSS processor, but it wasn't quite working out the way he expected it to, so he and Allen chat about it a bit. But in between when they first discuss it, and when they return to record another episode - he's resolved the problems and released it. Get some insight into both phases of this process.

More about PostAPL: https://github.com/postapl

View Details

Sometimes you get into a funk, and you need a little bit of self-care to just deal with a week. Maybe you like talking to a friend when you're in that kind of mood. For Mark and Allen, a chat is just the sort of thing to lift their spirits. Even more so when they're talking about coding and how to tackle a design problem for one of the #VoiceFirst open source projects they work on - Speech Markdown. Let's peek in on their conversation to see how they explore and resolve the issues about adding a new feature.

Learn more about Speech Markdown: speechmarkdown.org

View Details

Mark and Allen continue talking about the most recent hardware announced from Amazon and how this may mark a change, for both Alexa and the Google Assistant, towards a more widget-focused environment. What do you think this means in a #VoiceFirst world?

View Details

Mark is joined by fellow Alexa Champion Darian Johnson to talk about Darian's various hardware projects, from a smart mirror to an Enterprise computer, and the latest hardware announcements from Amazon and Alexa, including the Echo Show 15 and their robot Astro.  

For more about Darian and his projects:

  • Pre-Launch Page for the Newt: https://www.crowdsupply.com/phambili/newt
  • Newt technical blog: https://hackaday.io/project/178328-newt-always-on-low-power-digital-assistant
  • Smart Candle Project (commercial product crowdsourcing is launching in Q1 2022) - https://www.hackster.io/darian-johnson/scent-terrific-smart-candle-d8c68a
  • Star Trek Display - https://www.instructables.com/Make-It-So-Star-Trek-TNG-Mini-Engineering-Computer/
  • Twitter: https://twitter.com/darianbjohnson

View Details

Amazon has been going all-in with Alexa Presentation Language recently, and Mark has been taking a deeper dive into it and working on some custom components. Allen and Mark discuss some of the basics of APL.

Sites mentioned: apl.ninja

View Details

That's "Phonemes And More", as in the SSML Phoneme tag, and some other tags that are now available for the Google Assistant. Allen and Mark discuss how these tags are useful for developers when trying to create great responses that sound "just right". 

Websites mentioned: ssml.guru

View Details

Mark has updated his Picture Guesser skill with some new features, and he and Allen discuss what went into these new features. What developer components do you use to "plus up" your skills and actions?

View Details

Do your messages and prompts to your users just drone on and on? Are people begging your voice agent to stop talking and get back to work? Do you fall asleep testing your skill or action? Mark and Allen discuss how to make messages and prompts more exciting!

View Details

Inspired by a design presented at VoiceLunch US/Canada, Mark and Allen discuss how we would implement the design - consisting of keeping a "score" for each user of our skill or action, adjusting the score sometimes, and presenting different prompts based on this score. While the design seems straightforward, there are a number of interesting approaches to the implementation.

How would you implement this design?  

What libraries do you want to see to help make your implementations easier?

View Details

Both the Google Assistant and Alexa have routines - ways you can create your own phrases to do things. But what do developers need to do to make routines better for their users? Mark and Allen explore the differences between the two platforms and what developers need to know.  

Pages mentioned: 

  • https://amzn.to/38zzFT9

View Details

Most voice skills and actions will need to work with a database or contact another service, and the most common way to do that is with an API. Web API calls are pretty common, but there are some tricks and issues when it comes to voice. Mark and Allen discuss their approaches to using APIs in their apps.

View Details

Conversations don't always go to plan, and your conversation designers may specify how to handle error recovery. But what are the basic tools we have to do so? Allen and Mark discuss what code to write to handle when things go wrong.

View Details

When we have a list of information to display on the web or on a mobile device, there are a variety of tools that we can use to manage that list. But when it comes to voice - how do we make sure that list sounds correct (and something a user will listen to)? Allen and Mark discuss the tools that developers have to manage lists.
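
As a small illustration of making a list "sound correct", the TypeScript sketch below joins items with commas and a final "and", and caps how many items are read aloud. The `toSpokenList` name and the `maxItems` default of 3 are assumptions for this example, not a recommendation from the episode or part of any platform SDK.

```typescript
// Hypothetical sketch: turn an array into a phrase that reads naturally aloud.
function toSpokenList(items: readonly string[], maxItems = 3): string {
  if (items.length === 0) return "";
  // Read only the first few items so the response does not drone on.
  const shown = items.slice(0, maxItems);
  const hidden = items.length - shown.length;
  const parts = hidden > 0 ? [...shown, `${hidden} more`] : shown;
  if (parts.length === 1) return parts[0] as string;
  if (parts.length === 2) return `${parts[0]} and ${parts[1]}`;
  // Three or more parts: commas, with "and" before the last one.
  return `${parts.slice(0, -1).join(", ")}, and ${parts[parts.length - 1]}`;
}

console.log(toSpokenList(["apples", "bananas", "cherries"]));
// prints "apples, bananas, and cherries"
```

Capping the spoken items and summarizing the rest ("and 2 more") is one common way to keep long lists listenable; a follow-up prompt can then offer to read the remainder.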

View Details

Recently, Amazon announced the latest class of Alexa Champions, and Google announced the first GDEs for a new project category. Outstanding achievements for those named... But what are the Alexa Champions and Google GDEs? Allen (a GDE in five program areas) and Mark (an Alexa Champion and Bixby Premier Developer) discuss what these programs are and how GDEs and Champions are expected to give back to their communities.

View Details

At "Alexa Live" last week, Amazon bombarded us with a pile of new features that developers will soon be able to take advantage of. Mark and Allen talk about which features jumped out at them, and what they're looking forward to learning more about.

Which features are you most interested in?

https://developer.amazon.com/en-US/alexa/alexa-live/release-roundup

View Details

Mark and Allen, with a little help from our friends, celebrate one year of Two Voice Devs by looking back at some of the things that have stood out for us in the past year.  

Most of all, however, we want to thank all of you for watching, listening, and sending us feedback. Here's to the next year!

View Details

Things change, sure, but what happens when the platform you've developed for deprecates and removes a feature you were using? How do you adjust? Allen and Mark discuss the impending removal of Alexa Display Templates and other technologies on Alexa and the Google Assistant that have changed over the years.

View Details

Our 50th episode! Amazing! Thank you everyone who has been part of this along the way.

We figured this would be a golden opportunity to launch an Action, which we've now done. Mark and Allen chat about what inspired "Talk to Two Voice Devs", how it is more interactive than a typical podcast, how some of it works, and how this is just the first step.

What would you like to talk with a podcast about? We'd love to hear your thoughts and comments.

View Details

You've designed the conversations, you've written the code, everything is tested. You can now release your skill or action publicly, right? Well, not quite yet. First you have to get past the review and certification teams at Amazon and Google. Mark and Allen discuss this final step in the development process and what it means for you. Do you have any good tales about the process? Any nightmare stories?

View Details

This week, Roger Kibbe guest hosts, along with Mark, to discuss what he does as a Developer Evangelist for Samsung's Bixby, how Bixby is different, what he does with the Open Voice Network, and his take on the voice industry in general.

View Details

Experience helps you to not make mistakes. But how do you get experience? By making mistakes. And between Mark and Allen, they've made plenty of mistakes in years of programming. They'll talk about some of the lessons they've learned along the way, particularly for voice. What lessons have you learned that you want to share?

View Details

Eventually you're going to want to monetize your Google action and one way to do that is through in-Action transactions. Mark and Allen go over the different kinds of Digital Goods Transactions available and how to use them. (If you're looking for how to do this on Alexa - check out the previous episode.)

For more about digital goods transactions, see https://developers.google.com/assistant/transactions

View Details

Eventually you're going to want to monetize your Alexa skill, and one way to do that is through In Skill Purchases (ISP). Mark and Allen go over the different kinds of ISPs available and how to use them. (If you're looking for how to do this on the Google Assistant - tune in next week.)

For more about In Skill Purchases, see https://developer.amazon.com/en-US/docs/alexa/in-skill-purchase/isp-overview.html

View Details

Now that Google I/O has been completed, Allen and Mark discuss what new features have been delivered for the Google Assistant... and what has just been promised. What were your favorite moments from Google I/O, and what new features are you most looking forward to?

View Details

This week is Google I/O, where Google regularly takes the opportunity to release what's new and coming "soon" in the Google ecosystem. Mark and Allen take the opportunity to review what new features have recently come out for developers of Alexa skills and actions for the Google Assistant. What new features of these platforms have you been building for?

View Details

In episode 38, we talked about the general process of handling users when they start our skill or action. But what do we do in the specific case of the new user? How can we onboard them? Allen and Mark talk about what our conversation designers may be asking of us, and the data structures and tools we can use to implement their ideas.

View Details

Do your skills or actions work on the first try? Of course they don't! That's why we have to find good methods of tracking down and squishing those bugs. Mark and Allen discuss some of their tips for figuring out what is wrong when, on those rare occasions, things don't quite work as expected.

View Details

In a lot of ways, #VoiceFirst development is like any other programming. But not quite. There are always some good things we need to remember as we build and test our skills, actions, and capsules. Mark and Allen talk about the best practices they follow when starting a project. Do you have any tips and tricks you make sure you follow to make your development life easier?

View Details

We've gotten a number of questions about developers from other fields getting started in voice. What do Ilarna and Allen think about that? Can it be done? You bet! Let's learn more.

View Details

Make sure you get your conversation off to a good start with a good welcome message. While you'll rely on your conversation designer to make it sound right, Mark and Allen discuss how to implement all different sorts of welcome messages, and why you may need different approaches at different times.

View Details

Keeping count of things, such as how often a user has asked for help or visited your skill/action, is a tool conversation designers often want. But how can you code that? And how do you report that value when you're talking to a user? Allen and Mark discuss several tips and tricks about how they keep track of counters using Jovo, multivocal, and other toolkits, and the best ways to present this information to your users.

View Details

"Repeat" is one of the Intents that the review teams from both Amazon Alexa and the Google Assistant require. But they don't really give guidance about how to design or implement that Intent. Mark and Allen discuss the various approaches that they take with the Alexa Skills Kit, Assistant Library, Jovo, and multivocal.

View Details

Did you notice that Mark was nominated for a Project Voice award for game development? To celebrate, we chatted about the tools he used to create his games and what kinds of things you can do to make a good voice-first game.

View Details

We complete (for now!) our review of display technologies for voice. Mark dove into a project using Vue and Web API for Games for Alexa, and he compares his experiences with Allen's recent presentation about using the Interactive Canvas for Actions on Google with React. Buzzword Bingo! How do the two compare and contrast?  

Mark's project with Vue and Web API for Games: https://github.com/rmtuckerphx/jovo-web-vue-starter-ts 

Allen's project with React and Interactive Canvas: https://github.com/afirstenberg/interactive-canvas-react 

Vue: https://vuejs.org/ 

React: https://reactjs.org/

View Details

Allen has some questions about the Alexa Presentation Language (APL), so who better to ask than one of the contributors to apl.ninja, Stuart Pockington, who joins as the guest host this week. We go through a lot of details about APL, some of which blow Allen's mind.

Stuart adds this clarifying note: "For one question Allen asked about importing the APL document, I answered in a way that implies that I drop the json directly into the directive. That’s not what I do and I’d not recommend others to do it. Instead I save the APL JSON file in my project and reference that in APL directive. That gets deployed along with all my other backend code in lambda."

View Details

What kind of built-in Intents do Alexa and the Google Assistant provide to developers directly? How do they differ from each other, and what kind of problems do they cause? And what are Events and how do they differ? Allen and Mark dive into this week's tongue twister (but don't worry, it doesn't get too intense).

View Details

Five to seven seconds. That's how long you have for your Alexa Skill or Google Assistant Action to reply to a request from the user. And while that doesn't seem very long, if you're waiting for a reply, it can seem like forever. Mark and Allen discuss how important it is to shave milliseconds off your processing, and various techniques to do so.

View Details

We received a question from Rebecca Evanhoe asking if there was a way to determine the features of our assistant device for a conversation. Mark and Allen dive into the various ways we can figure that out, and some of the "gotchas" that can come with it.

View Details

Following on from our previous episode about debugging, Allen and Mark discuss the related topic of analytics. Although there are tools from Alexa and Google, there are also some third party tools such as those from Dashbot, and the new Jovo Inbox. Along the way, we discuss potential performance pitfalls, and how to avoid a sonic Blue Screen of Death.

View Details

Debugging is the bane of most developers, but it can be particularly tricky when it comes to voice - between the remote nature of it and the rapid response time required, it can be difficult to capture what is going wrong. Mark and Allen discuss the various tools and tips we have at our disposal to track down those pesky bugs.  

Some tools mentioned: 

  • Bespoken - bespoken.io
  • Sentry - sentry.io
  • Mocha - mochajs.org
  • Chai - chaijs.com
  • Assistant Conversation Test Tool - https://github.com/actions-on-google/assistant-conversation-testing-nodejs

View Details

For our first guest host, we're thrilled to welcome Ilarna Nche! She chats with Allen about how she got started developing with voice, some of her insights about what it takes to create a successful voice game, and what we should be thinking about next.

View Details

Allen thinks that the Interactive Canvas feature on the Google Assistant is one of the best technologies it supports, but Mark has a few questions about how it works. Supporting most of your favorite HTML/CSS/JS technologies in a Chrome-like environment, how would you enhance your Actions? What questions do you have about using it?

View Details

Breaks are good times to get some coding done, so what have Allen and Mark been up to recently? Turns out, there are some updates with Speech Markdown and Multivocal, and they explore these changes and the code that went into them.  

Speech Markdown: speechmarkdown.org 

Multivocal: multivocal.info

View Details

Happy New Year! At least we hope it is. Mark and Allen take a glimpse at the year ahead and discuss what we hope the new year will look like for voice first developers. (Without getting TOO absurd!)

View Details

With 2020 finally coming to a close, Allen and Mark look back at the past year - both personally and professionally.

View Details

Merry Christmas and Happy Holidays, everyone! Mark was busy working on his new game, Snatch Word, and Allen provided some help getting it working with the Google Assistant. Hear what that means when it comes to visual lists, security tokens, and more.

View Details

Have you played Snatch Word or Cross Talk yet? Mark and Allen chat about what it's taken to get their recent games developed and a bit about the underlying technology and code.

View Details

Mark is getting ready to release his new game, Snatch Word, but there are a couple of details to work out. So he and Allen chat about what it's taking to get things ready for release.

View Details

We often talk about skills and actions being "all about the content", but how do we manage that content if it's so important? In a content management system, of course! Mark and Allen discuss their experiences using CMS tools and the crucial role they play in building practical skills and actions.

View Details

In the US, today is Thanksgiving. Allen and Mark would like to thank all of you for watching and listening, but also some people who have meant so much to us.  

Just a small list includes: 

  • Our families and closest friends
  • Michael Palermo
  • Brad Abrams, Jessica Early-Cha, and Mandy Chan
  • Bradley Metrock
  • Octavio Menocal
  • Gene Homicki, my coworkers, and everyone at MyTurn.com and spiders.com
  • Karol Stryja and Michal Stanislawek for VoiceLunch.com
  • Noelle Silver
  • Denis Valasek and Linda Lawton
  • Maike G and Rebecca E

And everyone else we name in the show or who has helped us so much this year. Know that we are thankful for each and every one of you. You matter to us and to everyone.

View Details

Building for voice is more than writing a simple program and calling it done. A good voice skill or action has many components that work together. Mark and Allen discuss what some of those components are, how they integrate, and what you should think about as you write them.

Number Spies System Components: https://markvoicedev.medium.com/creating-an-alexa-game-number-spies-system-components-overview-41bf142d0b3c

View Details

Even before you start with a blank editor, you're faced with coming up with the idea. When it comes to Voice - what inspires us? Allen and Mark talk about where our ideas come from and how they start to shape our #VoiceFirst experiences.  

Mark writes about what inspired him to create Number Spies: https://markvoicedev.medium.com/creating-an-alexa-game-the-spark-of-inspiration-for-number-spies-7f2b5a073a41

View Details

With all the confusion about Daylight Saving Time transitions finally behind us, Mark and Allen discuss all sorts of ways to handle time on Alexa, the Google Assistant, and Bixby. (And some tools and tips that make it easier!)

View Details

Where we are now has been shaped by our past. In light of this, Allen and Mark discuss how we got to this moment. What technologies have we worked with and what jobs have we held in our careers, and what lessons have they taught us that have helped us when it comes to voice?

View Details

Mark and Allen discuss the tools they use to build Skills and Actions, and some of the tricks to make a developer's life a little easier and more productive.

View Details

What happens when new features are released? Why, Mark and Allen break into song! And discuss the latest new features developers for Alexa and Google Assistant can work with.

View Details

Google Assistant

  • Console: https://console.actions.google.com/
  • Main docs page: https://developers.google.com/assistant
  • Codelabs: https://developers.google.com/assistant/conversational/codelabs
  • Community forum: https://www.reddit.com/r/GoogleAssistantDev/
  • Stack Overflow: actions-on-google and actions-builder tags
  • Twitter: @ActionsOnGoogle
  • Actions on Air video / podcast series
  • Follow other GDEs - many have tutorials about various topics.

Dialogflow

  • ES Docs: https://cloud.google.com/dialogflow/es/docs
  • ES community forum: https://groups.google.com/g/dialogflow-essentials-edition-users
  • Stack Overflow: dialogflow-es and dialogflow-es-fulfillment tags

Alexa

  • Developer Docs: https://developer.amazon.com
  • Alexa Skills Kit blog: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit
  • Dabble Lab: https://dabblelab.com/
  • Voice First Tools: https://voicefirst.tools/
    • APL Simulator: https://tools.alexaskills.dev/

Jovo

  • Main page: https://www.jovo.tech/
    • Docs
    • Tutorials
    • Marketplace
    • Forum
  • GitHub: https://github.com/jovotech
  • Jovo Community: https://github.com/jovo-community

View Details

For our milestone episode we wanted to change it up a little. So we present our Top Ten issues that we see with the major platforms as developers.

View Details

We have something special planned for our 10th episode! Curious what it might be?

View Details

Authentication and Authorization are some of the more difficult concepts that most developers end up having to deal with at some stage. Mark and Allen discuss the high-level concept of Account Linking: connecting your auth system to the voice agent's auth system. Alexa and Google Assistant offer some tools to help with this, and we explore how some of the tools are similar, while others offer significantly different experiences for both users and developers.

View Details

As developers, the more information we can get about people talking with our skills or actions, the better the conversation will be. But privacy is a serious issue! (And one the platforms take seriously, too.) How do Alexa and the Google Assistant balance our need for more information with the need for privacy? And how can we ask for permission to get the information? There are surprising differences and similarities that Allen and Mark explore.

View Details

We never know where our conversations go, sometimes. This time, Mark and Allen chat about Intents, Slots, Types, Entities, Parameters, and the whole conversational model built around them, especially the slight differences between how Actions and Skills have to treat them.

View Details

Just because our skills and actions are Voice First doesn't mean they are Voice Only. Alexa and the Google Assistant have a long history of supporting displays in addition to the audio interactions. Mark and Allen dive into all the visual options available for Alexa, Assistant, and Bixby and the interesting differences between the three platforms.

View Details

Did you notice some Actions were having problems last week? Allen and Mark certainly did! So this week, we're talking about what seems to have caused the outage, how this fits into the overall storage capability for Actions, and how Alexa and Jovo approach session and user storage.

View Details

In an audio-first environment, you want to sound like a movie or TV soundtrack... but with interaction and dynamic responses. With Google's flavor of SSML and Alexa's APLA, you can create these responses. Mark and Allen explore how these two methods are similar, and where they differ.

For more info:

  • Google's SSML "par" and "media" tags: https://developers.google.com/assistant/conversational/ssml#par
  • Nightingale SSML editor: https://actions-on-google-labs.github.io/nightingale-ssml-editor/
  • Alexa's APLA: https://developer.amazon.com/en-US/docs/alexa/alexa-presentation-language/apla-interface.html

View Details

We both have open source projects that we contribute to in the voice community. We talk about our two top ones, Speech Markdown and Multivocal, what they are, and how we feel they're contributing to the growing #VoiceFirst environment.  

Learn more: 

Mark's Projects: https://github.com/rmtuckerphx

  • http://ssml.guru/
  • https://www.speechmarkdown.org/

Allen's Projects: https://github.com/afirstenberg

  • https://multivocal.info/

View Details

Mark and Allen chat about tools we use to build conversations for Alexa and the Google Assistant, ranging from new tools, such as Alexa Conversations and Google's Actions Builder, to more mature tools, such as Jovo and Dialogflow. We got so excited about the topic, we just couldn't stop!

References:

  • Alexa Conversations: https://developer.amazon.com/en-US/blogs/alexa/alexa-skills-kit/2020/07/introducing-alexa-conversations-beta-a-new-ai-driven-approach-to-providing-conversational-experiences-that-feel-more-natural
  • Actions Builder: https://developers.google.com/assistant/console/builder
  • Jovo: https://www.jovo.tech/
  • Narratory: https://narratory.io/

View Details

Learn about Action Links for Google Assistant and Quick Links for Amazon Alexa, with a comparison of the features on each voice assistant platform.

Links:

Action Links - https://developers.google.com/assistant/engagement/action-links

Quick Links - https://developer.amazon.com/en-US/docs/alexa/custom-skills/create-a-quick-link-for-your-skill.html