Tag Archives: chatgpt

How I Vibe-Coded My Own E-Reader With GitHub Spark

13 Sep

T-minus 4 weeks to the public launch of my book Marketing Plan for Tech Startups!

As you can imagine, I’m in the middle of final edits, layout decisions, and preparing content for every format readers might want.

There will be a paperback. There will be a digital edition on Kindle.

But of course, that wasn’t enough for me. I also wanted my own e-reader. One that actually had the features I always wished an e-reader offered.

So I vibe-coded one with GitHub Spark.

Inspired by Priyanka Vergadia’s demo showing how she built a full-stack app in minutes with GitHub Spark, I gave it a try. Spark is GitHub’s new AI-powered app builder that runs entirely in your browser. No setup. No config. No need to remember the Java classpath from my mobile and web app developer days. 😉

Just describe what you want, and Spark builds it end-to-end: front-end, database, authentication, etc. As always, Priyanka did an awesome job walking her YouTube channel viewers through all the steps of using GitHub Spark to go from zero-to-app, so I thought: why not?

The PRD, a.k.a. my wish list for a book reader

I mostly wanted three things:

  1. A two-page view, so reading on a big screen feels like having an actual book open in front of you.
  2. A search function so you can instantly jump to “positioning,” “pricing,” or “Anthropic case study.”
  3. Bookmarks and notes, so readers can mark sections and write down thoughts as they read (my paperback margins are always full of notes and post-its 😉). A rough sketch of how features 2 and 3 might look in code follows right after this list.
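
Here’s that rough sketch: a hypothetical data model for bookmarks and notes plus a naive keyword search. All the names and the localStorage persistence are my own illustrative choices, not anything Spark actually generated:

```ts
// Hypothetical sketch of the bookmarks, notes, and search features (2 and 3 on the list).
// None of these names come from the real app; localStorage is just one easy way to persist.

interface Bookmark { page: number; createdAt: string; }
interface Note extends Bookmark { text: string; }
interface Annotations { bookmarks: Bookmark[]; notes: Note[]; }

const STORAGE_KEY = "ereader-annotations";

function load(): Annotations {
  return JSON.parse(localStorage.getItem(STORAGE_KEY) ?? '{"bookmarks":[],"notes":[]}');
}

function addNote(page: number, text: string): void {
  const a = load();
  a.notes.push({ page, text, createdAt: new Date().toISOString() });
  localStorage.setItem(STORAGE_KEY, JSON.stringify(a));
}

// Naive keyword search over per-page text (which PDF.js can extract via getTextContent):
// returns the 1-based page numbers that mention the query, e.g. "positioning".
function searchPages(pageTexts: string[], query: string): number[] {
  const q = query.toLowerCase();
  return pageTexts.flatMap((text, i) => (text.toLowerCase().includes(q) ? [i + 1] : []));
}
```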

Three features I dreamed up; let’s see what I got.

[Image: GitHub Spark]

How GitHub Spark turned my PRD into a working e-reader

I typed my requirements in natural language, hit submit, and Spark went into “think mode.”

A few minutes later, I had a working prototype with the two-page display.

A couple of hours later (and with a few vibe-coding hacks I’ll detail below), I added keyword search and a bookmark system. Here’s the finished product:

[Image: My e-reader vibe-coded in an afternoon]

The e-reader I vibe-coded in an afternoon looks very promising, but it’s not quite ready to ship just yet. Here’s why:

Lessons learned from vibe-coding an e-reader in GitHub Spark

First, while Spark gave me the basic app scaffolding quickly, it struggled to render a PDF heavy with graphics. Sometimes it showed only text; other times it spat out binary data.

Spark’s default PDF handling just wasn’t built for a manuscript like mine. My book isn’t a typical wall of text. I wrote it in Google Workspace Slides to make it as much a tool as a book, packed with frameworks, diagrams, and visuals that startup founders and marketers can apply right away. The format was deliberate: keep the text lean, rely on visuals, and use slides as a constraint so every word carries weight.

I knew from a previous vibe-coding session that v0 by Vercel could handle a heavyweight manuscript like mine, so I thought: why not ask Vercel how it did it? The answer was pdfjs-dist, the distributable version of Mozilla’s PDF.js, which renders PDFs natively in the browser without plugins. I plugged it into Spark and—yay—I was unblocked!
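
For anyone curious, here’s a minimal sketch of how a two-page spread can be rendered with pdfjs-dist. The canvas IDs, the scale, and the worker path are my own placeholder choices, not Spark’s actual output:

```ts
// Minimal sketch of a two-page spread with pdfjs-dist; not Spark's actual code.
import * as pdfjsLib from "pdfjs-dist";

// pdf.js renders in a web worker; the worker file must be served by your app
// (this path is an assumption, and the exact setup depends on your bundler).
pdfjsLib.GlobalWorkerOptions.workerSrc = "/pdf.worker.min.mjs";

async function showSpread(url: string, leftPageNum: number): Promise<void> {
  const pdf = await pdfjsLib.getDocument(url).promise;
  const canvasIds = ["left-page", "right-page"]; // two side-by-side <canvas> elements
  for (let i = 0; i < 2; i++) {
    const pageNum = leftPageNum + i;
    if (pageNum > pdf.numPages) break; // the last page has no right-hand neighbor
    const page = await pdf.getPage(pageNum);
    const viewport = page.getViewport({ scale: 1.5 });
    const canvas = document.getElementById(canvasIds[i]) as HTMLCanvasElement;
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    await page.render({ canvasContext: canvas.getContext("2d")!, viewport }).promise;
  }
}
```

Because pdf.js rasterizes each page onto a canvas, the diagram-heavy slides render exactly as laid out, which is what Spark’s default text-oriented handling couldn’t do.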


Second, as I layered on more prompts and features, I learned that Spark projects can hit limits and stop accepting prompts.


The first prototype was quick and easy; refining it took patience… and some help from ChatGPT. When Spark stopped accepting prompts, I pulled down the GitHub ZIP, then used ChatGPT to reverse-engineer Spark’s app architecture and rebuild the project with more detailed instructions.

This experience pretty much sums up today’s vibe-coding scene: vibe-hackers are out-of-the-box thinkers who juggle multiple tools; when one doesn’t do what you need, you pick up another.

My final lesson: vibe-coding is a lightning-fast way to prototype and experiment, but it still takes time to create a production-grade app ready to be shared with others. That’s why, for now, I’m only sharing screenshots.

Just like with my “Slide Tools” hackathon two weeks ago, I was reminded of the real promise of AI-driven coding:

The future of software with AI: everyone can be a creator.

The next generation of apps — whether e-readers or enterprise apps — will be powered by AI, built faster than ever, and customizable to fit customer needs with precision.

And some of those apps will be built by marketers.

Marketers as vibe-coders

“Vibe Marketers” are already starting to appear on job boards:


“We’re looking for a Vibe Growth Marketing Manager who is a builder who prototypes and ships faster than most teams can spec a brief. You’ll use AI tools, LLMs, no-code/low-code platforms, and smart automation to rapidly unlock new growth channels, improve operational efficiency, and experiment with new marketing ideas end-to-end.”

It’s clear that vibe-coding is becoming essential for speed and efficiency in marketing workflows.

But why stop at workflows? What if marketers could also be the first prototypers of new product ideas?

Marketers as product prototypers

Marketers are already customer advocates and trend spotters. Vibe-coding tools now give them the ability to turn insights directly into working prototypes, bridging the gap between customer voice and product innovation.

With vibe-coding, marketers can also extend existing products with new features requested by their customers, as I demonstrated in my “Slide Tools” hackathon.

[Image: My custom slide tools I added to Google Slides]

A sneak peek into my book’s vision

Elevating marketers into co-creators of product is central to my book’s vision. My goal is to restore marketing to Kotler’s full “4 Ps” (product, price, place, promotion), rather than the narrow “1 P” of promotion it’s often reduced to. Vibe-coding tools may be the superpower that helps marketers reclaim all four.

If you’re a startup founder or marketing leader, my upcoming book Marketing Plan for Tech Startups distills lessons from Fortune 500 companies and startups into practical frameworks to break through the noise and turn innovation into revenue.

I’m also thrilled to share that the one and only Priyanka Vergadia is among its distinguished contributors! 😀

Reserve your copy here:

https://marketingwithjustyna.gumroad.com/l/MARKETINGPLAN

PS: If you want to try Spark yourself, watch Priyanka’s excellent demo:

How I Vibe-Coded My Way Out of 200+ Slide Edits Into the Future of Software

29 Aug

This weekend, I pulled off my own hackathon. The challenge? Cleaning up 200+ Google Slides of my upcoming book: Marketing Plan for Tech Startups.

Why so many edits?

After a year of experiments and contributions from several collaborators, each with their own style, the deck had turned into a Frankenstein: fonts all over the place, inconsistent sizes, text boxes scattered. Original thinkers are not known to stick to templates. 🤪

Why did I write a book in Google Slides?

Because I wanted to create a tool as much as a book, a resource startup founders and marketers can apply right away. My rationale: keep text lean, rely on visuals, and use slides as a constraint so every word carries weight.

With the book launch at TECH WEEK by a16z in San Francisco this October approaching, the thought of unifying it all was daunting. Manually cleaning 200+ slides would take days, and still never be perfectly consistent.

So I turned to AI. It thrives on repetitive and grueling work, the kind humans struggle to do well. I just needed to get it inside Google Slides.

How to vibe-code away the pain of manual slide edits

First, I accessed Apps Script under “Extensions” in Google Slides:

[Image: Accessing Google Slides API via Apps Script]

Second, I used Windsurf to vibe-code the features I wanted:

From a single prompt, I got ready-to-use code and a deployment guide in seconds.

Third, I pasted the code into Apps Script…

[Image: Apps Script in Google Slides]

… and just like that, I got the first tool. Quick test… It works, yay!

I continued with more prompts to build functions like updating colors to a specific shade of black or changing fonts to Lato.
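
For illustration, here’s a minimal sketch of what one of those functions might look like. It’s not my exact generated code; SlidesApp is Apps Script’s built-in Slides service, the hex shade is my placeholder, and this walks top-level shapes only (tables and grouped shapes would need extra traversal):

```ts
// A minimal sketch of one such "Slide Tool": set every text run in the deck
// to Lato and one shade of black. Paste into Extensions > Apps Script.
// (Apps Script runs JavaScript; this is also valid TypeScript if you develop
// with clasp and @types/google-apps-script.)

function normalizeDeckTypography(): void {
  const slides = SlidesApp.getActivePresentation().getSlides();
  for (const slide of slides) {
    for (const shape of slide.getShapes()) {
      const text = shape.getText();
      if (text.isEmpty()) continue; // skip shapes without text
      text.getTextStyle()
        .setFontFamily("Lato")
        .setForegroundColor("#111111"); // the exact shade is my placeholder
    }
  }
}
```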

Soon enough, I had my own full set of “Slide Tools” to tame 200+ slides. ⬇️

[Image: My custom set of “Slide Tools”]

Maybe in addition to publishing a book, I should start a side hustle selling Google Slides automations. After all, I’ve already got one very polished deck to prove it works. 😉

One more thing

Like every good hackathon, this one came with a “one more thing.” It reminded me of the real power of vibe coding: when products open APIs, anyone can go beyond the defaults, shape tools their own way, and turn a generic product into something personal.

The future of software with AI: everyone can be a co-creator.

And with vibe coding democratizing access to computer programming, that future is close and attainable.

Everyone can make a popular tool even more useful.

As a marketer, I’m excited about the future of software. I’ve spent my career helping emerging technologies find their market and convert innovation into sales. That same spirit is what I poured into my upcoming book. Marketing Plan for Tech Startups is meant to be a practical guide that helps founders and innovators do the same.

And just like a product with open APIs, this book is built to be extended. If you’d like to add your perspective or contribute to future editions, I’d love to hear from you.

Please comment below or send me a DM, and I’ll be in touch!

If you’d like to pre-order my book and/or support the launch, here’s the link: https://marketingwithjustyna.gumroad.com/l/MARKETINGPLAN


Blending Science and Art: The Multimodal Craft of an Exceptional Gen AI Paper

5 Apr

Technical writing is one of my favorite reads. It’s clear, succinct, and informative. DeepMind’s technical paper on Gemini 1.5 epitomizes all I love about technical writing. Read the abstract for a glimpse into the groundbreaking advancements encapsulated in Gemini 1.5 Pro; it’s a masterclass in effective communication. We learn how to deliver maximum insight with minimum word count.

In just 177 words, my DeepMind colleagues articulate:

  • #ProductCapabilities: “a highly compute-efficient multimodal* mixture-of-experts model** capable of recalling and reasoning*** over fine-grained information from millions of tokens of context”
  • #UniqueSellingPoint: “near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k)”
  • #UseCases: “surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person learning from the same content”

[Image: Gemini 1.5 Pro is able to translate from English to Kalamang with similar quality to a human]

The science of writing succinctly

In a few words, the paper abstract communicates the model’s superior performance, its leap over existing benchmarks, and its novel capabilities. It sparks curiosity about the future potential of large language models. A true testament to powerful, precise, impactful technical communication.

How did the Gemini 1.5 paper authors achieve this mastery? By following the guiding principles of Brevity (saying more with fewer words) that my friend and thought partner D G McCullough and I recently summarized as “Trust, Commit, Distill”:

  • #Trust means believing in the power of your message without over-explaining or adding unnecessary details. Trust empowers the communicator to eliminate redundancy, focusing on what’s truly important. The Gemini 1.5 paper authors trust their curious readers to look up terms that may be new to them. On first read, I had to look up “mixture-of-experts,” but the context from my two years of working with data and AI let me “guesstimate” its meaning before getting the proper definition.
  • #Commit refers to sticking with the essentials of your message, understanding your message’s objective, and resisting tangents or unnecessary explanations that dilute the message’s impact. (Which requires discipline!)
  • #Distill requires boiling your message down to full potency. Like distilling a liquid to increase its purity, we must strip away the non-essential until the most impactful, clear, and concise message remains. Every word and idea then serves a purpose, and voilà! Your message becomes clearer and more memorable.

The art of replacing 100s of words with a single image

The saying “A picture is worth a thousand words” truly shines in technical communication. A single, well-chosen image can articulate complex ideas with more efficiency and impact than verbose descriptions. The Gemini 1.5 paper’s authors skillfully weave in visual elements, showcasing a deep grasp of conciseness. This approach not only makes complex AI and machine learning concepts approachable and captivating but also boosts understanding and enhances the reader’s journey. It demonstrates that when it comes to sharing the latest scientific breakthroughs, visual simplicity can convey a wealth of information.

[Image: With the entire text of Les Misérables in the prompt (1382 pages), Gemini 1.5 Pro locates a famous scene from a hand-drawn sketch]

Simplify complexity with brevity

In our rapid world, where attention is a rare commodity and people often skim rather than read, the skill of conveying ideas briefly and through visual storytelling stands out as a significant edge. Simplifying complex concepts into engaging visuals and concise explanations can mean the difference between being noticed or ignored.

Richard Feynman, the celebrated physicist, Nobel laureate, and cherished educator, famously stated, “If you can’t explain it simply, you don’t understand it well enough.”

[Image: Richard Feynman quotes]

Feynman’s approach isn’t just about words; it involves using visuals and images to make intricate ideas more approachable. After all, the deepest insights are usually the easiest to understand when we apply brevity to break down complexity.

DeepMind’s Gemini 1.5 technical paper exemplifies this principle perfectly. It’s essential reading for anyone intrigued by general AI (especially with #GoogleCloud #NEXT24 on the horizon), and it’s an exemplary model for those dedicated to honing their communication skills.

#TechnicalWriting #Innovation #ArtificialIntelligence #LanguageModels #Brevity #BrevityRules #GoogleCloud #NEXT24 #DeepMind

Read the full abstract

“In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra’s state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro’s long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person learning from the same content.” https://storage.googleapis.com/deepmindmedia/gemini/gemini_v1_5_report.pdf

Key terms used in the abstract, defined

* #Multimodality: Gemini is natively multimodal. Prior to Gemini, AI models were first trained on a single modality, such as text or images, and then the corresponding embeddings were concatenated. For example, the embedding of an image would be generated by an AI model trained on images, the embedding of the text describing the image would be generated by an AI model trained on texts, and then the two embeddings would be concatenated to represent the image and its description. Instead, the Gemini family of models was trained on content that is inherently multimodal, such as text, images, videos, code, and audio. Imagine being able to ask a question about a picture, or generate a poem inspired by a song; that’s the power of Gemini.

** #Mixture-of-Experts Model: At the core of Gemini’s groundbreaking capabilities lies its innovative mixture-of-experts model architecture. Unlike traditional neural networks that route all inputs through a uniform set of parameters, the mixture-of-experts model consists of numerous specialized sub-networks, each adept at handling different types of information or tasks—these are the “experts.” Upon receiving an input, a gating mechanism intelligently directs the input to the most relevant experts. This selective routing allows the model to leverage specific expertise for different aspects of the input, akin to consulting specialized departments within a larger organization for their unique insights. For Gemini, this means an unparalleled ability to process and integrate a vast array of multimodal data—whether it’s textual, visual, auditory, or code-based—by dynamically engaging the most suitable experts for each modality. The result is a model that not only excels in its depth and breadth of understanding but also in computational efficiency, as it can focus its processing power where it matters most, without overburdening the system with irrelevant data processing. This approach revolutionizes how AI models handle complex, multimodal inputs, enabling more nuanced interpretations and creative outputs than ever before.

[Image: A Mixture of Experts (MoE) layer embedded within a recurrent language model, from https://openreview.net/pdf?id=B1ckMDqlg]
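
To make the routing idea concrete, here’s a toy sketch of top-k expert gating. It’s purely illustrative, not Gemini’s actual architecture or code; every name here is my own:

```ts
// Toy sketch of mixture-of-experts routing: a gating network scores the experts,
// only the top-k run, and their outputs are combined by gate weight.

type Vec = number[];
type Expert = (x: Vec) => Vec;

function softmax(scores: number[]): number[] {
  const m = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

function moeLayer(x: Vec, experts: Expert[], gate: (x: Vec) => number[], k = 2): Vec {
  const weights = softmax(gate(x));
  // Selective routing: keep only the k highest-weighted experts.
  const top = weights
    .map((w, i) => ({ w, i }))
    .sort((a, b) => b.w - a.w)
    .slice(0, k);
  const norm = top.reduce((a, e) => a + e.w, 0);
  // Weighted sum of the chosen experts' outputs; the other experts never run,
  // which is where the compute efficiency comes from.
  const out: Vec = new Array(x.length).fill(0);
  for (const { w, i } of top) {
    experts[i](x).forEach((v, d) => (out[d] += (w / norm) * v));
  }
  return out;
}
```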

*** #Reasoning: Gemini goes beyond simple pattern recognition. It uses a novel approach called “uncertainty-routed chain-of-thought” to reason and understand complex relationships within and across modalities. This enables it to answer open-ended questions, solve problems, and generate creative outputs that are not just factually accurate but also logically coherent.