The Personal AI Greenfield

What forms of pAI—personal AI—are Apple, Mozilla, Google, Meta, Microsoft and the rest not doing?

Let’s look at those first two because they’re at the top of the news LIFO buffer.

Apple Intelligence (“coming in beta this fall*”), announced yesterday, will help you with writing and creating images while giving you less lame answers from Siri. (Which Apple should rename. Siri is Apple’s Clippy.) It “can draw on larger server-based models, running on Apple silicon, to handle more complex requests for you while protecting your privacy.” The “larger models” will be white-labeled ChatGPT, plus Apple’s own small language models (SLMs).

Mozilla, which has gotten $400+ million a year from Google (for search placement in the Firefox browser) since 2020, announced on June 3 that it is Building open, private AI with the Mozilla Builders Accelerator. Jive:

This program is designed to empower independent AI and machine learning engineers with the resources and support they need to thrive. It aims to cultivate a more innovative AI ecosystem, and it’s one of Mozilla’s key initiatives to make AI meaningfully impactful — alongside efforts like Mozilla.ai, the Responsible AI Challenge and the Rise25 Awards.

The Mozilla Builders Accelerator’s inaugural theme is local AI, which involves running AI models and applications directly on personal devices like laptops, smartphones, or edge devices rather than depending on cloud-based services…

We chose Local AI as the theme for the Accelerator’s first cohort because it aligns with our core values of privacy, user empowerment, and open source innovation. This method offers several benefits including:

  • Privacy: Data stays on the local device, minimizing exposure to potential breaches and misuse.
  • Agency: Users have greater control over their AI tools and data.
  • Cost-effectiveness: Reduces reliance on expensive cloud infrastructure, lowering costs for developers and users.
  • Reliability: Local processing ensures continuous operation even without internet connectivity.

Looks to me like both of these are Big AI writ small. It’s “local,” not personal. It’s made to serve your needs with what Big AI offers through APIs. It is still essentially AIaaS (AI as a Service) rather than truly personal AI (pAI): personalized more than personal.

That’s also what I see when I read between the lines of Mozilla’s AI job openings. Take the platform engineer opening. This person will (among other things) “assist in managing and orchestrating workloads across multiple cloud providers.” That’s fine. I’m sure true pAIs will do that too. But most of pAI will be more personal than that. It will deal with the mundanities of your everyday life, not with coughing up answers that can only come from AIaaSes.

The problem with personalizing the AI giants’ offerings is that they are large language models (LLMs) trained on everything that can be crawled on the Internet, plus who knows what else. Not on your truly personal stuff. This is why “prompt engineering” worthy of the noun is not for just anybody:

Prompt engineering is crucial for deploying LLMs but is poorly understood mathematically. We formalize LLM systems as a class of discrete stochastic dynamical systems to explore prompt engineering through the lens of control theory. We investigate the reachable set of output token sequences $R_y(\mathbf x_0)$ for which there exists a control input sequence $\mathbf u$ for each $\mathbf y \in R_y(\mathbf x_0)$ that steers the LLM to output $\mathbf y$ from initial state sequence $\mathbf x_0$. We offer analytic analysis on the limitations on the controllability of self-attention in terms of reachable set, where we prove an upper bound on the reachable set of outputs $R_y(\mathbf x_0)$ as a function of the singular values of the parameter matrices. We present complementary empirical analysis on the controllability of a panel of LLMs, including Falcon-7b, Llama-7b, and Falcon-40b. Our results demonstrate a lower bound on the reachable set of outputs $R_y(\mathbf x_0)$ w.r.t. initial state sequences $\mathbf x_0$ sampled from the Wikitext dataset. We find that the correct next Wikitext token following sequence $\mathbf x_0$ is reachable over 97% of the time with prompts of $k\leq 10$ tokens. We also establish that the top 75 most likely next tokens, as estimated by the LLM itself, are reachable at least 85% of the time with prompts of $k\leq 10$ tokens. Intriguingly, short prompt sequences can dramatically alter the likelihood of specific outputs, even making the least likely tokens become the most likely ones. This control-centric analysis of LLMs demonstrates the significant and poorly understood role of input sequences in steering output probabilities, offering a foundational perspective for enhancing language model system capabilities.
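To make that control-theory framing concrete, here is a deliberately tiny sketch. The “model” below is nothing like a transformer: it is just a bigram vote counter, and the vocabulary, bigram table, and `steer()` helper are all invented for illustration. What it shares with the abstract is the shape of the problem: given an initial token sequence $\mathbf x_0$, search for a short prompt $\mathbf u$ that steers the next output token to a chosen target.

```python
from itertools import product

# Toy "language model": each context token votes for its bigram successor,
# and the next token is whichever candidate has the most votes (ties broken
# alphabetically). Vocabulary and successor table are made up.
VOCAB = ["cat", "dog", "mat", "ran", "sat", "the"]
SUCC = {"the": "cat", "cat": "sat", "sat": "the",
        "dog": "ran", "mat": "dog", "ran": "mat"}

def next_token(context):
    votes = {}
    for tok in context:
        nxt = SUCC[tok]
        votes[nxt] = votes.get(nxt, 0) + 1
    # Most votes first; alphabetical order breaks ties deterministically.
    return min(votes, key=lambda t: (-votes[t], t))

def steer(x0, target, max_k=2):
    """Brute-force search for a prompt u, len(u) <= max_k, such that
    next_token(u + x0) == target. Returns u, or None if unreachable."""
    for k in range(1, max_k + 1):
        for u in product(VOCAB, repeat=k):
            if next_token(list(u) + x0) == target:
                return list(u)
    return None

x0 = ["the", "cat"]      # unsteered output: "cat"
u = steer(x0, "ran")     # short control sequence that flips the output
print(next_token(x0), u, next_token(u + x0))  # cat ['dog', 'dog'] ran
```

Even in this toy, a two-token prompt makes an otherwise-unlikely token the winner, which is the spirit of the paper’s finding that short prompts can make the least likely tokens become the most likely ones.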

But all that stuff applies mostly when we’re prompting a big LLM system.

What about using AI in our own lives, where the data that matters most are in our calendars, contacts, financial and health records, our travels, our correspondence (email, chat, whatever)? And how about all the location data we might get from our cars, phone apps, and phone companies? These should be much easier for a pAI to gather, examine, and help us do useful things. Caring about much less data also means a pAI will be less likely to give wrong (hallucinated) answers.
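A minimal sketch of what “caring about much less data” might look like in practice: a pAI that indexes a handful of on-device records and answers a question with crude keyword retrieval, no cloud call involved. All record contents, names, and dates below are hypothetical, and real retrieval would be far more sophisticated; the point is only that the haystack is tiny and entirely local.

```python
import re
from datetime import date

# Hypothetical on-device records a pAI might index: calendar entries,
# contacts, receipts. Everything here is made-up sample data.
RECORDS = [
    {"kind": "calendar", "date": date(2024, 5, 14),
     "text": "dentist appointment with Dr. Alvarez"},
    {"kind": "contact", "date": None,
     "text": "Dr. Alvarez, dentist, 555-0100"},
    {"kind": "receipt", "date": date(2024, 6, 2),
     "text": "new tires, Main Street Garage"},
]

def tokens(s):
    """Lowercased word set, punctuation stripped."""
    return set(re.findall(r"\w+", s.lower()))

def answer(question):
    """Crude local retrieval: rank records by keyword overlap with the
    question; return the best match, or None if nothing overlaps."""
    q = tokens(question)
    best = max(RECORDS, key=lambda rec: len(q & tokens(rec["text"])))
    return best if q & tokens(best["text"]) else None

hit = answer("when was my last dentist appointment?")
print(hit["kind"], hit["date"])  # calendar 2024-05-14
```

Because the corpus is just your own records, a wrong answer is easy to spot and there is far less room for hallucination than with a model trained on the whole Net.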

Today the mental frame almost everybody uses for AI is the Big kind: systems that ingest everything they can get their crawlers on and munch all of it in giant compute farms. Those systems are great for lots of stuff, but they still don’t deal with the personal data listed in the last paragraph.

Not yet, anyway.

Look at it this way. For each of us, there are three data pools:

  1. The entire Net, which is what gets crawled by all the giant LLM operators, plus whatever else they can get their claws on.
  2. One’s personal life, some of which is digitized in useful form (contacts, calendar, mail, stuff in folders inside PCs and attached drives).
  3. Personal data that is in the hands of giants but is rightfully ours. This includes our driving records and driving practices (recorded by our late-model cars and snitched to insurance companies and others), our location data (kept by car and phone carriers and shared with the likes of Google and the feds), and our TV viewing habits (gathered by Google, Amazon, Roku, Apple, etc.).

The pAI greenfield is with the last two.

Tell us who is working on what there, preferably with open source, and not sitting on walled garden silicon.

[Later… ] Since readers told me I had small language models (SLMs) wrong in one of the paragraphs above, and I’m not sure I had them right, I rewrote them out of the piece. I invite readers to post comments to further correct and expand on the subject of pAIs and what they can do.

8 Comments

  1. Mei Lin Fung

    There is a layer missing for me – taking a People Centered approach – the need for Community governed data and using that data to train AI for Community good.

    We at People Centered Internet recently wrote about this for the Italian G7 (Think7.org):
    “Improving Global Governance: Data Cooperatives for Global Cooperation”

    which builds on 3 years of engagement with Think7 for proposing Community data cooperatives/digital utilities

    Italy 2024
    https://think7.org/improving-global-governance-data-cooperatives-for-global-cooperation/

    Japan 2023
    https://think7.org/people-centered-science-and-digital-transformation-a-practical-proposal-for-the-g7-and-g20/

    Germany 2022
    https://think7.org/global-public-private-digital-utilities-for-msme-recovery-and-transition/

    https://think7.org/community-climate-clubs-to-motivate-and-create-personal-action-for-an-equitable-world/

    Personal AI can be – should be – complemented by Collective Intelligence – so we can walk together on the journey towards wisdom that might be valuable for future generations

  2. Adrian Gropper

    Yes, the pAI greenfield is with _data_ in the last two, but you’re conflating data with knowledge and setting up a false dichotomy between corporate LLMs and pAI. Corporate LLMs are knowledge in the same way that Wikipedia is, and if we tried to add LLM multi-modal capabilities like image understanding, translation, and emotion detection to Wikipedia’s knowledge, it might get to be just as large and expensive as today’s LLMs.

    My local, personal, or personalized AI, trained on my private data wherever it is stored, will _always_ benefit from access to a generic LLM if the cost and privacy risk is acceptable in any specific situation. So, before we cast shade on Apple Intelligence, maybe we can at least agree on the need for generic, multi-modal LLMs.

    The hosting of pAI is another issue where I think Apple is doing much better than the other projects. The pAI that runs on my phone will never be as good and cheap as the one that runs on my MacBook Pro, if only because of battery life. So, my pAI on my phone will need access to my pAI on my MBP _or_ to an equivalently private service in the cloud like the one Apple introduced as part of Apple Intelligence and Siri.

    That said, Apple is not doing a perfect job because their walled-garden lock-in business model adds cost and limits compatibility. That will need to be solved by standards like an IETF personal digital agent standard (pdap@ietf.org) or something like it.

    • Doc Searls

      Good points.

      We need AND logic for personal + corporate AI.

      Writing this close to a year after the post went up, Apple seems no closer to understanding, embracing, and equipping us with personal AI. It’s sad, but not surprising.

  3. Dean Landsman

    From the upcoming PDEC Newsletter:

    How About Microsoft?! At its Build developer conference earlier this month it unveiled a new line of PCs. Super fast. More productive (yeah, I know. Microsoft always says that) and, get this: a powerful data collection and search tool that screenshots a device’s activity – including password entry – every few seconds.

    It is called, appropriately, “Recall.” It ships, Microsoft says, on Copilot+ PCs. Recall is your copilot? Maybe it’s Microsoft’s copilot. It’s not the copilot of that famous one named in a movie title.

    Microsoft boasts of its neural sort of feature, allowing the user to randomly pick some element of a site or page, such as, “remember that great PDEC newsletter’s link to Microsoft’s invasive search thing?” and it will find it for you. Provided, of course, you clicked the link.

    Here is how Microsoft equivocates/describes the actual virtuous, user-side control. Or at least what the user should be frightfully/highly aware of when using it: “Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers. That data may be in snapshots that are stored on your device, especially when sites do not follow standard internet protocols like cloaking password entry.”

    This information comes courtesy of the Malwarebytes Labs May 22nd Newsletter. The General Manager of its Consumer Business Unit, Mark Beare says, in that newsletter item, “I worry that we are heading to a social media 2.0 like world. With AI there will be a strong pull to put your full self into a model (so it knows you),” 

    Beare goes on, “I don’t think it’s easy to understand all the negative aspects of what can happen from doing that and how bad actors can benefit.”

    PDEC Official response: Ya think?

    It took almost no time at all for a hacker tool to emerge that extracts all the data collected by Recall. https://bit.ly/PDEC-Link4U

    That’s from Wired. You get some free articles from Wired before you have to subscribe in order to read them.
    __________________________________________________________________________________________________

    If you want to receive the complete PDEC June 2024 Newsletter, send an email to dean@pde.cc

  4. Adrian Gropper

    Microsoft as a platform was left in the dust when Apple and then Google established their mobile and app store platforms. As the table stakes for generative large language models hover in the very exclusive trillion-dollar-plus club, the next generation of AI-enabled platforms will be driven by privacy and antitrust realities. Microsoft and Google are no match for Apple’s privacy brand, but they could compete through strategic use of standards and business alignments.

  5. Soren Larson

    We’re working on pAI for item 3 (and perhaps later 2) at Crosshatch
    https://crosshatch.io

    We’re documenting everything openly, and are full believers in data portability, e.g., the Mozilla manifesto:
    > We believe we should be able to shape the internet with our data in a way we control.

    We’re also followers of, e.g., Deleuze, both for his

    “Revolutionaries often forget, or do not like to recognize, that one wants and makes revolutions out of desire, not duty.”

    and for his post-structuralist view of identity as emergent, fluid, and contingent.

  6. Aamir Khan

    The Personal AI Greenfield is an exciting frontier, brimming with potential to revolutionize our daily lives. The idea of personalized AI that genuinely understands and caters to individual needs is fascinating. Imagine an AI that grows with you, learning from your habits and preferences to become a true extension of yourself. It’s not just about automation; it’s about creating meaningful interactions that can enhance our productivity, well-being, and even creativity. However, as we venture into this new territory, we must tread carefully, balancing innovation with ethical considerations to ensure these advancements benefit everyone, not just a select few. The possibilities are endless, and it’s up to us to shape this greenfield into something truly transformative.

  7. Roy Max

    Excellent Post


© 2026 ProjectVRM
