AI Personas Beyond Pirate Speak - Your Shortcut to Customer Understanding

It's not uncommon for people to experiment with making ChatGPT talk like a pirate or adopt other quirky personas. This playful interaction often serves as our introduction to shaping AI responses through role-playing prompts, which extends the idea of providing specific details in the context and prompts we use. As we dig deeper into the capabilities of large language models (LLMs), many of us discover that specifying more thoughtful and relevant roles can lead to more valuable outputs. This has given rise to the now-familiar practice of beginning prompts with phrases like "You are a world-class analyst..." or similar role-specific instructions.

This approach to prompt engineering isn't just for fun; it has practical applications in many business contexts. By carefully defining the AI's persona, we can tailor its responses to better suit our needs. Role-based prompting is being explored across domains including gaming, content generation, AI agents, personalization, and marketing. One particularly intriguing use case is the ability to gather diverse perspectives on a given topic.

Imagine you're curious about how your clients might respond to a new product launch or a policy change. By developing specific "personas" that represent different segments of your target audience, you can leverage LLMs to simulate a range of viewpoints. Do this by presenting each persona in turn, along with your question, to the LLM and then analyzing the responses.
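The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: `simulate_perspectives`, the prompt wording, and the example personas are all my own assumptions, and `fake_llm` is a placeholder for whatever model client you actually use.

```python
from typing import Callable


def simulate_perspectives(
    personas: list[str],
    question: str,
    ask_llm: Callable[[str], str],
) -> dict[str, str]:
    """Ask the same question once per persona and collect the replies."""
    responses = {}
    for persona in personas:
        # The role line comes first so the model answers in character.
        prompt = (
            f"You are {persona}.\n"
            "Answer in character, in two or three sentences.\n\n"
            f"Question: {question}"
        )
        responses[persona] = ask_llm(prompt)
    return responses


def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call; it only echoes the role line.
    return "(simulated reply) " + prompt.splitlines()[0]


personas = [
    "a small-business owner who watches every expense",
    "an enterprise IT manager responsible for compliance",
]
replies = simulate_perspectives(
    personas, "How do you feel about our new pricing?", fake_llm
)
for persona, reply in replies.items():
    print(f"{persona}: {reply}")
```

Passing the model call in as a function keeps the loop independent of any particular provider, so you can swap `fake_llm` for a real client without touching the rest.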

While this process doesn't replace genuine market research or direct customer feedback, it can be an invaluable tool for initial exploration and brainstorming. Simulating conversations with these AI-generated personas is a cost-effective and time-efficient way to gain quick insights into potential reactions, concerns, or enthusiasm from various stakeholder groups. That helps you refine your ideas or strategies before investing in more extensive research or implementation efforts.

Analyzing the Responses

LLMs can take this process a step further by analyzing the synthetic responses they generate and identifying overarching topics and themes, which turns a simple response generator into an insights engine. With this analysis, you can:

Prepare more effectively for customer research, having already explored potential areas of interest or concern.
Test assumptions about your target audience, potentially uncovering blind spots in your understanding.
Refine messaging strategies based on the simulated feedback from various personas.
Uncover unexpected insights that might warrant further investigation or consideration.
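One way to set up the theme analysis is to fold all the simulated replies into a single follow-up prompt. The sketch below only builds that prompt; the function name `build_theme_prompt`, the wording, and the sample replies are hypothetical, and sending the prompt to a model is left to your own client.

```python
def build_theme_prompt(responses: dict[str, str], max_themes: int = 5) -> str:
    """Combine persona replies into one prompt asking for recurring themes."""
    # Number each reply so the model can cite which ones support a theme.
    numbered = "\n".join(
        f"{i}. [{persona}] {reply}"
        for i, (persona, reply) in enumerate(responses.items(), start=1)
    )
    return (
        f"Below are {len(responses)} simulated customer responses.\n"
        f"Identify up to {max_themes} recurring themes. For each theme, give a "
        "short name and list the response numbers that support it.\n\n"
        f"{numbered}"
    )


responses = {
    "budget-conscious founder": "The new tier is too expensive for a team of three.",
    "compliance-focused IT manager": "I need audit logs before I can approve this.",
}
theme_prompt = build_theme_prompt(responses, max_themes=3)
print(theme_prompt)
```

Numbering the replies makes it easier to trace each reported theme back to the personas that raised it.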

This approach doesn't replace traditional market research, but it can serve as a valuable preliminary step. It allows companies to enter actual customer conversations or surveys with a more nuanced understanding of potential viewpoints, ultimately leading to more productive and insightful interactions.

Generating Personas

LLMs can also be leveraged to generate personas based on existing customer data, including reviews, feedback, and interaction records. While these AI-generated personas may initially lack the depth of custom-crafted ones, they serve as a solid foundation that can be further developed through human input or additional LLM enrichment interactions.

The paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas" explores two approaches to generating personas using LLMs:

The Text-to-Persona approach - uses any text as input to create personas, prompting the LLM with the equivalent of "Who is likely to [read|write|like|dislike...] the text?"
The Persona-to-Persona approach - creates additional personas, which may not be represented in any text, by imagining personas related to previously generated ones, prompting with the equivalent of "Who is in close relationship with the given persona?"
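To make the two approaches concrete, here is a rough sketch of how the prompts might be templated. The core questions come from the descriptions above; everything else (the surrounding wording, the function names, the set of allowed actions) is my own assumption and not the paper's actual prompt text.

```python
# Templates built around the two questions quoted above.
TEXT_TO_PERSONA = (
    "Who is likely to {action} the following text? "
    "Describe that person in one sentence.\n\nText:\n{text}"
)
PERSONA_TO_PERSONA = (
    "Who is in close relationship with the given persona? "
    "Describe that person in one sentence.\n\nPersona: {persona}"
)

ALLOWED_ACTIONS = {"read", "write", "like", "dislike"}


def text_to_persona_prompt(text: str, action: str = "read") -> str:
    """Build a Text-to-Persona prompt for one piece of source text."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"action must be one of {sorted(ALLOWED_ACTIONS)}")
    return TEXT_TO_PERSONA.format(action=action, text=text)


def persona_to_persona_prompt(persona: str) -> str:
    """Build a Persona-to-Persona prompt that expands from a known persona."""
    return PERSONA_TO_PERSONA.format(persona=persona)


review = "The checkout flow kept timing out on my phone."
p1 = text_to_persona_prompt(review, action="write")
p2 = persona_to_persona_prompt("A maternal health advocate")
print(p1)
print(p2)
```

In practice you would run the first prompt over your reviews or feedback, then feed each resulting persona into the second prompt to widen the pool.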

These approaches offer scalable ways to create a wide array of personas, potentially uncovering audience segments or characteristics that might have been overlooked in traditional persona development processes.

Analyzing 200,000 Personas

The authors of the "Scaling Synthetic Data Creation with 1,000,000,000 Personas" paper released a set of 200,000 of the generated personas, and I wanted to get a better feeling for what they were like. Looking at the file, they include personas such as:

A Political Analyst specialized in El Salvador's political landscape.
A legal advisor who understands the legal implications of incomplete or inaccurate project documentation
A maternal health advocate focused on raising awareness about postpartum complications.
etc.

But it is not feasible to review all 200k personas manually, so I created a process that vectorizes each persona, clusters the vectors into 10 groups, names each group using an LLM, and then samples the personas and lays out their high-dimensional (1024) embeddings in two dimensions using UMAP for visualization.

To do this I used "BAAI/bge-large-en" for embeddings and qwen2 for naming. Note that 10 is an arbitrary number of clusters. Also, the clusters and names are not 'standard' in any way. They're meant to illustrate what is in the dataset, not to conform to a pre-established taxonomy. Each time the program is run, the clusters and names will vary.
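For readers who want to try something similar, here is a rough sketch of that pipeline rather than the exact program I ran. It assumes the sentence-transformers, scikit-learn, and umap-learn packages are installed, and it assumes k-means as the clustering algorithm. The function names, the `plot_sample` and `seed` parameters, and the naming prompt wording are hypothetical.

```python
def embed_cluster_project(personas, n_clusters=10, plot_sample=5000, seed=None):
    """Embed personas, cluster them, and project a random sample to 2D."""
    # Heavy third-party imports live inside the function so the prompt
    # helper below stays usable without them.
    import numpy as np
    import umap  # provided by the umap-learn package
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    model = SentenceTransformer("BAAI/bge-large-en")
    vectors = model.encode(personas, normalize_embeddings=True)  # shape (n, 1024)

    # Assumption: k-means. With seed=None the clusters vary from run to run.
    labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(vectors)

    # Project only a random sample so the 2D layout stays fast and readable.
    rng = np.random.default_rng(seed)
    sample_size = min(plot_sample, len(personas))
    sample_idx = rng.choice(len(personas), size=sample_size, replace=False)
    coords = umap.UMAP(n_components=2, random_state=seed).fit_transform(
        vectors[sample_idx]
    )
    return labels, sample_idx, coords


def build_naming_prompt(cluster_members, max_examples=20):
    """Build a prompt asking an LLM (qwen2 in my case) to name one cluster."""
    examples = "\n".join(f"- {p}" for p in cluster_members[:max_examples])
    return (
        "These persona descriptions belong to one cluster:\n"
        f"{examples}\n\n"
        "Reply with a short, descriptive name for the group (at most six words)."
    )


naming_prompt = build_naming_prompt([
    "A marketing strategist seeking guidance on consumer decision-making",
    "A marketing consultant focused on technology in marketing channels",
])
print(naming_prompt)
```

To drill into a single group, filter the personas by their cluster label and call `embed_cluster_project` again on that subset.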

[Image: 2D plot of all personas]

Given all that, we can see from the first image that the top-level clusters include groups named:

Creative Community
Diverse Business Leaders and Entrepreneurs
Community Leaders & Activists
Enthusiastic Historians and Scholars
Advanced Software and Research Professionals
Interdisciplinary Scientists and Researchers
Supportive Elders and Caregivers Group
Education Motivators and Aspirants
Sports Elite and Enthusiasts
Strong Skeptics and Independent Thinkers

[Image: 2D plot of the business leaders group]

Digging deeper and repeating the process for the "Diverse Business Leaders and Entrepreneurs" group, we get:

Organized Leaders and Managers
Marketing & Brand Strategy Team
Manufacturing & Engineering Experts
Local Business Owners Network
Entrepreneurial Innovators and Strategists
Legal & Compliance Experts
Expert Investment Guidance Team
Sustainable & Innovative Design Entrepreneurs
Travel & Events Elite
Real Estate Professionals and Investors

[Image: 2D plot of the marketing and brand strategy group]

And finally for this article, digging into the "Marketing & Brand Strategy Team" group, we get:

Expert Market Strategists and Consultants
Promotion and Branding Specialists
Social Media Influencers and Partners
Dynamic Sales Team
Strategic Marketing Powerhouse Group
Media Power Players and Professionals
Communications & Public Relations Experts
Publishing Powerhouse Group
Film Industry Power Players
Global Music Powerhouses

And the 3 most typical personas for the "Expert Market Strategists and Consultants" group are:

A marketing strategist seeking guidance on understanding consumer decision-making processes
A marketing consultant who can provide insights on incorporating technology into different marketing channels
A marketing specialist who shares insights and strategies for effective competition
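One plausible way to pick the "most typical" personas in a group is to take the ones whose embeddings sit closest to the cluster centroid. The sketch below assumes that definition and uses cosine similarity. The function name `most_typical` and the toy two-dimensional vectors are illustrative only; real embeddings would have 1024 dimensions.

```python
import math


def most_typical(personas, vectors, k=3):
    """Return the k personas whose vectors are most similar to the centroid."""
    dims = len(vectors[0])
    # The centroid is the per-dimension mean of all vectors in the cluster.
    centroid = [sum(v[d] for v in vectors) / len(vectors) for d in range(dims)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    ranked = sorted(
        zip(personas, vectors),
        key=lambda pair: cosine(pair[1], centroid),
        reverse=True,
    )
    return [persona for persona, _ in ranked[:k]]


group = [
    "A marketing strategist",
    "A marketing consultant",
    "A marketing specialist",
    "A film producer",
]
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
typical = most_typical(group, toy_vectors, k=2)
print(typical)
```

With these toy vectors the outlier pulls the centroid slightly off the main axis, so the persona nearest the middle of the group ranks first.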

Remember, this is just one grouping and naming for this particular dataset. The categories will not correspond directly to the categories you may use, and they will change every time the program is run, but they are very useful for getting insight into the types of personas in this dataset.

TL;DR

Given real reviews, feedback, and interactions, you can generate synthetic personas.

Given hand-crafted or synthetic personas, you can group and name them to better understand your audience.

Given hand-crafted or synthetic personas, you can generate synthetic perspectives for a new product, business, or idea. For example, if you are creating a new product for marketing professionals, you can analyze your product, features, and messaging from the point of view of a dynamic sales team, film industry power players, and so on.

LLMs are not a substitute for talking to real people, but they can give you great insights and a head start on the process.

Keep in mind that these are just simulations of real people, so belonging to a group does not guarantee holding a specific point of view. But by priming the LLM in this way, you can create perspectives with useful variety.

Let me know if you have any questions or want to discuss this further.
