Sharing Our Gen AI Learnings: A Starting Point
Gen AI Art - A Steampunk City (full Midjourney prompt at the end of this post)
Advancements in AI continue to accelerate, opening up new paradigms for creative work and business building. We will use this newsletter to document our efforts to push the boundaries of what we can achieve where the two meet.
An important premise: where some see only risk in these emerging technologies, we see empowerment. We see this pattern everywhere with AI, where the technology’s limitations somehow end up overshadowing the need to roll up our sleeves and use this breakthrough to achieve things previously thought impossible.
We believe that with thoughtful design and application, AI can democratize innovation much like the internet did for information. And unlike the internet, where success relied mostly on coding skills, AI’s conversational nature offers the ultimate dream for creatives, especially writers: the ability to conjure new worlds of possibility from words.
Wherever you are on your AI journey, we would argue that it helps to think of yourself as a Spell-Crafter, bending the forces of the universe to your will with a simple flick of the fingers and the right sequence of incantations. Spells are a reality, folks. For lifelong nerds like us, this goes beyond our wildest dreams of just two years ago.
So join us as we detail our latest experiments using AI for content, visuals, ideation, and more. The future remains unwritten, and we want it to be written by those who have been left out, ignored, or excluded from the current systems of power and control. But first, we have to master the tools.
Pushing the Boundaries of AI for Content and Visuals
As we continue exploring various AI tools for different applications, Praveen has been busy testing the boundaries of what’s possible for content and visual creation. His adventures generating images with DALL-E in ChatGPT yielded some surprisingly impressive results.
DALL-E
The tool has improved massively with version 3, which is now integrated directly into the ChatGPT interface. Unlike Midjourney (more on that in a moment), DALL-E takes much of the guesswork out of writing prompts, because the system rewrites and improves the user’s instructions before generating any images.
The basic recipe to keep in mind when asking for an image is: type of image (illustration, painting, photo) + subject (what the image is about, put the important details first) + additional instructions (e.g. aspect ratio).
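As a rough illustration, the recipe can be expressed as a tiny Python helper that puts the pieces in that order. The function name and the comma-separated layout are our own assumptions, not a format DALL-E requires; the result is just text you would paste into ChatGPT.

```python
from typing import Optional, Sequence


def build_dalle_prompt(image_type: str, subject: str,
                       extras: Optional[Sequence[str]] = None) -> str:
    """Assemble a prompt as: type of image + subject + additional instructions."""
    # Image type first, then the subject with its most important details.
    parts = [f"{image_type} of {subject}"]
    # Optional extras (aspect ratio, lighting, mood) go last.
    if extras:
        parts.extend(extras)
    return ", ".join(parts)


prompt = build_dalle_prompt(
    "A watercolor illustration",
    "a steampunk city with airships overhead",
    ["wide aspect ratio"],
)
print(prompt)
```

Putting the most important details early follows the recipe above; everything after the subject is optional.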
Even basic prompts can yield great-looking results, but as a general rule we recommend a bit of brainstorming before writing the actual prompt: small details from the initial idea can really help the AI meet our expectations.
Now for the current key issues we see with DALL-E.
First, we recommend not asking DALL-E to include text in your images! Unfortunately, that still produces gibberish.
Second, since recent updates the tool produces only one image at a time, although some good “GPTs” (customized ChatGPT interfaces, which we will cover in upcoming newsletters) work around this.
Third, in our experience, the default DALL-E style is often repetitive, leaning heavily toward simple illustrations that do not lend themselves well to more diverse creative needs.
Midjourney
If specific image styles matter, that is where Midjourney comes in, although it requires a separate subscription. The two major features of Midjourney we like are:
It has become less reliant on complicated prompts and can produce absolutely incredible art styles.
It has fantastic, easy-to-use commands, such as one-click variations of whichever image is closest to our intended output (just click V1, V2, or the relevant button right below the output).
Another incredibly powerful tool is the “Vary (Region)” button, which lets you (a) select the part of the generated image you wish to alter and (b) provide a text prompt describing how to change just that section.
We have also been super impressed with the “Describe” feature for uploaded images, which has the system write four different prompts based on an image we like. That means you can go from an inspiration image to a set of four original images in about a minute!
Drawbacks:
Still Discord-only: although a web version is coming soon, the tool can currently be used only through the Discord app, which not everyone likes (Paolo has been VERY vocal about his dislike of the messy Discord interface!).
More critical thinking is required upfront about the image we want: the type of illustration (oil painting, 3D, photography, etc.), the aspect ratio (typically a wide format such as 16:9 or 21:9), and the inspiration styles are pretty much non-negotiable parts of a good Midjourney prompt, which means more work on the user’s part! However, once the results are out, we have been consistently impressed with them.
We will happily come back to these tools, so let us know if you want deeper dives on them or on others like Adobe Firefly.
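A minimal Python sketch of those upfront decisions, assuming you start from target pixel dimensions: it reduces them to a W:H ratio for Midjourney’s --ar parameter and joins the image type, subject, and style inspirations. The helper names are hypothetical; only /imagine and --ar come from Midjourney itself.

```python
from math import gcd
from typing import Sequence


def aspect_ratio(width: int, height: int) -> str:
    """Reduce dimensions to the simplest W:H ratio, e.g. 1920x1080 -> '16:9'."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def build_midjourney_prompt(image_type: str, subject: str, styles: Sequence[str],
                            width: int, height: int) -> str:
    """Combine image type, subject and style inspirations, then append --ar."""
    description = ", ".join([f"{image_type} of {subject}", *styles])
    return f"/imagine prompt: {description} --ar {aspect_ratio(width, height)}"


print(build_midjourney_prompt(
    "Oil painting", "a steampunk cityscape",
    ["muted colors", "Victorian-era fashion"], 1920, 1080,
))
print(aspect_ratio(2100, 900))
```

Because ratios are reduced, 2100x900 and 21:9 both come out as 7:3: scaling both numbers up or down never changes the shape of the image.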
Wordsmithing
On the content side, we’ve continued comparing OpenAI’s ChatGPT (GPT-4 Turbo specifically), Anthropic’s Claude, and other tools on long-form document handling. Of the two main tools, Claude has the larger context window at 200,000 tokens (roughly 150,000 words, longer than a typical novel).
Yet despite these theoretical capabilities, as Paolo found when editing one of his own book manuscripts, it is still necessary to proceed step by step. Processing one chapter at a time seems to preserve context better than trying to get the AI to “understand” an entire book at once.
A word of warning: the latest Claude update, version 2.1, has some quirks compared to the previous version. For example, all context information (e.g. your existing draft or research) must be loaded before you ask the AI to do any planning or writing. So the prompt sequence is now something like:
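The word estimate comes from a common rule of thumb that one token is roughly three quarters of an English word. A quick Python sketch of that arithmetic, assuming the 0.75 ratio (actual ratios depend on the tokenizer and the text):

```python
# Rule of thumb (assumption): one token is about 0.75 English words.
# Real ratios vary by tokenizer, language, and writing style.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Approximate how many English words fit in a given number of tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def fits_in_context(word_count: int, context_tokens: int = 200_000) -> bool:
    """Estimate whether a document fits in the window.

    Ignores the room also needed for your instructions and the model's reply.
    """
    estimated_tokens = word_count / WORDS_PER_TOKEN
    return estimated_tokens <= context_tokens


print(tokens_to_words(200_000))  # 150000
print(fits_in_context(90_000))   # True: a typical novel-length manuscript
print(fits_in_context(160_000))  # False
```

By the same rule, GPT-4 Turbo’s 128,000-token window works out to roughly 96,000 words.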
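For anyone scripting this chapter-by-chapter workflow, here is a small Python sketch that splits a manuscript into chapter-sized chunks. It assumes (hypothetically) that every chapter starts on its own line with “Chapter” and a number; adjust the pattern to your own headings.

```python
import re
from typing import List

# Assumption: each chapter starts on its own line with "Chapter" and a number,
# e.g. "Chapter 3". Adjust the pattern to match your manuscript's headings.
CHAPTER_HEADING = re.compile(r"^Chapter \d+", re.MULTILINE)


def split_into_chapters(manuscript: str) -> List[str]:
    """Split a manuscript into chunks, one per chapter, keeping each heading."""
    starts = [match.start() for match in CHAPTER_HEADING.finditer(manuscript)]
    if not starts:
        # No headings found: treat the whole text as a single chunk.
        return [manuscript.strip()] if manuscript.strip() else []
    # Text before the first heading (title page, preface) becomes its own chunk.
    boundaries = ([0] if starts[0] > 0 else []) + starts + [len(manuscript)]
    chunks = [manuscript[a:b].strip() for a, b in zip(boundaries, boundaries[1:])]
    return [chunk for chunk in chunks if chunk]


book = "My Novel\n\nChapter 1\nIt begins.\n\nChapter 2\nIt ends.\n"
for chunk in split_into_chapters(book):
    print(repr(chunk))
```

Each chunk can then be handed to the chatbot one at a time.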
“Claude, here is a draft/my research/list of data points, etc.”
“Claude, use the information above to do ABC.”
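The same ordering can be sketched as a small Python function that builds one prompt with every piece of context first and the request last. It only assembles text and calls no Claude API; the function name and wording are illustrative assumptions.

```python
from typing import Sequence


def build_context_first_prompt(context_blocks: Sequence[str], instruction: str) -> str:
    """Put all reference material first and the actual request last."""
    # Step 1: load the context (draft, research, data points).
    context = "\n\n".join(context_blocks)
    # Step 2: only then ask for the planning or writing task.
    return (
        "Here is my draft and research:\n\n"
        f"{context}\n\n"
        f"Use the information above to {instruction}"
    )


prompt = build_context_first_prompt(
    ["DRAFT: The city woke to the hiss of steam.", "RESEARCH: Notes on Victorian fashion."],
    "write a one-paragraph summary.",
)
print(prompt)
```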
That being said, and despite how much we like Claude, ChatGPT remains the far better tool for anything research-related, where you are not just working on an existing text but need to find and add information to plan your document, thanks to its web search capabilities. Unfortunately, the service has been absolutely terrible for the past month, with constant bugs, crashes, and outages. Hopefully, things will improve after the holidays.
That aside, ChatGPT still reigns as the ideal research assistant. It also remains a more structured “thinker” than Claude, which is a little more conversational and sparse in its outputs.
Overall, ChatGPT, which now combines web search, DALL-E, and coding in a single interface, is our chatbot of choice.
Pro-Tips:
Yes, each chatbot has its strengths and weaknesses, so why not use both?! Use ChatGPT for initial research and planning, then hop over to Claude for more natural-sounding writing.
If you use Chrome, Paolo highly recommends installing the free extension “Harpa AI”. It puts ALL the main chatbots in a convenient pop-up that you can call up on any web page you are browsing, whether it is an article, a YouTube video, your webmail, or more. Its features are really powerful and let you research and process information much more efficiently, without going back and forth between chatbot interfaces that, honestly, would have looked outdated in the late ‘90s.
We hope you enjoyed these learnings. We will share a separate article with specific prompts for key areas such as writing, editing, and marketing that we think will be useful, so stay tuned!
Full Midjourney prompt: /imagine prompt: The mesmerizing detailed illustration of a bustling steampunk cityscape filled with intricate mechanical structures, airships soaring above, and steam-powered vehicles on the cobbled streets. Include a touch of Victorian-era fashion for the city's inhabitants., Photography, DSLR camera with a focus on composition and muted tones, cinematic expedition, natural wonder, muted colors, tranquility and nostalgia, Ektar magazine aesthetic. --s 250 --style raw --ar 16:9
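In a Midjourney prompt, everything after the first “--” is a parameter rather than description: here --s (stylize), --style raw, and --ar (aspect ratio). A small Python sketch that separates the two, assuming parameters only appear at the end and the description itself never contains “ --”:

```python
from typing import Dict, Tuple


def split_midjourney_prompt(prompt: str) -> Tuple[str, Dict[str, str]]:
    """Separate the descriptive text from trailing parameters such as --ar 16:9."""
    text = prompt.strip()
    prefix = "/imagine prompt:"
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Everything before the first " --" is description; the rest are parameters.
    description, *param_chunks = text.split(" --")
    params: Dict[str, str] = {}
    for chunk in param_chunks:
        # "ar 16:9" -> name "ar", value "16:9"; flags without a value map to "".
        name, _, value = chunk.strip().partition(" ")
        params[name] = value.strip()
    return description.strip(), params


desc, params = split_midjourney_prompt(
    "/imagine prompt: A bustling steampunk cityscape. --s 250 --style raw --ar 16:9"
)
print(desc)
print(params)
```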