Landscape

Types of AI You'll Run Into

A quick opinionated tour of the AI tools a CO will actually bump into. What each one is good at, what each one isn't, and which ones I reach for.

There are a lot of AI tools out there. The ones below are the ones a working CO is most likely to encounter, whether through personal use or through what the government puts in front of them. Each has a place. They also change fast, so if a pricing tier or capability I mention looks off by the time you're reading this, that's the game.

Play with these at home first. Spend real time with the consumer versions off the government network before you worry about what the approved version can do. You'll learn the actual capability faster that way, and you'll stop expecting miracles from whatever shows up on GenAI.mil.

Use a desktop or laptop if you can. The phone apps are fine and you can do some neat things with them, but the real capability lives on a computer. Claude's Cowork doesn't exist on mobile. Coding with ChatGPT on a phone is a slog. I keep the phone apps around for brainstorming and sanity checks and do the actual work at a keyboard.

Claude by Anthropic

Best for: Long-form writing, code, building tools
Cost: Priciest of the big four
My take: Daily driver

Claude is what I use every day and what built most of this site. The big differentiator is the desktop app's Cowork feature, which lets Claude read and write directly to folders on your computer. If you're building anything beyond a one-off document, like a tool, a web page, a bundle of files that need to work together, Cowork is a game changer.

Claude also shows up inside Microsoft Office through add-ons that plug the model straight into Word, PowerPoint, and Excel. You can throw together a polished PowerPoint in an hour that would normally take a week to perfect. Two ways to play it: hand those Office integrations a rough outline and let them build the document inside PowerPoint or Word themselves, or build it in the Claude desktop app with Cowork and hand Office a finished file. Both paths work, and once you've used either one a few times, going back to building slides by hand feels silly.

It's also better than the other three at long-form narrative writing. It has fewer content guardrails than ChatGPT or Gemini, which matters when you're writing realistic training scenarios and don't want the model clutching pearls. And it's excellent at code.

The catch: Claude is the most expensive option, and you reach your usage limits quickly. The cheapest plan will not get you far if you're actually trying to build anything. You get what you pay for, and if you have one AI budget line, I'd put it here.

ChatGPT by OpenAI

Best for: Research, image generation, messing around
Cost: Cheap plan goes a long way
My take: Second best overall, best at a few things

ChatGPT is probably the second-best tool in the pack, and it's my pick if you just want to mess around with AI to learn what it can do. The cheap plan has generous limits, so it's hard to run up a bill experimenting. It also codes well. If you have some technical comfort (like knowing how to get your code uploaded to a host), you can build real things with it even without a Cowork-style file feature.

Two places it still wins outright. Deep Research is top-notch if you're researching a topic and need sources pulled together into something readable. And image generation is the best of the bunch; nobody else is close, and it's only getting better.

Where it has fallen off, for me, is general output quality. It's gotten noticeably sloppier over the last several months on the writing and reasoning side. ChatGPT used to be my main driver. Now I reach for Claude for most of the heavy drafting and save ChatGPT for research and images.

Grok by xAI

Best for: Brainstorming, no-BS feedback
Cost: Feels pricey for what you get
My take: Niche, but a useful niche

Grok prides itself on down-to-earth language and not sounding like a corporate robot. That shows up most when you're brainstorming or trying to stress-test a take that the other models want to hedge. I use it to bounce more controversial opinions off of, because it will give me a direct answer instead of three paragraphs of caveats.

I haven't put it through its paces on coding or document work, partly because the plans feel steep for what you get: I don't think it does anything Claude or ChatGPT can't already do better. If you're using Grok as an actual work tool, I'd genuinely like to hear how that goes. I'm open to having my mind changed on this one.

What's actually in the box. Since I'm not a daily user, here's what xAI and reviewers are reporting so you know what you'd be getting. Real-time access to X (formerly Twitter) is the headline feature, and nobody else has it at the same depth; if you want a model answering with data from the last ten minutes instead of last year's training cutoff, that's the lane. The context window has pushed to around 2 million tokens on the top tier, which matters when you want to drop a big stack of documents in at once. DeepSearch pulls cited research reports, Grok Imagine handles short video and image generation (720p clips with audio as of early 2026), and there is a separate xAI for Government offering that is worth knowing about if an approved version shows up on your desk.

Gemini by Google

Best for: Proofreading, short text tasks
Cost: Varies, often bundled with Workspace
My take: Meh

Gemini is... meh. I might be a little jaded because Gemini is the only model GenAI.mil actually offers right now, and that colors my view. When I've tested it on coding, it's fine, but it doesn't hold up against Claude or ChatGPT on anything serious. It also feels like the slowest of the big four to adopt new capabilities, which matters a lot in a space where the other three are shipping meaningful updates on a near-weekly basis.

Where it's actually useful: proofreading an email, punching up a short paragraph, quick text-cleanup tasks. If that's the job you have for AI, Gemini does fine. Just don't expect it to carry the heavy stuff.

What's actually in the box. In fairness to Google, the standalone-model conversation is not the whole picture. Google just rolled out Workspace Intelligence, a semantic layer that threads Gemini through Docs, Sheets, Slides, Gmail, Drive, and Chat so the model can build new files using content and style pulled from your existing stuff. If you already live in Workspace, that changes the conversation, because the model shows up where the work already is rather than in a separate tab. It also plugs into Asana, Jira, and Salesforce through third-party connectors, and Google is explicit that Workspace data is not used to train the model, not reviewed by humans, and can be locked to US or EU processing. None of that flips my ranking of the raw model, but if your whole shop runs on Workspace, the integration story does real work.

GenAI.mil by the DoW

Best for: What you're actually allowed to use at work
Cost: Free inside the wire
My take: Fell flat, for now

I was genuinely excited when GenAI.mil was announced with all four of the big models listed. It has fallen flat. Months in, the only one actually available is arguably the weakest of the four. I understand there are legal issues between Anthropic and the DoW that are slowing Claude's arrival, and that part isn't the platform's fault. The rest of the delay is harder to justify.

The broader concern. If the DoW can't roll out the initial platforms for these tools, how is it going to keep up with the advancements these companies ship nearly daily? Claude already has Cowork and can nearly autonomously build tools, apps, and papers. ChatGPT's coding and agent features will show up on GenAI.mil neutered, if they show up at all. By the time the government-approved version of any of this arrives, commercial tools will be three jumps ahead.

To be fair. The DoW genuinely can't flip on a Cowork-style platform tomorrow. Handing a model live access to folders, files, and documents sitting on a government machine is a huge security problem, and pretending otherwise would be reckless. That constraint is real. What the platform still needs is an honest plan to adapt as the commercial tools keep moving, because without one, the gap between what a CO can use at home and what a CO can use at work only gets wider.

I want this platform to succeed. We need a government-approved option, and I'm rooting for the people building it. Right now it is showing that the acquisition system running it cannot keep pace with the commercial landscape it is trying to deliver.

Suno.ai by Suno

Best for: Generating actual music from lyrics
Cost: Tiered, reasonable
My take: Niche tool, great for its job

Suno is how the songs on ContractingFM get made. The workflow is a good example of how these tools work together: ChatGPT writes the lyrics, Suno turns them into an actual song, and Claude wrote the code for the music player that serves them up on the homepage. Three different AIs, one feature on the site.

No single tool wins every job. Building something cool usually means stitching a few together and knowing which one to hand each task to.

That is the relationship angle from the Introduction showing up in practice. Once you know what each model is good at, the question stops being "which AI do I use" and becomes "which AI do I use for this specific step?" That's where you get real leverage out of this stack.

Quick picker

Building tools, writing long docs, structured code: Claude
Research with sources pulled together: ChatGPT (Deep Research)
Image generation: ChatGPT
Brainstorming, stress-testing a take: Grok or Claude
Proofreading a short email: Gemini
Music generation: Suno (+ ChatGPT for lyrics)
Whatever the government lets you have at work: GenAI.mil