65 points by bx376 1 hour ago | | 55 comments

Well, these sound like perfect tasks for GPT:

"Participants responded to a total of 18 tasks (or as many as they could within the given time frame). These tasks spanned various domains. Specifically, they can be categorized into four types: creativity (e.g., “Propose at least 10 ideas for a new shoe targeting an underserved market or sport.”), analytical thinking (e.g., “Segment the footwear industry market based on users.”), writing proficiency (e.g., “Draft a press release marketing copy for your product.”), and persuasiveness (e.g., “Pen an inspirational memo to employees detailing why your product would outshine competitors.”)."

Here is the GPT response to the first task: https://chat.openai.com/share/db7556f7-6036-4b3d-a61a-9cd253...

A confident GPT hallucination is almost indistinguishable from typical management consulting material...


1) Your ideas are bad.

2) Spreadsheets exist.

3) No-one cares about your marketing copy.

4) No-one finds your c-suite babble inspirational.

This is almost perfect input to an LLM exactly because of how low value it is in the first place.

Several of those aren't even new.

> Two distinct patterns of AI use emerged: “Centaurs,” who divided and delegated tasks between themselves and the AI, and “Cyborgs,” who integrated their workflow with the AI.

It was nice of them to make clear that the article was total horse dung before I had to read the whole paper.


The article certainly didn't invent the term 'Centaur', but I haven't seen cyborg used in that way. It does seem a bit clickbaity.

I like Cory Doctorow’s use in “Chickenized Reverse Centaur”.

LLMs are stunningly good at language tasks: almost all of what us old-timers called NLP is just crushed these days. Summarization, Q&A, sentiment, the list goes on and on. Truly remarkable stuff.

And where there isn’t a bright line around “fact”, and where it doesn’t need to come together like a Pynchon novel, the generative stuff is smoking hot: short-form fiction, opinion pieces, product copy? Massive productivity booster, you can prototype 20 ideas in one minute.

But that’s about where we are: lift natural language into a latent space with some clear notion of separability, do some affine (ish) transformations, lower back down.
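That "lift, transform, lower" picture can be made concrete with a toy sketch. Everything below (the vocabulary, the 2-d embeddings, the "negation" map) is invented purely for illustration, not how any real model works:

```python
# Toy version of "lift into a latent space, apply an affine map, lower back down".
# Vocabulary and embeddings are made up; the first coordinate is a crude
# "sentiment" axis, the second a crude "intensity" axis.
vocab = ["good", "bad", "great", "awful"]
embed = {"good": [1.0, 0.2], "bad": [-1.0, 0.2],
         "great": [1.5, 0.3], "awful": [-1.5, 0.3]}

def affine(v, scale=-1.0, shift=0.0):
    # Flip the sentiment coordinate: a toy "negation" transformation.
    return [scale * v[0] + shift, v[1]]

def lower(v):
    # Project back down: nearest vocabulary item by squared distance.
    return min(vocab, key=lambda w: sum((a - b) ** 2 for a, b in zip(embed[w], v)))

print(lower(affine(embed["good"])))   # "good" maps to "bad"
print(lower(affine(embed["great"])))  # "great" maps to "awful"
```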

Fucking impressive for a computer. But if it can really carry water for an expensive Penn grad?

You’re paying for something other than blindingly insightful product strategy.


Says more about how useless BCG consultants are.


I’m starting to think there’s an LLM equivalent to the old saying about how everything the media writes is accurate except on the topics you’re an expert in. All LLM output looks to be good quality except when it’s output you’re an expert in.

People who have no background in writing or editing think LLMs will revolutionize those fields. Actual writers and editors take one look at LLM output and can see it’s basically valueless because the time taken to fix it would be equivalent to the time taken to write it in the first place.

Similarly, people who are poor programmers or have only a surface-level understanding of a topic (especially management types trying to appear technical) look at LLM output and think it's ready to ship, but good programmers recognize that the output is broken in so many ways, large and small, that it's not worth the time it would take to fix compared to just writing it from scratch.

LLMs are not worthless for programming. You just cannot expect them to ship a full program for you, but for generating functions with limited scope I have found them very useful. Figuring out how to use a new but common library, for example. But of course you have to check and test.

And for text, I know people who use it successfully (professionally) to generate summaries from some data. They still have to proofread, but it saves them time, so it is valuable.

Programmers don't think that, though, or at least not all the time.

You could say similar things about Stack Overflow, and yet we use it.

Stack Overflow responses are well known to be misranked. I’ve heard a rule of thumb that the actual correct answer is typically about #3.

Yep. ChatGPT is like having a junior engineer confidently asking to merge broken garbage into your codebase all the time. Adds negative value for anyone that knows what they’re doing.

Gell-Mann Amnesia!

[flagged]

https://news.ycombinator.com/newsguidelines.html

Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

Why the anger?

Everyone you described shares something in common:

they aren't good at using language models.

Nor are 99.9% of humanity. I think that's the point.

Somewhat agree. I know LLMs have boosted my programming output, mostly in writing JSDoc comments and PR descriptions, the things I don't really like doing.

If your PR descriptions can be generated from the file diffs, everyone's time would be better spent scanning the diff to reach the same conclusions.

Consider using them to capture the answers to the usual "why" questions, which an LLM won't be able to infer from a diff.

Most management consultants are useless. But there are some realities you must accept.

Number 1. In a team of 20-30 engineers there is only one extremely good "why is he even with us" engineer who is great at the technical stuff and at being a people person. But no matter how good he is, to him it is a job, and he will only drop hints about how management should be done. He doesn't care where the company is headed, because he plays video games, has a family, and has a literal life. He doesn't want to take on management or undue responsibilities. Moreover, the people above him have labeled him an "engineer" and do not see him as a "manager".

The rest of the engineers and managers have also adopted the "not my problem" approach, so you see a bizarre communication gap. Engineers working closely with the product don't want to talk to their managers, because the conversation goes "if you know so much about this, why don't you... <a description of something that results in more work outside their JD>", and managers don't want to talk with engineers because "if you are so interested, why don't you... <a description of something that results in more work outside their JD>".

From this growing distance between managers and engineers comes the "management consultant". Management consultants have the upper-management-granted flexibility to go back and forth between engineers and managers. They can have conversations with anyone without being bound by the "why don't you..." phrases, submit a report, and take home a year's worth of a manager's or engineer's salary in one month.

That conversation gap between product and business is where management consultants come in. And the funny thing is that they target exactly those "I don't want to, but somebody should" work items and report them to upper management. They can do this so well precisely because they are not burdened with the "work" part.

Seriously, if you do some introspection, you will see there are plenty of things you know your company should do, but you don't want to voice them because voicing them means more work and, in fact, more risk for you. In comes a "good" management consultant who will discover those things and report them to upper management, who will then create the system to get those jobs done.

That is my pitch: if anyone wants a management consultant, hire me. I am going to tell them why their company sucks in 20 different ways, with 18 of those points generated by ChatGPT.

Needs an /s.

Says more about how people will parrot the same phrase over and over for anything at all. It's just funny how you can predict a comment like this in every thread regardless of what it does.

Saying "it says more about [insert]" anytime GPT does something just makes the phrase lose all meaning. Surely you have something more meaningful to say?

Often effortposts aren’t worth it because someone will come along and Gish Gallop the post with opaquely nonsensical bad-faith counterarguments that are a lot of work to refute.

I agree with you in an ideal world, but sadly this isn’t one.

Agreed: the study only shows that BCG consultants' work is 40% noise without real added value... I guess customers should now ask for a 40% rebate !!! ;-)

Not surprised. It's frighteningly good, and a perfect match for programming.

I often ask GPT-4 to write code for something and try it to see if it works, but I seldom copy and paste the code it writes; I rewrite it myself to fit into the context of the codebase. But it saves me a lot of time when I am unsure about how to do something.

Other times I don't like the suggestion at all, but that's useful as well, as it often clarifies the problem space in my head.


The published article is not about programming tasks at all, but about generating text for "strategy consultants".

Some examples, found on page 10 of the original article:

   - Propose at least 10 ideas for a new shoe targeting an underserved market or sport.
   - Segment the footwear industry market based on users.
   - Draft a press release marketing copy for your product.
   - Pen an inspirational memo to employees detailing why your product would outshine competitors.
Nothing of real value imho.

I’ve also found the act of describing my problem to GPT-4 is sometimes just as helpful as the answer itself. It’s almost like enhanced rubber duck debugging.

We need an inverse GPT-4-style LLM that doesn't provide answers but instead asks relevant questions.

GPT4 can do that too. Just show it something (code or text) and ask it to ask coaching questions about it.

This is one step removed from "try different things until it works" style of programming.

Not to say you're one of those programmers, but it certainly enables those sorts of programmers.

And what's the harm in that? That's how I first started out programming decades ago.

HN is so bad at predictions. Just a few months ago HN was awash with comments that confidently claimed LLMs were no more than stochastic parrots and unlikely to amount to anything.

> I can't help but think the next AI winter is around the corner. [0]

Yeah, right.

[0] https://news.ycombinator.com/item?id=23886325


Two things mentioned in the abstract are worth pointing out.

> For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities

They specifically picked tasks that GPT-4 was capable of doing. GPT-4 could not do many tasks, so when we say that performance was significantly increased this is only for tasks GPT-4 is well suited to. There is still value here but let's put these results into context.

> Consultants across the skills distribution benefited significantly from having AI augmentation, with those below the average performance threshold increasing by 43% and those above increasing by 17% compared to their own scores

Even when cherry-picking tasks that GPT-4 is particularly suited for, above average performers only increased performance by 17%. This increase is still impressive, were it to be seen across the board. But I do think that 17% is a lot less than some people are trying to sell.


You're underestimating it, because gains compound. Small gains in efficiency lead to huge advantages in long-term growth. 17% would be an absolutely monumental improvement.
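As a back-of-the-envelope illustration (assuming, for simplicity, that the 17% gain compounds once per planning cycle, which is itself a debatable assumption):

```python
# A 17% efficiency gain, compounded once per cycle, snowballs quickly.
gain = 1.17
for cycles in (1, 3, 5, 10):
    print(cycles, round(gain ** cycles, 2))
# After 5 cycles the compounded advantage is ~2.19x; after 10, ~4.81x.
```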

Having been a consultant, what strikes me about this is the next, to me seemingly obvious question: What if you just removed the consultants entirely and just had GPT-4 do the work directly for the client?

If you’re a client and need a consultant to do something, you have to explain the requirement to them, review the work, give feedback, and so forth. There will likely be a few meetings in there.

But if GPT-4 can make consultants so much better, I imagine it can also do their work for them. And if you combine this with the reduction in communications overhead that comes from not working with an outside group, why wouldn’t clients just accrue all the benefits to themselves, plus the benefit of not paying outside consultants or dealing with the overhead of managing them?

This is especially the case when the client is already a domain expert but just needs some additional horsepower. For example, marketing brand managers may work with marketing consultants even though they know their products and marketing very well. They just need more resources, which can come in the form of consultants for reasons such as internal head-count restrictions.

Anyway, I just wonder if BCG thought through the implications of participating in this study. To me it feels like a very short step from “helps consultants help their clients” to “helps clients directly and shows consultants aren’t really necessary.”

Especially so if the client just hires an intern and gives them GPT-4.


Companies like BCG and McKinsey are mostly about liability. As a CEO you call them, pay them the big bucks, and have them make up plans and strategies. If it works out, you get the credit; if it doesn't, well, "we worked closely with experts from McKinsey, etc., so the blame isn't on me."

The frustrating one is when you've been telling management something for months (if not years), then a consultant comes in, their report says exactly what you've been saying, and only then does the company finally do it! Coulda saved the company five figures just by listening to me. Sigh, politics.

Why is that frustrating? I find it validating

There is a lot of office work that will, over time, be optimized using GPT-like services. I was tech-savvy enough to know that a lot of the office work I do is repeatable and could be done with scripts, but not good enough to write those scripts myself. Using ChatGPT allowed me to write them; it took me maybe 15-20 hours to get the scripts working perfectly. I knew just a little bit of Python scripting and nothing about pandas or XlsxWriter, but I was able to create something that saves me an estimated 20-25 hours a week.
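The commenter mentions pandas and XlsxWriter; here is a minimal stdlib-only sketch of the same kind of repeatable report chore (the column names and data are invented for illustration; a real version would read rows via csv.DictReader or pandas):

```python
from collections import defaultdict

# Hypothetical weekly chore: total hours per project from a timesheet export.
def summarize(rows):
    totals = defaultdict(float)
    for row in rows:
        totals[row["project"]] += float(row["hours"])
    return dict(totals)

rows = [
    {"project": "alpha", "hours": "3.5"},
    {"project": "beta", "hours": "2.0"},
    {"project": "alpha", "hours": "1.5"},
]
print(summarize(rows))  # {'alpha': 5.0, 'beta': 2.0}
```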

In my opinion, a lot of people here on Hacker News, being good at programming themselves, underestimate how services like ChatGPT can open a new world to non-programmers. They also probably make the non-inquisitive learn less. Previously, to learn how to stop multiple snapd services with a script, I would have googled and cobbled something together; today I just ask ChatGPT and get a working script in less than a minute.
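A rough sketch of the kind of script meant here. The unit names are only examples, and it dry-runs by default so nothing is actually stopped:

```shell
stop_units() {
  mode="${1:-dry}"
  for svc in snapd.service snapd.socket snapd.seeded.service; do
    if [ "$mode" = run ]; then
      sudo systemctl stop "$svc"   # really stop the unit
    else
      echo "would stop $svc"       # dry run: just report
    fi
  done
}

stop_units        # dry run; use `stop_units run` to actually stop them
```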


Pipe /dev/random, transform to decimal, and you just got an amazing increase in performance for calculating decimals of Pi. Nobody said precision was important anyway.


Honestly if you don't care about precision, /dev/zero is going to give you more throughput. Plus, I personally guarantee it's correct to within an error margin of 4.0. You can't offer the same with /dev/random!
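Playing the joke out in code (with os.urandom standing in for /dev/random and zero bytes standing in for /dev/zero):

```python
import math
import os

# Map raw bytes to "decimals of pi", as the parent comments suggest.
def decimalize(raw):
    return float("0." + "".join(str(b % 10) for b in raw))

fast_pi = decimalize(os.urandom(8))   # blazing fast, precision optional
cheap_pi = decimalize(bytes(8))       # /dev/zero version: always 0.0

print(abs(math.pi - cheap_pi) < 4.0)  # True: the error margin of 4.0 holds
```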

Maybe this tells more about BCG consultants than it does about GPT-4?


That's what you would like to think, isn't it? I'm afraid this would be just as true with any other kind of subject, and as far as I know, there's no evidence either way, so this is just a cheap stab you're having at them.

After all the cheap stabs I had to take as a programmer... I allow myself to experience schadenfreude, even if there is no evidence...

The headline "GPT-4 increased BCG consultants’ performance by over 40%" is misleading since it implies that they became more productive in their actual work, when this is a carefully controlled study that separates tasks by an "AI frontier". Only inside the frontier did the "quality" of work increase by 40%, while they completed 12% more tasks on average.


Exactly, and right in the introduction they even say:

> while AI can actually decrease performance when used for work outside of the frontier

There is some value here but the authors can define "frontier" however they please to end up with whatever productivity increase they are looking for.

This is a good thing, since increased performance means that the clients will have fewer billed hours, right? Right?


No, it increases the load one can successfully manage in a day. There isn't this tiny discrete amount of work that people need to handle. We gave that up when we left the campfires. We're trying to grow.

Within 3 months, companies will be applying to Y Combinator where both the Founder and the CEO are ML models... :-)


So they don't check the results for the clients.


BCG : We know layoffs are in fashion and we'd just like you to know that if you need industrial grade ass covering excuses from a legitimate-ish sounding authority to justify what you were planning to do anyway, our 23 year old consultants and their PowerPoint presentations have got you covered.


What LLMs do for me is that they make me a pro in every programming language.

"How do I do x in language y" always gives me the knowledge I need. Within seconds, I can continue coding.

After more than 10 years of coding full-time, I know some languages very well, like PHP and JavaScript. But even in those, LLMs often come up with a better solution than what I wrote, because they know every fricking thing about those languages.
