https://github.com/id-Software/Quake-III-Arena/blob/master/code/game/q_math.c#L552
Copilot repeats it almost word for word, including comments, and adds an MIT-like license at the top
I might have missed something, but that's the gist of it.
That's GPT-J-6B, to be clear. A 6-billion-parameter model is producing better output than a 300 billion parameter model, because of what I can only assume to be sheer incompetence on AI Dungeon's part. I've also used the raw GPT-3 API, and it does better at writing than either. In other words: Doing nothing would have been better than whatever they've been doing.
https://github.com/search?q=%22evil+floating+point+bit+level+hacking%22&type=code
A question that popped into my head is: if the machine sees the same exact block of code hundreds of times, does that suggest to it that it's more acceptable to regurgitate the entire thing verbatim? Not that this incident is totally 100% ok, but if it was doing this with code that existed in only a single repo that would be much more concerning.
From a copyright standpoint, quite possibly. This is called the "Scènes à faire" doctrine. If there are some things that have to be there in a roughly standard form to do a standard job, that applies.
If you and I write the exact same 10 lines of code, we both have independent and valid copyrights to it. Unlike patents, independent derivation of the same code _is_ a defense for copyright.
If I write 10 lines of code, publish it as GPL (but don't sign a CLA / am not assigning it to an employer), and then re-use it in an MIT codebase, I can do that because I retained copyright, and as the copyright holder I can offer the code under multiple incompatible licenses.
There's no way for a machine to detect independent derivation vs copying, no way for the machine to know who the original copyright holder was in all cases, and whether I have permission from them to use it under another license (i.e. if I email the copyright holder and they say 'yeah, sure, use it under non-gpl', it suddenly becomes legal again)...
It's not a problem computers can solve 100% correctly.
At a previous job we had an audit from them; it didn't seem to be too accurate, but probably good enough for companies to cover their asses legally.
So, the tool is worthless if you want to use it legally.
You can be almost certain it’s being widely used or will be widely used shortly.
The conversations around copilot are eerily similar to the conversations around the first autocomplete tools
I said that it is impossible for the user to check that the code copilot gives is OK, license-wise, and therefore, they can not be sure that it is legally OK to include in any project.
It's content generation at a fragmentary level where each "copied" chunk does not form a substantive whole in the greater body of the new work. Even if you were training it on other authors' works rather than just your own, as long as it wasn't copying distinctive sentences wholesale, I think there's a strong argument for it falling under fair use--if it's even detectable.
On the other hand, if it regurgitated somebody else's paragraph wholesale, I don't think that would be fair use. Somewhere in-between is where it gets fuzzy, and really interesting; it's also where internet commenters seem to prefer flipping over the board and storming out convinced they're right to exploring the issues with a curious and impartial mind. I see way too much unreasoned outrage and hyperbolic misrepresentation of the Copilot tool in these threads, and it's honestly kind of embarrassing.
As far as this analogy goes, it's worth noting that the structure of a computer program doesn't map onto the structure of a piece of fiction (or any work of prose) in a straightforward way. Since so much of code is boilerplate, I would (speculatively, in the copyright law sense) actually give more leeway to Copilot in terms of absolute length of copied chunks than I would for a prose autocompleter. For instance, X program may be licensed under the GPL, but that doesn't mean X's copyright holder(s) can sue somebody else because their program happened to have an identical expression of some RPC boilerplate or whatever. It would be like me suing another author because their work included some of the same words that mine did.
^[1] At least one tool like this (using GPT-3) has been posted on HN. At this point in time I wouldn't use it, but I have to admit that it was sort of cool.
Have a poke at novelai.net if you get a chance.
It's... not very smart. It's pretty decent at wordcrafting, though, and as an amateur writer I find it invaluable for busting writer's block. Probably if you spend all day writing fiction you'll find ways around that, but for me the solution has become "Ask the AI to try".
It'll either produce a reasonable continuation, or something I can look at and see why it's wrong. Either is better than a blank page.
The application itself is called "Sudowrite". I guess there are probably a bunch of them at this point.
"your honor, i would like to plead not guilty, on the basis that i just robbed that bank because i saw that everyone was robbing banks on the next city"
...on the other hand, that was the exact defense tried for the capitol rioters. So i don't know anything anymore.
But if we put the licensing to one side for a moment...
1/ Everything I've seen it generate so far is 'imperative hell'. It is practically a 'boilerplate generator'. That might be useful for pet projects, smaller code bases, or even unit-test writing. But large swathes of application code looking like the examples I've seen so far would be hard to manage.
2/ The boilerplate is what bothers me the most (as someone who believes in the declarative approach to software engineering). The future for programming and programming languages should be an attempt to step up to a higher level of abstraction, that has been historically the way we step up to higher levels of productivity. As applications get larger and code-bases grow significantly we need abstraction, not more boilerplate.
3/ As someone who develops a functional framework for C# [1], I could see Copilot essentially side-lining my ideas and my approach to writing code in C#. Not just style, but choice of types, etc. I wonder if the fallout of what is effectively Copilot's 'one true way' of generating code was ever considered? It appears to force a style that is at odds with many who are looking for more robust code. At worst it will homogenise code ("people who wrote that, also wrote this"), stifling innovation and iterative improvements in the industry.
4/ Writing code is easy. Reading and understanding code written by another developer is hard. Will we spend most of our time as code-reviewers going forwards? Usually, you can ask the author what their intentions were, or why they think their approach is the correct one. Copilot (as far as I can tell) can't justify its decisions. So, beyond the simple boilerplate generation, will this destroy the art of programming? I can imagine many juniors using this as a crutch, and potentially never understanding the 'why'.
I'm not against productivity tools per se; it's certainly a neat trick, and a very impressive feat of engineering in its own right. I am however dubious that this really adds value to professional code-bases, and actively may decrease code quality over time. Then there's the grey area of licensing, which I feel has been totally brushed to one side.
We started building some services in F#, but still had a massive amount of C# - and so I wanted the inertia of my team to be in the direction of writing declarative code. There wasn't really anything (outside of LINQ) that did that, so I set about creating something.
We don't write F# any more and find functional C# (along with the brilliant C# tooling) to be very effective for us (although we also now use PureScript and Haskell).
I do have a stock wiki post on the repo for this though [1]. You might not be surprised to hear it isn't the first time I've been asked this :)
[1] https://github.com/louthy/language-ext/wiki/%22Why-don't-you-just-use-F%23%3F%22
That post in the wiki sums it up perfectly, much appreciated!
Copilot sounded terrible in the press release. The idea that a computer is going to pick the right code for you (from comments, no less) is really just completely nuts. The belief that it could be better than human-picked code is really way off.
You bring up a really important point. When you use a tool like Copilot (or copypasta of any kind), you are introducing the additional burden of understanding that other person's code -- which is worse than trying to understand your own code or write something correct from scratch.
I think you've hit the nail on the head. Stuff like Copilot makes programming worse and more difficult, not better and easier.
Copilot makes programming worse and more difficult if you're aiming for a specific set of coding values and style that Copilot doesn't generate (yet?). If Copilot generates the sort of code that you would write, and it does for a lot of people, then it's definitely no worse (or better) than copying something from SO.
The author of a declarative, functional C# framework likely has very different ideas about what code should be than some PHP developer just trying to do their day-to-day job. We shouldn't abandon tools like Copilot just because they don't work out at the more rigorous ends of the development spectrum.
In some cases the benefits of doing so outweigh the costs (such as using a stack overflow answer that's stood the test of time for something you don't know how to do), but with Copilot you don't even get the benefit of upvotes, human intent, or crowdsourced peer review.
Disagree.
Most SO copy-paste must be integrated into your project -- maybe it expects different inputs, maybe it expects or works with different variables -- whatever, it must be partially modified to work with the existing code-base that you're working with.
Copilot does the integration tasks for you. When one might have had to read through the code from SO to understand it enough to integrate it, the person using Copilot need not even invest that much understanding.
Because of these workflow differences, it seems to me as if Copilot enables an even more low-quality workflow than offered by copy-pasting from SO and patching together multiple code-styles and paradigms while hoping for the best; Copilot does that without even the wisdom that an SO user might have that 'this is a bad idea.'
1 - I don't expect "not using a tool that generates bad code" to be the top option.
Copilot does not understand the code in toto and is therefore really useless for debugging (70% of all coding) and probably useless for anything other than very simple parts of an app.
I don't think that's important. Copilot, at least as it's been demo'd so far judging by the examples, is to help you write small, standalone functions. It shouldn't need to know about the rest of the application. Just as the functions that you write yourself shouldn't need to know about the rest of the application either.
If your functions need a broad understanding of the codebase as a whole how the heck do you write tests that don't fail the instant anything changes?
Since that's where the work of programming is, debugging connected applications (not writing fresh, unencumbered code, a rare luxury), a tool that offers no help for that is, well, not much help.
For example, I wrote a comment along the lines of "Find the middle point of two 2D positions stored in x, y vectors" and it came up with two totally different approaches in Ruby - one of which I wouldn't have considered. I did some similar things with SQL, and some people might find huge value in it suggesting regexes, too, because so many devs forget the syntax and a reminder may be all it takes to get out of a jam.
I'm getting old enough now to see where these sorts of prompts will be a game changer, especially when dabbling in languages I'm not very proficient in. For example, I barely know any Python, so I just created a simple list of numbers, wrote a "Sort the numbers into reverse order" comment, and it immediately gave me the right syntax that I'd otherwise have had to Google, taking much longer.
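For reference, the kind of completion being described, sketched here in Python (the midpoint prompt above was Ruby, but the idea is the same); this is illustrative, not Copilot's actual output:

    # Sort the numbers into reverse order
    numbers = [3, 1, 4, 1, 5, 9, 2, 6]
    print(sorted(numbers, reverse=True))   # [9, 6, 5, 4, 3, 2, 1, 1]

    # Find the middle point of two 2D positions stored in (x, y) tuples
    def midpoint(a, b):
        return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

    print(midpoint((0, 0), (4, 2)))        # (2.0, 1.0)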
Maybe to alleviate the concerns it could be sandboxed into a search engine or a separate app of its own rather than sitting constantly in my main editor - I would find that a fair compromise which would still provide value but require users to engage in more reflection as to what they're using (at least to a level that they would with using SO answers, say).
As a nudge, it's a great idea. As a substitute for vigilance, it's a terrible idea.
I suspect that's why they named it Copilot instead of Autopilot, but it's unfortunately more likely to be used as the latter, humans being humans.
But that does not seem to be its advertised or configured purpose, sitting in your main editor.
I totally agree with you that prompted help is a big deal and just going to get bigger. We have developed a language for fact checking called MSL that works exactly this way in practice -- suggesting multiple options rather than just inserting things.
One of the things that interests me about this thread is the whole topic of UI vs. AI, and how much help really comes from giving the user options (and a good UI to discover them) vs how much is "AI" or really intelligence. I think the intelligence has to belong to the user, but a computer can certainly sift through a bunch of code to find a search-engine result, and those results could be better than what you get now from Google & Co.
EDIT: they appear to be interested in making it look for similar code, see here: https://docs.github.com/en/github/copilot/research-recitation
It's really weird for software engineers to judge something by its current state and not by its potential state.
To me, it's clearly solvable by Copilot filtering the input code by that repository's license. It should only be certain open source licenses, maybe even user-selectable, or code creators could optionally sublicense their code to Copilot in a very permissive way.
Secondly, a way for the crowd to code review suggestions would be a start.
Of course this would be massive, so from a practical consideration the attribution file that Copilot generates in the local repository would have to just link to the full file, but I don't think that would be an issue in and of itself.
Almost certainly a link would not suffice, basically every license requires that the attribution be directly included with the modified material. Links can rot, can be inaccessible if you don't have internet access, can change out from underneath you, etc.
(I am not a lawyer, btw)
Writing to overall goals and debugging actual behavior are the real work of programmers. Coming up with syntax or algorithms are 3rd and 4th on the priority list because, lets face it, it's not that hard to find a reference for correct syntax or the overall recipe implied by an algorithm. Once you understand those, you can write the correct code for your project.
I do think Copilot has potential as a search engine and reference tool -- if it can be presented that way. But the idea of a computer actually coming up with the right code in the full context of the program seems like fantasy.
Don't tell me what to do, tell me what not to do. "this line doesn't look like something that belongs in a code base", "this looks like a line of code that will be changed before the PR is merged". Etc.
No, we're not afraid of Copilot replacing us. The thought is ridiculous, anyway. If it actually worked, we would be enabled to work in higher abstractions. We'd end up in even higher demand because the output of a single engineer would be so great that even small businesses would be able to afford us.
Yes, we are afraid of Copilot making the entire industry worse, the same way that "low-code" and "no-code" solutions have enabled generations of novices to produce volumes of garbage that we eventually have to clean up.
I’m saying copilot can be better with very simple tweaks
Absolutely not, not at all. I'm suggesting that copying and pasting happens, particularly in the context of a single project.
> At least for your own code, how did you end up with two copies of duplicated logic rather than a shared library of functionality?
At what point is it worth introducing an abstraction rather than copying? Using my libcurl example, you can create an abstraction over the ~10 lines of initialization, but if you need to change it to a POST, then you're just implementing an abstraction over libcurl, which is just silly.
As a thought experiment, I thought "what would happen if we trained it on our 15 million lines of product code + my language-ext project". It would almost certainly produce something that looks like 'us'.
But:
* It would also trip over a million or so lines of generated code
* And the legacy OO code
* It will 'see' some of the extreme optimisations I've had to build into language-ext to make it performant. Something like the internals of the CHAMP hash-map data-structure [1]. That code is hideously ugly, but it's done for a good reason. I wouldn't want to see optimised code parroted out upfront. Maybe it wouldn't pick up on it, because it hasn't got a consistent shape like the majority of the code? Who knows.
Still, I'd be more willing to allow my team to use it if I could train it myself.
[1] https://github.com/louthy/language-ext/blob/main/LanguageExt.Core/DataTypes/TrieMap/TrieMap.cs#L1032
OO vs FP aside, a concern I'd have is that it would encourage and enforce idiosyncrasies in large corporate codebases.
If you've ever worked for a large corporation on their legacy code, you know you don't want any of that to be suggested to colleagues.
This would enforce bad behaviors and make it even harder for fresh developers to argue against it.
I think this is a significant point. It maintains the status quo. We change our guidance to devs every other year or so. New language features become available, old ones die, etc. But we're not rewriting the entire code-base every time, we know if we hit old code, we refactor with the new guidance; but we don't do it for the sake of it, so there's plenty of code that I wouldn't want in a training set (even if I wrote it myself!)
It would be interesting to see how much code you would need before it was useful (and how good does it have to be to be useful? Does even a small error rate cost so much that it erases other gains, because so many of the potential errors in using this type of tool are very subtle?)
6/ Copilot learns from the past. It can only favor popularity and familiarity in code patterns over correctness and innovation.
Just the other day someone in the Copilot threads was arguing that this kind of boilerplate optimizes for readability... It's like Java Stockholm syndrome and the old myth that easy to approach = easy to read (look how long it took them to introduce var).
I've always viewed code generators as a symptom of language limitations (which is why they were so popular in Java land) that lead to unmaintainable code, this seems like a fancier version of that - with all the same drawbacks.
I think Copilot is so unfortunate because it's not building abstractions and expecting you to override parts of them. It's acting as an army of monkeys banging out Shakespeare on a typewriter. And the code it generates is going to require an army to maintain.
For example I think F# idea of type providers > code generators.
I understand why some corporations prefer dumb boilerplate everywhere for some applications. If there is an outage it's usually easy to fix quickly. Sometimes it's not, if it's an issue in the boilerplate (say, Feb 29 rolls around and all of the boilerplate assumed a 28-day month); that means a huge update all across the system, but that rarely happens in practice.
I do agree on the debugging aspect - especially in dynamic languages - metaprogramming stack traces can be really hard to follow.
But I wonder...
Is this a difference of programmer culture?
I think there are people who write successful computer programs for successful businesses without delving into the details. Without considering all the things that might go wrong. Without mapping the code they're writing to concepts.
Lots of people.
What would they do with this?
Not get a job working for me ;)
More seriously, when I think back to when I was first learning programming - in the heady days of 1985 - I would often copy listings out of computing magazines, make a mistake whilst doing it, and then have no idea what was wrong. The only way was to check character by character. I didn't have the deeper understanding yet, and so I couldn't contribute to solving the problem in any real way.
If they're at that level as a programmer, to the point where their code is being written for them and they don't really understand it, then they're going to make some serious mistakes eventually.
If you want to step up as a dev, understanding is key. Programming is hard and gets harder as you step up and bite off bigger and more complex problems. If you're relying on the tools to write your code, then your job is one step away from being automated. That should be enough to light a fire under your ambition!
Yes, better layers of abstraction could make us more productive in the future, but we're not there yet. By all means, don't accept the larger blurbs it proposes, but there is productivity to be gained in the smaller suggestions. If it correctly intuits the exact rest of the line that you were thinking of, it will save time and not make you lose understanding of the program.
In some areas complete understanding and complete code ownership is required, but in a lot of places it's not. If it produces the work of a moderately skilled developer, it would be sufficient. I don't remember all the code I write as time passes. If it produces work that I would have produced, then I don't see how that's any different from work that was produced by my past self.
It may feel offensive but a lot of the comments against it sound like rage against the machine/industrialization opponents and the arguments sound pretty similar to those made in the past by those that had their jobs automated away. I'm not sure we're all as unique snowflakes as we like to think we are. Sure, there will be some code that requires an absolute master that is outside the capabilities of this tool. But I'd guess there is a massive amount of code that doesn't need that mastery.
[1] https://docs.github.com/en/github/copilot/research-recitation
For small snippets that have likely been already written by someone else, this probably works great. For those, though, the time savings are probably at most going from 5-10 minutes down to 1 or less. The challenge is that that's not where my time goes unless I'm working in an unfamiliar language.
As someone who writes a lot of code quickly, I’m usually bottlenecked by reviews. For more complex changes I’m bottlenecked by understanding the problem and experimenting with solutions (and then reviews, domain-specific tests usually, fixing bugs etc). Writing code isn’t like waiting for code to compile since I’m not actually ending up task switching that frequently.
This does sound like a fantastic tool when I’m not familiar with the language although I wonder if it actually generates useful stuff that integrates well as the code gets larger (eg can I say “write an async function that allocates a record handle in the file with ownership that doesn’t outlive the open file’s lifetime”). I’m sure though that this is what a lot of people are overindexing on. For things like that I expect normal evolution of the product will work well. For things like “cool, understand your snippets but also weight my own codebase higher and understand the specifics of my codebase”, I think there’s a lot of groundbreaking research that would be required. That is what I see as a true productivity boost - I’d make this 100% required for anyone joining the codebase. The more mentorship can be offloaded, the lower the cost is to growing teams. OSS projects can more easily scale similarly.
I'd be more interested in a tool that notices patterns and boilerplate. It could offer a chance for generalization, abstraction or use of a common pattern from the codebase. This is of course much harder.
Adding abstraction buries complexity. If all you do is keep adding more abstractions, you end up with an overcomplicated, inefficient mess. Which is part of why application sizes are so bloated today. People just keep adding layers, as long as they have room for more of them. Everything gets less efficient and definitely not better.
The right way to design better is to iterate on a core design until it cannot be any simpler. All of the essential complexity of software systems today comes from 40 year old conventions. We need a redesign, not more layers.
One example is version management. Most applications today could implement versioned functions, keep multiple versions in the application, and track dependencies between external applications. Make a simple DAG of the versions and let apps call the versions they were designed against, or express internally which versions are compatible with which. This would make applications infinitely backwards-compatible.
The functionality exists right now in GNU Libc. You can literally do it today. But rather than do that, we stumble around replacing entire environments of specific versions of applications and dependencies, because we can't seem to move the entire industry forward to new ideas. Redesign is hard, adding layers is easy.
Presumably you're writing code in binary then? This is a non-argument, because there's evidence that it's worked. Computers were first programmed with switches and punch cards, then tape, then assembly, then low level languages like C, then memory managed languages etc.
Abstraction works when side-effects are controlled. Composition is what we're after, but we must compose the bigger bits from smaller bits that don't have surprises in. This works well in functional programming, a good example would be monadic composition: monads remove the boilerplate of dealing with asynchrony, value availability, list iteration, state management, environment management, etc. Languages that have first-class support for these tend to have significantly less boilerplate.
The efficiency argument is also off too. Most software engineering teams would trade some efficiency for more reliable and bug free code. At some point (and I would argue we're way past it) programs become too complex for the human brain to comprehend, and that's where bugs come from. That's why we're overdue an abstraction lift.
Tools like Copilot almost tacitly agree, because they're trying to provide a way of turning the abstract into the real, but then all you see is the real, not the abstract. Continuing the assault on our weak and feeble grey matter.
I spent the early part of my career obsessing over performance on crippled architectures (Playstation 3D engine programmer). If I continued to write applications now like I did then, nothing would go out the door and my company wouldn't exist.
Of course there are times when performance matters. But the vast majority of code needs to be correct first, not the most optimal it can be for the architecture.
Like I say, our designs are ancient and lack features; we need to add more stuff to the code. But that will enable us to remove abstractions that were added only because our previous designs were crap.
No. GitHub Copilot tries to understand your intent and to generate the best code it can, but the code it suggests may not always work, or even make sense. While we are working hard to make GitHub Copilot better, code suggested by GitHub Copilot should be carefully tested, reviewed, and vetted, like any other code. As the developer, you are always in charge.
EDIT: the text above is a direct quote from the Copilot website
Naively, as someone who just heard of this - that sounds worse than useless. If you can't trust its output and have to verify every line it produces and that the combination of those lines does what you wanted, surely it's quicker just to write the code yourself?
Basically, in order to fall victim for this "code theft" (or any other "footguns" from Twitter threads) you'd need to be actively working against all the best practices and common sense. If you actually use it as a productivity tool (the way it is marketed) you'll remain in full control of your code.
By typing “fast inverse square root”? That is hardly an outlier or manipulating the machine
Problem is you don't know whose code you're stealing, which leads to all sorts of legal, security, and correctness issues.
I thought there was something more going on with copilot, but the fact that it is regurgitating arbitrary code comments tells me that there is zero semantic analysis going on with the actual code being pulled in.
Comments, I suspect, will be more likely to be memorized since the model would be trained to make syntactically correct outputs, and a comment will always be syntactically correct. That would mean there is nothing to 'punish' bad comments.
No, overfitting is the normal state for neural nets.
Understanding would mean to have an internal representation related to the intention of the user, the expected behavior, and say the AST of the code. My pessimistic interpretation of this and many other recent AI applications is that it is a "better markov chain".
I mean, if you wrote an autocomplete system for written english and asked it to complete the sentence "O Romeo, Romeo" what would you expect to happen?
You'd expect it to complete to "O Romeo, Romeo, wherefore art thou Romeo?" - a very famous quote.
How else could you produce the single right output for that unique input, other than memorising and regurgitating?
What about completing it to "O Romeo, Romeo, brave Mercutio is dead", based on the context, as advertised?
Why was this direction chosen? Is the inclusion of GPL really worth the risk and potential Google v. Oracle lawsuit? I’d like to know the reasoning.
They could try to trace every single code snippet they train on to its "true source" and use the license for that, but that's not very well-defined, and is a lot harder, and it's never going to be 100%.
> Once, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training -- that was the GNU General Public License.
To use the old trope — if the majority of programmers can't implement Fizzbuzz, but they do have a Github profile, are they being included too?
Hopefully there's some quality bar for the training set, i.e. some subset of "good" code (e.g. release candidate tags from fairly established OSS tools/frameworks in different languages) rather than any old code on the internet.
That's more Trainee than Copilot.
When a person gets code from another source on the internet, they generally know where the code has come from.
What you get paid is to write your own code. When you write your own code, generally you think first and then type. Well, with Copilot you think first and then start typing a few symbols before seeing automatic suggestions. If they are right, you accept changes and if they happen to be similar to any other code out there, you deal with it exactly the same as if you typed those lines yourself.
If you happen to type code that is similar to copyright code, that is generally considered legally OK.
If you copypaste copyrighted code, that is not legally OK.
If you accept that same code from an autocomplete tool, that can easily be seen as equivalent to the latter case rather than the former.
Copilot isn't copying, it's regurgitating patterns from its training dataset. The result may be subject to a license you don't know about, but modified enough that you won't find the original source. The result can be a blend of multiple snippets with varying licenses. And there's no way to extract attribution from Copilot - DNN models can give you an output for your input, they can't tell you which exact parts of the training dataset were used to generate that output.
Btw the GPLv2 death penalty is rather unique, and I don't think anyone will deny that including GPL code in proprietary code is a hell of a lot worse in every way (liability, ethics, etc) than including permissively licensed code and forgetting to attribute it
Excluding GPL does not solve the problem.
Put up repos with snippets for things people might commonly write. Preferably use javascript so you can easily "prove" it. Write a crawler that crawls and parses JS files to search for matching stuff in the AST. Now go full patent troll, eh, i mean copyright troll.
2) AGPL all that code.
3) Search for large chunks of code very similar to yours, but written after yours, licensed more liberally than AGPL. Ideally in libraries used by major companies.
4) Point the offenders to your repos and offer a "convenient" paid dual-license to make the offenders' code legal for closed-source use, so they don't have to open source their entire product.
5) Profit?
2- …
3- profit
Hard to say how straightforward it'd be to get it to produce consistently vulnerable suggestions that make it into production code, but I imagine an attacker with some resources could fork a ton of popular projects and introduce subtle bugs. The sentiment analysis example on the Copilot landing page jumped out to me...it suggested a web API and wrote the code to send your text there. Step one towards exfiltrating secrets!
Never mind the potential for plain old spam: won't it be fun when growth hackers have figured out how to game the system and Copilot is constantly suggesting using their crappy, expensive APIs for simple things!? Given the state of Google results these days, this feels like an inevitability.
//Implement elliptic cryptography below
//Sanitize input for SQL call below
Etc.
I certainly wouldn't want to be using this with languages like PHP (or even C for that matter) with all the decades of problematic code examples out there for the AI to learn from.
[0]: https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overview_of_the_code
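To make the "Sanitize input for SQL call" prompt above concrete: what you'd want suggested is a parameterized query, not string interpolation. A hypothetical sketch in Python (sqlite3 used purely for illustration, not anything Copilot actually produced):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

    def find_user_unsafe(name):
        # The kind of suggestion you don't want: string interpolation invites SQL injection.
        return conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()

    def find_user_safe(name):
        # Parameterized query: the driver handles escaping, no ad-hoc "sanitizing" needed.
        return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()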
This is not wrong, but it's easy to misread it as implying little more than a glorified Markov model. If it's like https://www.gwern.net/GPT-3 then it's already significantly cleverer, and so you should expect to sometimes get the kind of less-blatant derivation that companies aim to avoid using a cleanroom process or otherwise forbidding engineers from reading particular sources.
As far as licenses go, idk. Presumably it could delete associated comments and change variable names or otherwise obscure where it's taking code from. Maybe this part is shady.
Let's say that it's doing exactly what it was trained to do.
As far as licenses go, idk. Presumably it could delete the number plate and repaint the car or otherwise obscure where it's taking the car from. Maybe this part is shady.
Maybe.
This is an interesting issue. I suspect training on datasets from places like GitHub would be likely to provide lots of "this is a neat idea I saw in a blog post about how they did things in the 90s" code.
So I don't know if on this alone it proves Copilot regurgitates too much. I think other signs are more troubling, however, such as its tendency to continue from a prompt vs generate novelty.
Just keep in mind that it's a statistical tool. You can't really formally prove that it won't memorise, but I think with enough work you can make it unlikely enough that it won't matter. It's their first iteration.
Dumb.
This claim that "AI" only means artificial general / human-equivalent intelligence completely ignores the long history of how that term has been used, by computer science researchers, for the last 70-odd years, to include everything from Shannon's maze-solving algorithms, to Prolog-y systems, to simple reinforcement learning, and so on.
It's true that there has been linguistic drift in the direction of the definition getting narrower (to the point where it's a joke that some people use 'AI' to mean whatever computers can't do _yet_). And you can have reasons to prefer your own very-narrow definition. But claiming that your own definition is the only valid one to the point that anyone using a wider definition (one that has a long etymological history, and which remains in widespread use) are "dumb" is... not how language works.
The clever OpenAI marketing hype squad on HN and Twitter know that they are re-selling a snake oil contraption. This 'thing' completely needs assistance from a human, since it is producing insecure code, code that is also copyrighted, and most of the time garbage from other sources, which is again totally dangerous.
Just look at this [0]. Make a simple 'typo' in the signature and the whole implementation is wrong.
I have to say that OpenAI, GitHub and Microsoft are very clever in selling this scam to engineers who use the code produced by this contraption as 'safe to use' in their projects; especially since GPT-3 still cannot explain why it is generating the code it's generating, or whether the code is under a non-commercial or otherwise restrictive licence.
No thanks and most certainly no deal.
One has the issue with form encoding: https://news.ycombinator.com/item?id=27697884
The python example is using floats for currency, in an expense tracking context.
The golang one uses a word ("value") for a field name that's been a reserved word since SQL-1999. It will work in popular open source SQL databases, but I believe it would bomb in some servers if not delimited...which it is not.
The ruby one isn't outright terrible, but shows a very Americanized way to do street addresses that would probably become a problem later.
And these are the hand picked examples. This product seems like it needs some more thought. Maybe a way to comment, flag, or otherwise call out bad output?
Everyone's self-preservation instincts kicking in to attack Copilot is kinda amusing to watch.
Copilot is not supposed to produce excellent code. It's not even supposed to produce final code, period. It produces suggestions to speed you up, and it's on you to weed out stupid shit, which is INEVITABLE.
As a side note, Excel also uses floats for currency, so best practice and real world have a huge gap in-between as usual.
Copilot is definitely no replacement for anything except copying from Stack Overflow for juniors.
But in the long run, AI is basically us creating our own replacement. As a species. We don't realize it yet. It'll be really funny in retrospect. Too bad I probably won't be alive to see it.
Because if you don't realize this, you might be introducing GPL'ed code into your proprietary code base, and that might end up forcing you to distribute all of the other code in that code base as GPL'ed code as well.
Like, I get that Copilot is really cool, and that software engineers like to use the latest and bestest, but even if the code produced by Copilot is "functionally" correct, it might still be a catastrophic error to use it in your code base due to licenses.
This issue looks solvable. Train 2 copilots, one using only BSD-like licensed software, and one using also GPL'ed code, and let users choose, and/or warn when the snippet has been "heavily inspired" by GPL'ed code.
Or maybe just train an adversarial neural network to detect GPL'ed code, and use it to warn on snippets, or...
Fixed that for you.
Verbatim isn't the problem / solution. If you take a GPL'ed library and rename all symbols and variables, the output is still a GPL'ed library.
Just seeing the output of GPL'ed code spat out by Copilot and writing different code "inspired" by it can result in GPL'ed code. That's why "clean rooms" exist.
Copilot is going to make for a very interesting law case to follow, because probably until somebody sues, and courts decide, nobody will have a definitive answer on whether it is safe to use or not.
* Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
* ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
In over a decade of software engineering, I've seen many reuses of Stack Overflow content, occasionally with links to underlying answers. All Stack Overflow content use I've seen would clearly fail the legal terms set out by the license.
I suspect Copilot usage will similarly fail a stringent interpretation of underlying licenses, and will similarly face essentially no enforcement.
The license lets you modify the program, but the copyright still means you can't copy/paste code from it into your own project, no?
Ruby on Rails was advertised as so simple, startup founders who can't program were making their entire products in it in a few days, with zero experience. As if.
It's easier to write correct code than to fix buggy code. For the former you have to understand the problem, for the latter you have to understand the problem, and a slightly off interpretation of it.
It's still problematic, but the defaults and handling there avoid some issues. So, for example:
Excel: =1.03-.42 produces 0.61, by default, even if you expand out the digits very far.
Python: 1.03-.42 produces 0.6100000000000001, by default.
In practice a double is about 15.95 decimal digits precise, which Excel rounds to 15 to eliminate some weirdness.
In their documentation they do cite their number type as 15 digit precision type. Ergo that's the semantic they've settled on.
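To make the comparison concrete, a quick sketch in Python (the Excel behaviour is only roughly approximated here by rounding to 15 digits):

    from decimal import Decimal

    print(1.03 - 0.42)                        # 0.6100000000000001 (binary float artifact)
    print(round(1.03 - 0.42, 15))             # 0.61 -- roughly what Excel's 15-digit display does
    print(Decimal("1.03") - Decimal("0.42"))  # 0.61 exactly, no display trick needed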
My suggestion was a way to comment or flag, not to kill the product. These were particularly notable to me because someone hand-picked these 4 to be the front page examples of what a good product it was.
Nobody is threatened by this, assuredly. As with IDEs giving us autocomplete, duplication detection, etc this can only be helpful. There is an infinite amount of code to write for the foreseeable future, so it would be great if copilot had more utility.
Dumb question, but what is the proper way to handle currency? Custom number objects? Strings for any number of decimal places?
Or if your language has something specific built in, use that.
Unless your language is PostgreSQL's dialect of SQL, apparently. https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_money
Note that if you use cents in the US so that everything is an integer then as long as you do not have to deal with amounts outside the range of roughly [-$90 trillion, $90 trillion] (2^53 cents) you can also use double. Double can exactly represent all integer numbers of cents in that range.
This may be faster than int64 on some systems, especially on systems that do not provide int64 either in hardware or in the language runtime so you'd have to do it yourself.
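A quick way to see the limit (shown in Python, but the double-precision behaviour is the same in any language):

    # Doubles represent every integer with magnitude up to 2**53 exactly;
    # above that, gaps appear (here: 2**53 + 1 is not representable).
    limit = 2 ** 53                            # 9_007_199_254_740_992 cents ~= $90 trillion
    assert float(limit) == limit
    assert float(limit + 1) == float(limit)    # +1 cent silently lost past the limit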
Also I'm hard pressed to come up with a case where floats would work. Can you give an example?
And the downstream parts for trade confirmation ("Middle Office"), settlement and accounting ("Back Office") used fixed precision. Because they are fundamentally accounting, which involves adding things up and cross-checking totals.
These two parts have a very clear boundary, with strictly defined rounding rules when the floating point risk/trading values get turned into fixed point accounting values.
The answer is the same as _any_ time you should use floats: where you don't care about answers being exact, either (1) because calculation speed is more important than exactness, or (2) because your inputs or computations involve uncertainty anyway, so it doesn't matter.
This is more likely to be the case in, say, physics than it is in finance, but it's not impossible in the latter. For example, if you are a hedge fund and some model computes "the true price of this financial instrument is 214.55", you certainly want to buy if it's being sold for 200, and certainly don't if it's being sold for 250, but if it's being sold for 214.54, the correct interpretation is that _you aren't sure_.
When people say "you should never use floats for currency", their error is in thinking that the only applications for currency are in accounting, billing, and so on. In those applications, one should indeed use a decimal type, because we do care about the rounding behavior exactly matching human customs.
The arbitrary precision decimal type should be the default answer for currency until it is shown that the requirements do not now, and will not at any time in the future, require fractional units of the smallest denomination.
As an aside, this may be constrained by the systems that the data is persisted into too... the Buffett Overflow is a real thing ( https://news.ycombinator.com/item?id=27044044 ).
[Modern] programming languages have decimal/rational data types, which (within limits) are exact. Where this is not possible, and/or it's undesirable for any reason, just use an int and scale it manually (e.g. 1.05 dollars = int 105).
However, point 2 is very problematic and important to consider. How do you account for 3 items that cost 1/3 of a dollar each (e.g. if in a bundle)? What if they're sold separately? This really depends on the requirements.
My 20 cents: if you start a project, start storing currency in an exact form. Once a project grows, correcting the FP error problem is a big PITA (assuming it's realistically possible).
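For the bundle question, a sketch of one possible policy (exact internal values, round only at the point of sale/display); Fraction and Decimal are both in the Python standard library:

    from fractions import Fraction
    from decimal import Decimal, ROUND_HALF_UP

    unit_price = Fraction(1, 3)          # exactly one third of a dollar
    bundle_total = 3 * unit_price        # Fraction(1, 1): the bundle is exactly $1

    # Sold separately, each item has to be rounded to a real price at some point:
    single = Decimal(unit_price.numerator) / Decimal(unit_price.denominator)
    single_price = single.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

    print(bundle_total, single_price)    # 1 0.33 -- note 3 * 0.33 != 1.00; that's the policy call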
My last job they wanted me to invoice them hours worked, which was some number like 7.6.
This number plays badly when you run it through GST and other things - you get repeaters.
So I looked up common practice, even tried asking finance (who just said "be exact"), and eventually settled on rounding sub-cent fractions up to the nearest cent in my favour for each line item.
First invoice I hand them, they manually tally up all the line items and hours, and complain it's over by 55 cents.
So I change it to give rounded line items but straight multiplied to the total - and they complain it doesn't match.
Finally I just print decimal exact numbers (which are occasionally huge) and they stop complaining - because excel is now happy the sums match when they keep second guessing my invoices.
All of this of course was irrelevant - I still had to put hours into their payroll system as well (which they checked against) and my contract specifically stated what my day rate was to be in lieu of notice.
So how should you do currency? Probably in whatever form that matches how finance are using excel, which does it wrong.
BECAUSE EXCEL SUCKS MY DUDE.
What's notable is clearly no one had actually thought this through at a policy level - the answer was "excel goes brrrr" depending on how they want to add up and subtotal things.
I guarantee nothing in anyone’s time accounting system is measured to double-precision accuracy. Or at least, I’ve never quite figured out the knack myself for stopping work within a particular 6 picosecond window.
This caveat is kind of funny, in light of COBOL having support for decimal / fixed precision data types baked directly into the language.
It's not a problem with "non-modern" languages, it's a problem with C and many of its successors. That's precisely why many "non-modern" languages have stuck around so long.
https://medium.com/the-technical-archaeologist/is-cobol-holding-you-hostage-with-math-5498c0eb428b
Additionally, mainframes are so strongly optimized for hardware-accelerated fixed point decimal computing that for a lot of financial calculations it can be legitimately difficult to match their performance with standard commercial hardware.
Not really. Any semi-decent modern language allows the creation of custom types which support the desired behavior and often some syntactic sugar (like operator overloading) to make their usage more natural. Take C++, for example, the archetypal "C successor": It's almost trivial to define a class which stores a fixed-precision number and overload the +, -, *, etc. operators to make it as convenient as a built-in type, and put it in library. In my book, this is vastly superior to making such a type a built-in, because you can never satisfy everyone's requirements.
Okay, no idea how that's relevant to "built-in decimal types" vs "library-defined decimal types", but if it makes you feel better, you can do the same in Rust or Python, two languages which are "modern" compared to COBOL, don't inherit C's flaws, and which enable defining custom number types/classes/whatever together with convenient operator overloading.
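For illustration, the same idea in Python (a minimal, hypothetical Money sketch, not any particular library; a real one would also handle allocation, conversion and formatting):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Money:
        """Fixed-point money: an integer count of cents plus a currency code."""
        cents: int
        currency: str = "USD"

        def _check(self, other):
            if not isinstance(other, Money):
                raise TypeError("expected Money")
            if other.currency != self.currency:
                raise ValueError("currency mismatch")

        def __add__(self, other):
            self._check(other)
            return Money(self.cents + other.cents, self.currency)

        def __sub__(self, other):
            self._check(other)
            return Money(self.cents - other.cents, self.currency)

        def __mul__(self, qty: int):
            return Money(self.cents * qty, self.currency)

        def __str__(self):
            return f"{self.cents / 100:.2f} {self.currency}"   # display only; arithmetic stays integral

    print(Money(103) - Money(42))   # 0.61 USD, no binary-float surprises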
Again, how is that relevant? If there's no way to enforce an invariant in custom data types, then there's also no way to enforce invariants in code using built-in data types.
Rust provides the mechanisms to enforce them, while in Python, like all dynamic languages, everything is up for grabs.
[1] (but was too lazy to write out)
You never account for fractional discrete items; it makes no sense. A bundle is one product, and a split bundle is another. For products sold by weight or volume, it's usually handled with a unit price and a fractional quantity. That way the continuous values can be rounded, but money that is accounted for need not be.
I mean, fixed point and a specific type for currency (which also should include the denomination, while we are at it) are not rocket science. Spreadsheets get that right, at least.
Rounding error doesn't matter on these types of financial applications. It's the less glamorous accounting work that has to bother with that.
They're not rocket science, but they're unnecessary, and would still be off anyway. Try and calculate compound interest with your fixed point numbers.
The models also use transcendental functions which cannot be accurately calculated with fixed point, rationals, integers etc.
In accounting there are specific rules that require decimal system, so one must be very careful with the floating point if it is used.
I’d just add that if you are building a price prediction model, floats are probably what you need.
Disclaimer: I only work with currency in hobby projects.
My most obnoxious and spicy programming take is that ints and decimals should be built-in and floats should require imports. I understand why though: decimal encoding isn't anywhere near as standardized as other numeric types like integers or floating-point numbers.
[1] https://docs.python.org/3/library/decimal.html [2] https://docs.python.org/3/library/json.html
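On the Python side specifically, the decimal module composes with json if you opt in; a small sketch (hypothetical payload):

    import json
    from decimal import Decimal

    payload = '{"amount": 19.99, "currency": "USD"}'

    # Default: a binary float creeps in silently.
    print(json.loads(payload)["amount"])                       # 19.99 as a float

    # Opt in to exact decimals at the parsing boundary.
    exact = json.loads(payload, parse_float=Decimal)
    print(exact["amount"], type(exact["amount"]).__name__)     # 19.99 Decimal

    # Serializing a Decimal needs help, since json.dumps rejects it by default.
    print(json.dumps(exact, default=str))                      # {"amount": "19.99", ...}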
I don't care about making inexact numbers require imports, but the most natural literal formats should produce exact integers, decimals, and/or rationals.
This is the better default, so I'd ditch the qualifier, personally. At the very least when it comes to the persistent storage of monetary amounts. People often start out thinking that they won't need arbitrary precision until that one little requirement trickles into the backlog...
Arbitrary precision rationals handle all the arithmetic you could reasonably want to do with monetary amounts, and they let you decide where to round at display time (or when generating a final invoice or whatever), so there's no information loss.
https://money.howstuffworks.com/personal-finance/financial-planning/stock-market-use-fractions.htm
Googler, opinions are my own. Over in payments, we use micros regularly, as documented here: https://developers.google.com/standard-payments/reference/glossary#micros
GCP on the other hand has standardized on unit + nano. They use this for money and time. So a unit would be 1 second or 1 dollar, and the nano field allows more precision. You can see an example here with the unitPrice field: https://cloud.google.com/billing/v1/how-tos/catalog-api#getting_the_list_of_skus_for_a_service
Copy/paste the GCP doc portion that is relevant here:
> [UNITS] is the whole units of the amount. For example if currencyCode is "USD", then 1 unit is one US dollar.
> [NANOS] is the number of nano (10^-9) units of the amount. The value must be between -999,999,999 and +999,999,999 inclusive. If units is positive, nanos must be positive or zero. If units is zero, nanos can be positive, zero, or negative. If units is negative, nanos must be negative or zero. For example $-1.75 is represented as units=-1 and nanos=-750,000,000.
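A sketch of that convention in Python (my own toy conversion, not Google's code):

    from decimal import Decimal

    def to_units_nanos(amount: Decimal):
        """Split a decimal amount into (units, nanos) per the convention quoted above."""
        sign = -1 if amount < 0 else 1
        whole, frac = divmod(abs(amount), 1)
        return sign * int(whole), sign * int((frac * 1_000_000_000).to_integral_value())

    print(to_units_nanos(Decimal("-1.75")))   # (-1, -750000000), matching the doc's example
    print(to_units_nanos(Decimal("19.99")))   # (19, 990000000)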
The usual is to use decimal numbers with fixed precision (the actual precision varies from one country to another), and I don't know of any modern exception. But as late as the 90's there were non-decimal monetary systems around the world, so if you are saving any historic data, you may need something more complex.
In python, for exact applications (not many kinds of modeling, where floats are probably right), decimal.Decimal is usually the right answer, but fractions.Fraction is sometimes more appropriate, and if you are using NumPy or tools dependent on it, using integers (representing decimals multiplied by the right power of 10 to get the minimum unit in the ones position) is probably better.
There's plenty of good advice in this subthread for how to represent currency inside your Money abstraction, but whatever you do, keep it hidden. If you pass around numbers as currency values you will be in for a world of pain as your application grows.
Wait for your colleagues to use it, fix the bad code in the pull request, and wait for copilot to learn from the new training data you just provided!
If snippets are a legal problem, then Copilot is problematic by default, since it suggests code that may or may not be sourced from free software.
Putting GPL code in proprietary codebase would cause a company massive headaches...
So I agree copilot is problematic by default, liability to lawsuits for employers and forced open sourcing, liability to IP lawsuits as well which will end up on employees shoulders.
But my argument was that it's good enough developers may get complacent and not review the auto complete closely enough. But maybe I'm wrong! Maybe it's not that good yet.
A copilot for copilot? :)
In their defense they created the table with this column before invoking the autocomplete, so they sort of reap what they sow here.
It could at least auto-quote the column names to remove the ambiguity, but it's not a compiler, is it.
We know you can't use StackOverflow upvotes. However, they should have enough signal to identify what snippets of code have been most frequently copy-pasted from one project to another.
Question is whether that serves as a good proxy for good code identification.
How would a model become aware of all of the various edge cases that depend on which SQL database you use or differences in language versions over time?
It can't be, because they've chosen to use a deep learning approach. That makes it a dead end right from the start.
> How would a model become aware of all of the various edge cases that depend on which SQL database you use or differences in language versions over time?
A lot of things that we call "edge cases" are only a problem for humans. They're not "edge cases" from the point of view of the grammar / semantics of programming languages and libraries. The way a hypothetical, better Copilot could work is by having directly encoded grammars and semantics metadata corresponding to popular languages and tools. It could generate code in a principled and introspectable way, by having a model of the computation it wants to express and encoding it in a target language.
Of course, such hypothetical Copilot is a harder task - someone would have to come up with a structure for explicitly representing understanding of the abstract computation the user wants to happen, and then translate user input into that structure. That's a lot of drudgery, and from my vague understanding of the "classical" AI space, there might be a bunch of unsolved problems on the way.
Real Copilot uses DNNs, because they let you ignore all that - you just keep shoving code at it, until the black-box model starts to give you mostly correct answers. The hard work is done automagically. It makes sense for some tasks, less for others - and I think code generation is one of those things where black-box DNNs are a bad idea.
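To make "principled and introspectable" concrete, a toy sketch (hypothetical, far from a real system): represent the intent as a structure and emit code from it via Python's ast module, reusing the thread's earlier "reverse sort" prompt, instead of predicting tokens.

    import ast

    # Toy structured "intent": sort the list named `numbers` in descending order.
    intent = {"op": "sort", "target": "numbers", "descending": True}

    # Encode the intent directly as an AST, then render it; every piece is inspectable.
    call = ast.Call(
        func=ast.Name(id="sorted", ctx=ast.Load()),
        args=[ast.Name(id=intent["target"], ctx=ast.Load())],
        keywords=[ast.keyword(arg="reverse", value=ast.Constant(intent["descending"]))],
    )
    print(ast.unparse(call))   # sorted(numbers, reverse=True)   (Python 3.9+)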
But that sounds like too much work, let's just throw a lot of data into an NN and see what comes out! /s
> and introspectable
Which most importantly means "debuggable", I assume. From what I get there doesn't seem to be any way to ad-hoc fix an NN's output.
The currency one I learned a while back, but it's not like I intuited using integers by default.
Value being a reserved keyword, I'm not sure I'd know that and I do Postgres work as part of my myriad duties at the startup I work at. Maybe I'd make that mistake in a migration, maybe I have already.
In a way, is it much different than what we do now as engineers? I'm hard pressed to call it much of an engineering discipline, considering most teams I work on barely do design reviews before they launch into writing code, documentation and meeting minutes are generally an afterthought, and the code review process, while decent, isn't perfect either and oftentimes relies on arcane knowledge derived over months and years of wrangling with a particular <framework, project, technology>.
It's pretty neat, presumably it'll learn as people correct it, and it'll get better over time. I mean it's not even version one.
I get the concerns, but I think they're a bit overblown, and this'll be really useful for people who want to learn how to code. Sure they'll run into some bugs, but, I mean, they were going to do that anyways.
This kind of tool will only further entrench the production of mediocre, bug-ridden code that plagues the world. As implemented, this will not be a solution; it is an express lane in the race to the bottom.
All the steps in between - looking at the docstring for the function you're calling, googling for more general information, looking at and deciding not to use not-applicable or poorly-written SO answers - get pushed aside. So instead of you having to convince yourself "yes, it's safe to copy-paste these lines from SO, they actually fit my problem" you're presented with magic and I think the burden for rejecting it is going to be higher once it's in your editor than when you're just reading it on a SO post or Github snippet.
Even for a newcomer looking to learn, working on simple stuff that it has great completions for, it seems like it will sabotage your long-term growth, since it takes all the why and the reasoning out of it. Autocomplete for a function name isn't that relevant to gaining a deeper understanding. Knowing why a certain block of code is written in a certain style, or needs to be written at all? Probably that is.
* some poor bastard is going to have to be the first person to figure out how to do something, so that copilot itself can know
* any non-code nuances around "oh, if you do that, your memory usage is going to explode" or "oh, by the way, if you do that, make sure you don't do your own threading" will still fail to be communicated.
On the upside, think of the consultancy fees you can charge to clean up those messes.
On the flip side, coding can be the bottleneck for the worst kind of coder. When I first started coding, coding was hard simply because I had very few reps and was just learning to understand how to code common solutions, data structures, libraries, etc. Fast forward a few years and, if I were still struggling to understand these concepts, Copilot would be a lifeline.
I admit that at many organizations there are so many other factors and bottlenecks, but it’s not uncommon that I find myself 8+ hours deep into a coding task that I had expected would be much shorter.
On the other hand, usually that’s due to refactoring or otherwise not being satisfied with the quality of my initial solution, so copilot probably wouldn’t help…
I want to see hard statistics, not 4 hand-picked examples.
You can't just release an ML tool onto the public if you haven't validated it first.
As someone who has been coding up address storage and validation for the past week in my current job, that one really made me laugh. Mostly because it tries to simplify all the stuff I have been analyzing and mulling over for a week into a single auto-complete.
Spoiler: Github Copilot's solution simply won't work. It would barely work for Americanized addresses, and even then it wouldn't be ideal. As for trying to internationalize it, this thing isn't even close.
I get what Copilot is trying to do. But at the same time I don't get it. Because from my experience, typing code is the fastest part of my job. I don't really have a problem typing. I spend most of my time thinking about the problem, how to solve it, and considering ramifications of my decisions before ever putting code in the IDE. So Copilot comes around and it autocompletes code for me. But I still have to read what it suggested, making edits to it, and consider if this is solving the problem appropriately. I'm still doing everything I used to do, except it saved me from typing out a block of code initially. I still have to most likely rebuild, edit, or change the function somewhat. So it just saves me from typing that first pass. Well that's the easy part of the job.
I have never had a manager come to me and ask why a project is taking so long where I could answer "it just takes so long to type out the code, i wish I had a copilot that could type it for me". That's why we call it software engineering and not coding. Coding is easy. Software engineering is hard. Github Copilot helps with coding, but doesn't help with Software Engineering.
A few years ago, I got a small but painful cut on my fingertip. I thought I would have a hard time on the job as a dev. To my surprise, I realized I spend 90-95% of my time thinking, and only 5-10% of the time typing. It turned out to be almost a non-issue.
So very true.
[1] Understanding the problem > [2] thinking about all possible solutions > [3] working out which solution fits best > [4] working out which implementations are possible > [5] working out the most suitable implementation
... and finally, [6] implementing via code.
So, rather than helping people program better, all it's done is replace a bunch of the offshore cut-and-paste shops with "AI."
Like, I did it before, remember that it was trivial, I just forget the snippet and I have to break focus to look it up - often by scrolling through my own commit history to try and find the time I did [trivial thing Y] four months ago.
I do kind of wish I could automate that. Skipping the actual typing of the snippet is sort of gravy on top of that.
IME the best thing for this is looking at the method listing in the docs for the classes I'm using. E.g. for Ruby, it's usually looking at the methods in Enumerable, Enumerator, Array, or Hash. Or I'll drop a binding.pry into the function, run it, and then type ls to see what's in scope.
I'd be happy to hear about better demonstrations, and there's also Pry's website (https://pry.github.io/) where they link to some screencasts.
Still, having to look up the doc or run the code to figure out how to type it is orders of magnitude slower than proper auto complete (be it old school Visual Studio style, or something like Copilot).
Having worked extensively with verbose but autocomplete-able languages like Java, compact dynamic languages like Ruby, and a variety of others including C, Scala, and Kotlin, I've come to the conclusion that, for me, autocomplete is a crutch and I develop deeper understanding and greater capabilities when I go to the docs. IDE+Java encourages sprawl, which just further cements the need for an IDE. Vim+Ruby+FZF+ripgrep+REPL encourages me to design code that can be navigated without an IDE, which ultimately results in cleaner designs.
If there's any lag whatsoever in the autocomplete, it breaks my flow state as well. I can maintain flow better when typing out code than when it just pops into being after some hundreds of milliseconds delay. Plus, there's always the chance for serendipity when reading docs. The docs were written by the language creators for a reason. Every dev should be visiting them often.
But if it works for you, more power to you!
What I meant is that you will coincidentally learn new things by going to the docs for old/simple things. In addition to remembering that method ordering, you might learn about a new method that simplifies your task.
I'm absolutely with you and want to upvote that part of the comment x100. Unfortunately it's often considered a fairly spicy opinion.
Entire frameworks (Rails) are built around the idea of typing as little as possible. Others can't even be mentioned without the topic of boilerplate/keystroke count causing a flame war (Redux).
A lot of engineers equate their value with the amount of lines they can pump out, so there's definitely a demand for tools like these.
There's also some legitimate stuff. There are a lot of very silly things I have to google every time because I have a bad memory. It saves the step of googling. In a way, it's the same debate we had around autocomplete at the very beginning, but pushed to the next level. Autocomplete turned out to be a very good thing (even though new languages and tools keep coming out without it).
Vendors have lost sales to me because they were too incompetent to allow me to ship things to my actual address. Oops.
P.S. for the US, you need to offer at least two lines for the address part. And you need to accept really weird things that don’t seem to parse at all. I know people with addresses that have a PO Box number and a PMB number in the same address. Lose one and your mail gets lost.
P.P.S. If you offer discounted shipping using something like SurePost, make sure you let your customers pay a bit extra to use a real carrier. There are addresses that are USPS-only and there are addresses that work for all carriers except USPS (and SurePost, etc). Let your customer tell you how to ship to them. Do not second-guess your customer.
412 1/2 E E NE
412 1/2 A E
1E MAIN
1 E MAIN
FOO & BAR
123 ADAM WEST RD
123 ADAM WEST
123 EAST WEST
If you don't want to validate, then yes addresses are just a series of text fields. However, mapping them to that delivery point is where the problems arise.
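Given examples like the ones above, the least bad storage model I know of is a couple of free-form lines plus a country code, roughly like the sketch below. The field names are illustrative, and nothing here tries to parse the street:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class PostalAddress:
        lines: List[str]        # verbatim lines as the customer typed them (offer at least two)
        country: str            # ISO 3166-1 alpha-2, e.g. "US"
        postal_code: str = ""   # optional; formats vary wildly by country

        def label(self) -> str:
            parts = self.lines + [self.postal_code, self.country.upper()]
            return "\n".join(p for p in parts if p)

    addr = PostalAddress(lines=["412 1/2 E E NE", "PMB 204"], country="US", postal_code="12345")
    print(addr.label())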
    def fetch_tweets_from_user(user_name):
        ...
        tweets = api.user_timeline(screen_name=user, count=200, include_rts=False)
'user' isn't defined, should be user_name, right? Side note, 'copilot' is a decent name for this (though copilots are usually very competent, more so than this right now). You must check the suggestions carefully. Maybe it'll make folks better at code review, lol.
- The JavaScript one (memoization) is a bad implementation; it doesn't handle some argument types you'd expect it to handle: https://news.ycombinator.com/item?id=27698125
You can tell a lot about what to expect, if there are so many bugs in the very examples used to market this product.
Nope. Game over. Play again?
Jokes aside, to have a proper junior dev replacement you need something that is able to learn and grow to eventually become a senior dev, an architect, or a CTO. That is the most important value of a junior dev. Not the ability to produce subpar code.
I think a lot of modern software development shops, these days, exist only to make their founder[s] as rich as possible, as quickly as possible.
If they are willing to commit their entire future to a lowest-bid outsourcing shop, then I don’t think they are too concerned about playing the long game.
Also, the software development industry, as an aggregate, has established a pervasive culture, based around developers staying at companies for 18-month stints. I don’t think many companies feel it’s to their advantage to incubate people who will bail out, as soon as they feel they have greener pastures, elsewhere.
From the Copilot FAQ:
Who owns the code GitHub Copilot helps me write?
GitHub Copilot is a tool, like a compiler or a pen.
The suggestions GitHub Copilot generates, and the code you write with its help, belong to you, and you are responsible for it.
We recommend that you carefully test, review, and vet the code, as you would with any code you write yourself.
Copilot can probably recite most of Quake's source code and, according to the FAQ, the output of Copilot belongs to the user. I think a point where this argumentation might fail is that Quake's source code does not belong to Github directly; instead, both Github and Quake belong to Microsoft. However, I am not a lawyer, so I might be wrong.
> The technical preview includes filters to block offensive words
And somehow their filters missed f*k? That doesn’t give a lot of confidence in their ability to filter more nuanced text. Or maybe it only filters truly terrible offensive words like “master”.
Attempting to generate text from code containing "genocide" just has Copilot refuse to run. But you can still coerce Copilot to return offensive output given certain innocuous prompts.
eyeroll
The big difference in this case, however, is that this AI was constantly learning from user input, which I do not think is the case for Copilot.
[1] https://docs.github.com/en/github/copilot/about-github-copilot-telemetry
A similar thing is happening in AI Dungeon, where certain words and phrases are banned to the point of suspending a user's account if they are used a certain number of times, yet the system will happily output them when they are generated by GPT-3 itself, and then punish the user if they fail to remove the offending pieces of text before continuing.
It only goes to show that opaque black-box models have no place in the industry. The networks leak information left and right, because it's way too easy to just crawl the web and throw terabytes of unfiltered data at the training process.
And illegal, if the original information remains.
I assume that there must be a process for altering the training data set and rerunning the entire thing.
Say, you have a model that repeats certain PII when prompted in a way that I figure out. I show you the prompt, you retrain the model to give a different, non-offensive answer. But now I go and alter the prompt and the same PII reappears. What now?
The strategic goal of a GDPR erasure request would be to force GitHub to nuke this thing from orbit.
Designing a system that you cannot control does not grant you legal immunity for whatever the system does. As Github operates inside the EU, personal information this system contains MUST be deleteable, correctable and retrievable, or it's simply illegal.
To me it doesn't show that copilot will regurgitate existing code when I don't want it to, just that if I ask it to copy some famous existing code for me it will oblige.
I got it to produce more GPL code too, that one is just not entertaining.
Do you have more examples like this that I can share with those who don't use Twitter, like a repo or blog post?
Many arguments on the benefits, legality and power of AI systems rely on this claim.
To turn around now and say it's OK to regurgitate in the right setting is to move the goalposts.
Do the Copilot authors claim this?
I get that you're suggesting that Copilot may benefit from absolute claims made by the authors of other, similar systems (or their proponents), but I also don't think it's reasonable to exclude nuance and the specifics of Copilot from ongoing discussions on that basis. The Copilot authors have publicly acknowledged the regurgitation problem, and by their account are working on solutions to it (e.g. attribution at suggestion-generation time) that don't involve sweeping it under the rug.
>GitHub Copilot is a code synthesizer, not a search engine: the vast majority of the code that it suggests is uniquely generated and has never been seen before. We found that about 0.1% of the time, the suggestion may contain some snippets that are verbatim from the training set.
It stands to reason that cases where people are intentionally trying to produce regurgitation will strongly overlap with the minority of cases where it actually happens. So I think we are probably suffering from some selection bias in discussions on HN and similar forums--that might be unavoidable, and it certainly stimulates some interesting discussion, but we should try to avoid misrepresenting the product as a whole and/or what its creators have said about it.
How would that work, anyway? Rare, distinctive code forms seem much more difficult for an ML thing to suggest with a high-ish confidence level, since there won't be much training data. The Quake thing makes sense because it's one of the most famous sections of code in the world, and probably exists in thousands of places in the public Github corpus.
I'm emphasizing distinctive because a lot of boilerplate takes up a lot of room, but still doesn't make a reasonable argument for copyright infringement when yours looks like somebody else's.
So what are you suggesting here, except that Github is attempting a legal sleight-of-hand to hide real infringement?
> while making customers believe that code is more or less synthesized in realtime.
What are you suggesting here except that Github is (essentially) lying to customers, making them believe something that is substantially untrue?
When I say "building an engine to cynically exploit the IP rights of open source copyright holders for profit", I am talking about a scenario in which they are sweeping legitimate IP concerns under the rug with bad faith legal weaselry and misrepresentation of how the product functions, etc., to chase profit. I do not see how that is substantially different from the implications of your comment, especially in the context of this subthread.
Could you enlighten me as to how your intended meaning substantially differs from my interpretation? If you don't mean to accuse Github of malfeasance, we probably don't have much to discuss.
I have done this to myself many, many times. I look, carefully, at length, for problems with my work until I satisfy myself that there's no obvious problem with it. Then someone else points out the obvious problem I was overlooking. Actually, this has happened often enough and it's painful enough that I learned to really look nowadays.
In the context of the search for "snippets that are verbatim from the training set" there's all sorts of things that can go wrong. The search (a regex search I think?) can be unintentionally made too weak to catch obvious cases. Or too strong, probably. The search for "snippets that are verbatim from the training set" may ignore snippets that are 80% verbatim. Or the code generated during the experiment can be generated in such a way as to only generate verbatim snippets very rarely, contrary to more typical use. And so on and so forth.
There's so many ways to fool oneself when looking for errors in one's own work.
Edit: They explain their search methodology and with only a quick look I gave it, it seems legit, but it was a quick look. The devil is in the details, yes? Maybe people who are really interested in this issue should take a closer look.
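For a sense of how such a search can miss near-copies: a crude window-overlap check like the sketch below (not GitHub's actual method, just an illustration) only flags suggestions that share a k-token window verbatim, so renaming a single variable inside each window is enough to slip past it:

    def token_windows(code: str, k: int = 8) -> set:
        toks = code.split()
        return {" ".join(toks[i:i + k]) for i in range(max(0, len(toks) - k + 1))}

    def shares_verbatim_window(suggestion: str, training_snippets, k: int = 8) -> bool:
        # Flags only exact k-token overlaps; an 80%-verbatim copy with one renamed
        # identifier per window sails through, which is the failure mode described above.
        windows = token_windows(suggestion, k)
        return any(windows & token_windows(s, k) for s in training_snippets)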
> It shouldn't do that, and we are taking steps to avoid reciting training data in the output
He's being woefully naive. To put it bluntly, we don't know how to build a neural network that isn't capable of spitting out training data. The techniques he pointed to in other threads are academic experiments, and nobody seems to have a credible explanation for why we should believe that they work.
I'm not anything close to an ML expert, and I have no opinion on whether what they're aiming for is possible, but this document^[1] (linked in your linked comment) states explicitly that they are aware of the recitation issue and are taking steps to mitigate it. So, in the context of the comment I replied to, I think Github is very far from claiming that recitation is "simply not possible".
^[1] https://docs.github.com/en/github/copilot/research-recitation
It's like if some corporate PR department told you "we're aware of the halting problem, and are taking steps to mitigate it." You would rightly laugh them out of the room.
It's not going to work, and the people making these statements either don't understand how much they don't understand, or are deluding themselves, or are actively lying to us.
An honest answer would be something like "We are aware that this is a problem, and solving it is an active area of research for us, and for the machine learning community at large. While we believe that we will eventually be able to mitigate the problem to an acceptable degree, it is not yet known whether this category of problem can be fully solved."
Again, I'm not an ML expert, but that sounds a lot more reasonable to me than announcing one's intention to solve the halting problem.
What's less clear to me is that Copilot regularly does that sort of thing with code distinctive enough that it could reasonably be said to constitute copyright infringement. If somebody's actually shown that it does, I'd love to see that analysis.
I use the halting problem as an analogy because their naive attempts to address this problem feel a lot like naive attempts to get around the halting problem ("just do a quick search for anything that looks like a loop," "just have a big list of valid programs," etc.). I can perform a similar analysis of programs that I run in my terminal and come to a similar "Hey look, most of them halt! Yay!" conclusion. I can spin a story about how most of the ones that don't halt are doing so intentionally because they're daemons.
But this approach is inherently flawed. I can use a fuzz tester to come up with an infinite number of inputs that cause something as simple as 'ls' to run forever.
Similarly, I can come up with an infinite number of adversarial inputs that attempt to make Copilot spit out training data. Some of them will work. Some of them will produce something that's close enough to training data to be a concern, but that their "attribution search" will fail to catch. That's the "open research question" that they need to solve.
We don't have a general solution to this problem yet, and we may never have one. They're trying to pass off a hand-wavey "we can implement some rules and it won't be a problem most of the time" solution as adequate. I don't see any reason to believe that it will be adequate. Every attempt I've seen at using logic to try and coax a machine learning model into not behaving pathologically around edge cases has fallen flat on its face.
> The three sentences about an attribution search at the very end are aspirational at best, and are presented as "obvious" even though it's not at all clear that such a fuzzy search can be implemented reliably.
I agree with all of this, though I do think that the attribution strategy they describe sounds a lot easier than solving the halting problem or entirely eliminating recitation in their model. Obviously, the proof will be in the pudding.
Maybe you and others are reacting to them framing this as "research", as if they're trying to prove some fundamental property of their model rather than simply harden it against legally questionable behavior in a more practical sense. I think a statistical analysis is fine for the latter, assuming the sample is large enough.
This issue goes way beyond just code - imagine GPT-like systems being used in medical diagnosis and results can suddenly depend on the date of the CT-scan or the name of patient, because the black-box simply regurgitates training data...
Copilot is NOT a shift up the abstraction tree. Over the last few years, though, I've realized that the concept of typing is. Typed programming is becoming more popular and prominent beyond just traditional "typed" languages -- see TypeScript in JS land, Sorbet in Ruby, type hinting in Python, etc. This is where I can see the future of programming being realized. An expressive type system lets you encode valid data and even valid logic so that the "building blocks" of your program are now bigger, more abstract, and more reliable. Declarative "parse don't validate"[1] is where we're eventually headed, IMO.
An AI that can help us to both _create_ new, useful types, and then help us _choose_ the best type, would be super helpful. I believe that's beyond the current abilities of AI, but I can imagine it in the future. And that would be amazing, as it would then truly be moving us up the abstraction tree in the same way that, for instance, garbage collection has done.
[1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
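A tiny Python sketch of the "parse, don't validate" idea, with made-up names: parse untrusted input into a richer type once at the boundary, so downstream code leans on the type instead of re-checking strings:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class NonEmptyName:
        value: str

    def parse_name(raw: str) -> NonEmptyName:
        cleaned = raw.strip()
        if not cleaned:
            raise ValueError("name must not be empty")
        return NonEmptyName(cleaned)  # invariant established once, here

    def greet(name: NonEmptyName) -> str:
        return f"Hello, {name.value}!"  # no defensive check needed

    print(greet(parse_name("  Ada  ")))  # Hello, Ada!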
This is something I'm interested in regarding this approach... When it works as intended, it's basically shortening the loop in the dev's brain from idea to code-on-screen without adding an abstraction layer that someone has to understand in the future to interpret the code. The result is lower density, so it might take longer to read... Except what we know about linguistics suggests there's a balance between density and redundancy for interpreting information (i.e. the bottleneck may not be consuming characters, but fitting the consumed data into a usable mental model).
I think the jury's out on whether something like this or the approach of dozens of DSLs and problem-domain-shifting abstractions will ultimately result in either more robust or more quickly-written code.
But on the topic of types, I'm right there with you, and I think a copilot for a dense type forest (i.e. something that sees you writing a {name: string; address: string} struct and says "Do you want to use MailerInfo here?") would be pretty snazzy.
WTF is "Copilot"?
Like Money Laundering for cash equivalents, or Trust-Washing for disinformation, but for hijacking software IP.
It might not be the intended use case, but that winds up being the practical result.
(on a related note, it would make me want to run GPT-* output through plagiarism filters, but maybe they already do that before outputting?)
I mean, it goes right along with the narrative devaluing programming that has been going around for the last couple of years. To the "anyone can code" narrative we are adding "more so, if they have an AI-assisted Copilot".
"So you are an SWE and you take a break from work to go to Hackernews to complain that Github's Copilot, which is an AI-based solution meant to help SWEs, is utter shit and completely unusuable.
And then you go back to writing AI-based solutions for some other profession. Which is totally not shit or anything.“
Can anybody put this more elegantly?
"Briefly stated, the Gell-Mann Amnesia effect is as follows. You open the newspaper to an article on some subject you know well. In Murray's case, physics. In mine, show business. You read the article and see the journalist has absolutely no understanding of either the facts or the issues. Often, the article is so wrong it actually presents the story backward—reversing cause and effect. I call these the "wet streets cause rain" stories. Paper's full of them. In any case, you read with exasperation or amusement the multiple errors in a story, and then turn the page to national or international affairs, and read as if the rest of the newspaper was somehow more accurate about Palestine than the baloney you just read. You turn the page, and forget what you know."
https://www.goodreads.com/quotes/65213-briefly-stated-the-gell-mann-amnesia-effect-is-as-follows-you
Ultimately it's a kind of Kafkaesque trap that modern living has us all in to a larger or lesser extent.
The GP comment by contrast is about hypocrisy. I personally found it funny that I didn't ever read about (or consider) copyright violations of deep learning until they tried to do it with code :-)
Of course programmers would find the problem with AI as soon as it exploited them.
The same certainly applies to other tasks.
I don't know what you're talking about, I'm a webshit developer.
Also, Copilot might (or might not) be useless or even interfere with real work. But it's probably low on the scale of awful things SWEs have helped create. The AI parole app is a thing that should haunt the nightmares of whoever created it, for example. But lots of AI apps, while useless, are probably also harmless, so building them might not be the worst thing.
... and the example is obtuse enough, both mathematically and in terms of the floating-point spec, that it was incomprehensible at the time it was written (as evidenced by id's own comments).
Context-sensitive data retrieval is undoubtedly a part of it, though and the question is how big and relevant is that part and what are the consequences?
To me the biggest issue is that it's impossible to tell whether the suggestions are verbatim reproductions of training material and thus problematic.
It goes to show that this tool, and basically every tool relying on the same or similar technology, must now be assumed to do this, and thus any code suggestion must be regarded as plagiarism until proven otherwise. As a consequence such tools are now off-limits for commercial or open source development...
We need legislation banning companies from ingesting data into AI training sets without explicit permission.
First the automation came for the farmers, and I did not speak out —
Because I was not a farmer.
Then the automation came for the factory workers, and I did not speak out —
Because I was not a factory worker.
Then the automation came for the accountants, and I did not speak out —
Because I was not an accountant.
Then the automation came for me (a programmer) —
and there was no one left to speak for me.
I could not figure out how to show it larger in the Twitter UI. I don't have a Twitter account, so that may be the problem.
Or should we call it the Tesla of software?
(However, I'm still definitely going to try this out once I get off the waitlist.)
It should be a tool capable of one-shot learning.
I.e., I'm in the middle of a refactoring operation and have to do lots of repetitive work; the tool should help me by understanding what I'm trying to do after I give it 1 example.
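As a toy illustration of that kind of one-example tool (a real one would generalize over syntax trees; this is purely textual and the example edit is made up): infer a literal before/after rewrite from a single edit and propose it on other lines:

    def infer_rule(before: str, after: str):
        # Strip the common prefix and suffix of the example edit; the differing
        # middle becomes a literal rewrite rule.
        i = 0
        while i < min(len(before), len(after)) and before[i] == after[i]:
            i += 1
        j = 0
        while (j < min(len(before), len(after)) - i
               and before[len(before) - 1 - j] == after[len(after) - 1 - j]):
            j += 1
        return before[i:len(before) - j], after[i:len(after) - j]

    def suggest(rule, line: str):
        old, new = rule
        return line.replace(old, new) if old and old in line else None

    rule = infer_rule("user.get_name()", "user.name")   # the one example you give it
    print(suggest(rule, "owner = account.get_name()"))  # owner = account.name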
Unfortunately for GitHub, there's no turning back the clocks. Even if they fix this, everyone that uses it has been put on notice that it copies code verbatim and enables copyright infringement.
Worse, there's no way to know if the segment it's writing for you is copyrighted... and no way for you to comply with license requirements.
Nice proof of concept... but who's going to touch this product now? It's a legal ticking time bomb.
I run product security for a large enterprise, and I've already gotten the ball rolling on prohibiting copilot for all the reasons above.
It's too big a risk. I'd be shocked if GitHub could remedy the negative impressions minted in the last day or so. Even with other compensating controls around open source management, this flies right under the radar with a c130's worth of adverse consequences.
yeah right. I wish.
(Not saying every dev does this)
But, I can't think of a single scenario where I've copied something from Stack Overflow. I'm searching for the idea of how to solve a problem, and typically the relevant code given is either too short to bother copying, or it's long and absolutely not consistent with how I want to write it.
Very honest suggestion: learn how to touch type. You can still copy if needed, but your typed input will be much faster.
Typing when you could paste is like having that Github Copilot put the right sentence right in front of you and you decide to type over it instead. Not only does it feel like wasted and robotic effort, typing everything leads to RSI.
I'm not sure why people disagree. Another symptom is that I insist on aliases for everything while others type out all the commands every time. Maybe I get distracted by the words when I type and lose my train of thought?
If nothing else this whole copilot thing is helping ease some chronic imposter syndrome
Of course it turned out the code I'd blindly inserted into my project contained a number of bugs. In one or two cases, quite serious ones. This, even though it was the accepted answer.
It was probably more effort to fix up the code I'd copy pasta'd than write it from scratch. Since then I've never copied and pasted from StackOverflow verbatim.
https://stackoverflow.com/help/licensing
It appears that the code that copilot is using is created under a huge variety of licenses, making it risky.
On the other hand, a small snippet in a function that is derived from many existing pieces of other code may fall under fair use, even if it is not under an open source license of some sort.
- "[I]f You Share Adapted Material You produce [..] The Adapter’s License You apply must be a Creative Commons license with the same License Elements, this version or later, or a BY-SA Compatible License."
- "Adapted Material means material [..] that is derived from or based upon the Licensed Material" (emphasis added)
- "Adapter's License means the license You apply to Your Copyright and Similar Rights in Your contributions to Adapted Material in accordance with the terms and conditions of this Public License.'
- "You may not offer or impose any additional or different terms or conditions on, or apply any Effective Technological Measures to, Adapted Material that restrict exercise of the rights granted under the Adapter's License You apply."
A program that includes a code snippet is unquestionably a derived work in most cases. That means that if you include a Stack Overflow code snippet in your program, and fair use does not apply, then you have to license the entire program under the CC-BY-SA. Alternately, you can license it under the GPLv3, because the license has a specific exemption allowing you to relicense under the GPLv3.
For open source software under permissive licenses, it may actually be okay to consider the entire program as licensed under the CC-BY-SA, since permissive licenses are typically interpreted as allowing derived works to be licensed under different licenses; that's how GPL compatibility works. But you'd have to be careful you don't distribute the software in a way that applies any Effective Technological Measures, aka DRM. Such as via app stores, which often include DRM with no way for the app author to turn it off. (It may actually be better to relicense to the GPL, which 'only' prohibits adding additional terms and conditions, not the mere use of DRM. But people have claimed that the GPL also forbids app store distribution because the app store's terms and conditions count as additional restrictions.)
For proprietary software where you do typically want to impose "different terms or conditions", this is a dead end.
Note that copying extremely short snippets, or snippets which are essentially the only way to accomplish a task, may be considered fair use. But be careful; in Oracle v. Google, Google's accidental copying of 9 lines of utterly trivial code [2] was found to be neither fair use nor "de minimis", and thus infringing.
Going back to Stack Overflow, these kinds of surprising results are why Creative Commons itself does not recommend using its licenses for code. But Stack Overflow does so anyway. Good thing nobody ever enforces the license!
[1] https://creativecommons.org/licenses/by-sa/4.0/legalcode
[2] https://majadhondt.wordpress.com/2012/05/16/googles-9-lines/
What makes it even worse is that if you try to do the right thing by crediting SO (the BY part), you're planting a red flag in the code showing you should have known you had to share your code (the SA part).
They tried to relicense code snippets to MIT a while back, it was a big mess.
They are 100% okay with letting their competitors get into legal hot water.
* Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
* ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
In over a decade of software engineering, I've seen many reuses of Stack Overflow content, occasionally with links to underlying answers. All Stack Overflow content use I've seen would clearly fail the legal terms set out by the license.
I suspect Copilot usage will similarly fail a stringent interpretation of underlying licenses, and will similarly face essentially no enforcement.
One can now trivially coerce copilot to regurgitate copyrighted content without attribution. Copilot's basic premise violates the CC-BY-SA terms, and this will continue until no party can demonstrate a viable method of extracting copyrighted code.
There is now a single party backed by a company with a 2 Trillion dollar market cap that can be sued for flagrant copyright violations.
* Companies with engineers using Copilot. Risk here is negligible, like that of copying Stack Overflow answers, or any code that isn't under a truly permissive license like CC0 [1]. Prohibiting use of Copilot in a company based on this risk has no merit.
* GitHub and Microsoft. Risk for them is higher yet worthwhile. Copilot is more like Stack Overflow than Napster. Affected copyright holders added their works to GitHub and agreed to their terms, so GitHub has a legal basis to show that content in Copilot. In terms of facilitating copyright infringement, far more violations occur by engineers manually searching and copying code on GitHub; lawsuits against GitHub due to that would be dismissed. Determining provenance is slightly harder in Copilot than in search, but GitHub could minimize risk to itself by noting in Copilot terms that users must review Copilot's suggestions for underlying license concerns. Engineers rarely will -- they routinely violate licenses of Stack Overflow and code copied from elsewhere -- but that shifts responsibility from GitHub, and legal risk to companies using Copilot remains negligible.
[1] https://creativecommons.org/share-your-work/public-domain/cc0/
SO definitely comes up during copyright/IP training.
The basic idea is 'reading SO answers to learn how to solve a problem is fine, copying/transcribing the code is not'.
Google is quite paranoid re. copyright and licenses.
https://opensource.google/docs/thirdparty/licenses/#restricted
I'd be very surprised if the other large enterprises that I have worked at aren't doing exactly the same thing. Too much legal risk, for practically no benefit.
I guess my point is, you can't be positive that even if you're following the license in a repo you forked that the repo owner hasn't already violated someone else's license, and now transitively, so have you.
In fact, that seems to be exactly the problem shown in the tweet - someone copy-pasted the quake source and slapped a different license on it, and copilot blindly trusted the new license.
On the other hand, if only permissive licenses that also don't require attribution are used, well, then for a start, the available corpus is much smaller.
How much of a concern this is depends heavily on what the original source was.
"You guys aren't using any free software are you? Because you can't do that."
"You mean copying software source code without respecting the license, right? Because we absolutely respect all licenses fully."
"No I mean you can't use Free software! It's a clear management directive! What are you doing?!?"
"Is that an apple laptop you're using there? Ever had a look at the software licenses for it?"
Legal are generally idiotically ignorant about the real issues. Whose fault that is we can argue about.
It's absolutely the case that before using certain libraries, most engineers in large corporations will make sure they are allowed to use that library. And if they don't, they are doing their job very badly IMO.
Surely somebody working on this project foresaw this problem…
Code not written does not have defects, does not need support and, as you point it out, is not a liability.
If you think about language in general, individual words aren't very sensitive. The word for bomb in any language is public knowledge. But when you start getting to jargony phrases, some might be unique to an organization. And if you're training your MT on translated documents surreptitiously intercepted from West Nordistan's nuclear program, and make your MT model public, the West Nordistanis might notice - "hey, this accurately translates our non-public documents that contain rather novel phrases ... I think someone's been listening to us!"
The head example they show is using a sentiment analysis API which is about the most useless use of technology there is.
The best person to have on your team is a productive, high-quality coder.
The worst is a productive, low-quality coder.
Copilot looks like it would give us more of the latter.
"Built on Stolen Data": https://rugpullindex.com/blog#BuiltonStolenData
Here's the very fun video if anyone wants to take a look:
Kinda seems like maybe there's some level of insecurity at play here in the criticism. Like an "I coulda come up with that but it's a bad idea" type of hater philosophy.
I'd be much more excited by (and less unnerved by) a tool which brought program synthesis into our IDEs, with at least a partial description of intended behavior, especially if searching within larger program spaces could be improved with ML. E.g. here's an academic tool from last year which I would love to see productionized. https://www.youtube.com/watch?v=QF9KtSwtiQQ
I don't think it is clear that such "fundamental weaknesses" exist. A text-based approach can get you incredibly far.
If you think about it, program synthesis is one of the few problems in which the system can have a perfectly faithful model of the dynamics of the problem domain. It can run any candidate it generates. It can examine the program graph. It can look at what parts of the environment were changed. To leave all that on the table in favor of blurting out text that seems to go with other text is like the toddler who knows that "five" comes after "four", but who cannot yet point to the pile of four candies. You gotta know the referents, not just the symbols. No one wants a half-broken Chinese Room.
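That point in miniature: a generate-and-test loop that enumerates tiny candidate expressions and keeps only the ones whose executed behavior matches the examples, rather than trusting the text. This is only an illustration with a made-up four-template DSL, not how any production synthesizer works:

    from itertools import product

    TEMPLATES = ["x + {c}", "x * {c}", "x - {c}", "x // {c}"]  # toy DSL
    CONSTANTS = [1, 2, 3, 10]

    def synthesize(examples):
        """Return the first expression whose *observed* behavior fits every example."""
        for tmpl, c in product(TEMPLATES, CONSTANTS):
            expr = tmpl.format(c=c)
            f = eval(f"lambda x: {expr}")  # run the candidate instead of just emitting it
            if all(f(x) == y for x, y in examples):
                return expr
        return None

    print(synthesize([(1, 2), (3, 6), (5, 10)]))  # x * 2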
Agreed - it represents a failure to adequately model/understand the task, but I don't think it is a "fundamental weakness" of text-based 'Chinese room' approaches.
> You gotta know the referents, not just the symbols. No one wants a half-broken Chinese Room.
"Knowing the referents" is not at all clearly defined. It's totally possible that, under the constraint of optimizing for next-word prediction, the model could develop an understanding of what the referents are.
You can't underestimate the level of complex behavior emerging from a big enough system under optimization. After all, all the crazy stuff we do - coding, art, etc. is produced by a system under evolutionary optimization pressure to make more of itself.
Well, in this case, it would have been good to understand that "V. Petkov" is a person unrelated to the project being written, and that "2015" is a year and not the one we're currently in. Sometimes the referent will be a method defined in an external library, which perhaps has a signature, and constraints about inputs, or properties which apply to return values.
> You can't underestimate the level of complex behavior emerging from a big enough system under optimization. After all, all the crazy stuff we do - coding, art, etc. is produced by a system under evolutionary optimization pressure to make more of itself.
I think this can verge into a kind of magical thinking. Yes, humans also look like neural nets, and we might even be optimizing for something. But we learn to program (and we do our best job programming) by having a goal for program behavior, and we use interactive access to try to run something, get an error, set a break point, try again, etc. I challenge anyone to try to learn to "code" by never being given any specific tasks, never interacting with docs about the language, an interpreter, a compiler, etc, but merely to try to fill in the blank in paper code snippets. You might learn to fill in some blanks. I highly doubt you would learn to code.
This is totally a case where the textual representation of programs is easier to get and train against, and that tail is being allowed to wag the dog to frame both the problem and the product.
None of this is to say that high-bandwidth DNN approaches don't have a place here -- but I think we should be looking at language-specific models where the DNN receives information about context (including some partial description of behavior) and outputs of the DNN are something like the weights in a PCFG that is used in the program search.
Mnyeah, not really that "incredibly". Remember that neural network models are great at interpolation but crap at extrapolation. So as long as you accept that the code generated by Copilot stays inside a well-defined area of the program space, and that no true novelty can be generated that way, then yes, you can get a lot out of it and I think, once the wrinkles are ironed out, Copilot might be a useful, everyday tool in every IDE (or not; we'll have to wait and see).
But if you need to write a new program that doesn't look like anything anyone else has written before, then Copilot will be your passenger.
How often do you need to do that? I don't know. But there's still many open problems in programming that lots of people would really love to be able to solve. Many of them have to do with efficiency and you can't expect Copilot to know anything about efficiency.
For example, suppose we didn't know of a better sorting algorithm than bubblesort. Copilot would not generate mergesort. Even given examples of divide-and-conquer algorithms, it wouldn't be able to extrapolate to a divide-and-conquer algorithm that gives the same outputs for the same inputs as bubblesort. It wouldn't be able to, because it's trained to reproduce code from examples of code, not to generate programs from examples of their inputs and outputs. Copilot doesn't know anything about programs and their inputs and outputs. It is a language model, not a magickal pink fairy of wish fulfillment and so it doesn't know anything about things it wasn't trained on.
Again, how often do you need to write truly novel code? In the context of professional software development, I think not that often. So if it turns out to be a good boilerplate generator, Copilot can go a long way. As long as you don't ask it to generate something else than boilerplate.
There are approaches that work very well in the task of generating programs that they've never seen before from examples of their inputs and outputs, and that don't need to be trained on billions of examples. True one-shot learning of programs (without a model pre-trained on billions of examples) is possible. With current approaches. But those approaches only work for languages like Prolog and Haskell, so don't expect to see those approaches helping you write code in your IDE anytime soon.
This solely text based approach is simply “easy” to do, and that’s why we see it. I think it’s cool and results are intriguing but the approach is fundamentally weak and IMO breakthroughs are needed to truly solve the problem of program synthesis.
Parsing intent in a programming context is easier than in others. Also, most code is written to be parsed by a machine anyway. So with ASTs and all the other static and maybe even some dynamic checks, it should be possible.
We already see some of it with type detection, IntelliSense, etc.
It is a hard set of problems with no magic solutions like this; years of development time are needed. That approach will not happen commercially, only incrementally in the community.
You need either a) a complete specification of the target program in a formal language (other than the target language) or b) an incomplete specification in the form of positive and negative examples of the inputs and outputs of the target program, and maybe some form of extra inductive bias to direct the search for a correct program [edit: the latter setting is more often known as program induction].
In the last few years the biggest, splashiest result in program synthesis was the work behind FlashFill, from Gulwani et al: one-shot program learning, and that's one shot, from a single example, not with a model pretrained on millions of examples. It works with lots of hand-crafted DSLs that try to capture the most common use-cases, a kind of programming common sense that, e.g. tells the synthesiser that if the input is "Mr. John Smith" and the output is "Mr" then if the input is "Ms Jane Brown" the output should be "Ms". It works really, really well but you didn't hear about it because it's not deep learning and so it's not as overhyped.
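A toy version of that programming-by-example idea, just to make it concrete (the "DSL" here is three hand-written string operations, nothing like FlashFill's real one): keep every program consistent with the single example, then apply it to a new input:

    # Each "program" in the toy DSL maps a string to a string.
    def take_until(chars):
        def prog(s: str) -> str:
            cut = min((s.find(c) for c in chars if c in s), default=len(s))
            return s[:cut]
        return prog

    def first_word(s: str) -> str:
        return s.split()[0] if s.split() else ""

    CANDIDATES = [
        ("take_until('. ')", take_until(". ")),
        ("take_until(' ')", take_until(" ")),
        ("first_word", first_word),
    ]

    def synthesize(example_in: str, example_out: str):
        return [(name, p) for name, p in CANDIDATES if p(example_in) == example_out]

    for name, prog in synthesize("Mr. John Smith", "Mr"):
        print(name, "->", prog("Ms Jane Brown"))  # take_until('. ') -> Ms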
Copilot tries to circumvent the need for "programming common sense" by combining the spectacular ability of neural nets to interpolate between their training data with billions of examples of code snippets, in order to overcome their also spectacular inability to extrapolate. Can language models learned with neural nets replace the work of hand-crafting DSLs with the work of collecting and labelling petabytes of data? We'll have to wait and see. There are also many approaches that don't rely on hand-crafted DSLs, and also work really, really well (true one-shot learning of recursive programs without an example of the base case and the synthesis terminates) but those generally only work for uncommon programming languages like Prolog or Haskell, so they're not finding their way to your IDE, or your spreadsheet app, any time soon.
But, no, AGI is not needed for program synthesis. What's really needed I think is more visibility of program synthesis research so programmers like yourself don't think it's such an insurmountable problem that it can only be solved by magickal AGI.
I am not belittling the work going in this space, and I’m sure for highly constrained and narrow use cases a lot can be done even now. But I believe solving the general problem of program synthesis based on informal spec requires AGI. I am hardly the only one who thinks this.
No. Program synthesis approaches work very well for a broad array of problems, not for "highly constrained and narrow use cases"- that is a misconception of the kind that results from lack of familiarity with modern program synthesis.
Here's a good recent review of the field:
https://www.microsoft.com/en-us/research/wp-content/uploads/2017/10/program_synthesis_now.pdf
Sumit Gulwani, that I mentioned in my previous comment, is an author. To clarify, I'm not in any way affiliated with him or his collaborators. I'm actually from a rival camp, if you will, but the paper I link to is a very good summary of the state of the art. It should help you if you wish to understand where program synthesis is at.
>> I said program synthesis good enough to replace programmers requires AGI. Program synthesis based off of informal specifications in natural language.
Program synthesis from natural language is hard to make work because it's difficult to translate natural language specifications to specifications that a program synthesiser can use. But that is a limitation of current natural language analysis, specifically natural language understanding, approaches - not a limitation of program synthesis approaches.
I think you equate formal specifications, or specification by example, with "narrow use cases". There's no connection between the two.
The reality seems to disagree with your statements. Program synthesis is as of right now limited to academic research and highly narrow use cases. If the opposite was true, I’d be out of a job.
I think copilot is probably the first product of its type that might make its way into the hands of users en masse.
Edit:
Btw I was referring to program synthesis based off informal natural language spec. Spec inference is part of the synthesis pipeline, I think it’s not fair to just ignore that problem.
Anyway the review I linked to has some examples of real-world applications of program synthesis. Don't be afraid to read it- it's light on formal notation and you don't need special skills to understand it. I appreciate that it's a long document but there's a Table of Contents at the start and you should be able to skim through in a short time just to get a general idea of the subject.
Anyway I can see you're trying to "wing it" and reason from first principles about something you know nothing about, in true SWE style. Yet, you don't know what you don't know, so you start from the wrong assumptions ("fully automated" etc) and arrive at the wrong conclusions. That's no way to understand anything. It's certainly not going to give you any good idea about what's going on in an entire field of research you know nothing about.
Of course you're not obliged to know anything about program synthesis, but in that case, maybe consider sitting back and listening rather than expressing strong opinions with absolute conviction that is not supported by your knowledge? I think that will make a better conversation, and a better internet, for everyone.
Copilot suggests some code snippets, and not necessarily good ones. To be dismissive of another approach to generate parts of programs because they cannot replace programmers is like saying that belt-drive bikes aren't worth considering over chains because a belt-drive bike isn't a replacement for a Learjet.
I was merely saying that for the holy grail, program synthesis from informal spec generalised to any domain, the approach will have to be different.
Again I think you're trying to imagine what program synthesis must be like, rather than trying to find out what it actually is like.
You are being a little condescending btw, it’s not in the spirit of hacker news.
> The ability to automatically discover a program consistent with a given user intent (specification) is the holy grail of Computer Science.
https://drops.dagstuhl.de/opus/volltexte/2017/7128/pdf/LIPIcs-SNAPL-2017-16.pdf
> Since the inception of AI in the 1950s, this problem has been considered the holy grail of Computer Science.
https://www.microsoft.com/en-us/research/publication/program-synthesis/
https://dl.acm.org/doi/fullHtml/10.1145/242224.242304
> Automatic synthesis of programs has long been one of the holy grails of software engineering.
https://people.eecs.berkeley.edu/~sseshia/pubdir/icse10-TR.pdf
If so, please leave them alone. You have no reason to assume that program synthesis from a natural language specification is "the holy grail" of anything. But I'm glad that our conversation at least made you look up a few links, even if only to try and win the internet conversation from what it looks like.
So what did you learn, from what you read about program synthesis? Can you see why your assumptions earlier on, about "narrow use cases" and the like were wrong?
Edit: btw, the Freuder paper you linked is about constraint prorgamming, not program synthesis.
And did you notice that one of your links above is the abstract of the review paper I proposed you read, earlier?
From my perspective, spec inference from informal spec is the main thing to solve. Because for formal specs, I’d just be programming in a declarative language to create the formal spec.
Spec by example won’t scale because you can’t provide examples across the entire domain for apps of real world complexity.
Once spec inference is solved, then you are just left with a search problem. I understand that the search space is freakin huge but I’d still say the latter problem is easier to solve than the former.
And I’d guess that the problem of inferring a spec from an informal description is what requires AGI.
I hope this clarifies my POV. I don’t think we disagree, we just have different perspectives.
Now. I think the root of our disagreement was the necessity or not of AGI for various kinds of program synthesis, which I think we've probably pared down to program synthesis from an informal specification, particularly a natural language one.
I don't agree that AGI is necessary for that. I think that, as many AI tasks, such a complete and uncompromising solution can be avoided and a more problem-specific solution found instead.
In fact, we already had almost a full solution to the problem in 1968, with SHRDLU [1], a program that simulated a robotic hand directed by a human user in natural language to grasp and rearrange objects. This was in the strict confines of a "blocks world" but its capability, of interpreting its user's intent and translating it accurately to actions in its well-delineated domain remains unsurpassed [2]. Such a natural language interface could well be implemented for programming in bounded domains, for example to control machinery or run database queries etc. This capability remains largely unexploited, because the trends in AI have shifted and everybody is doing something else now. That's a wider-ranging conversation though. My main point is that a fully-intelligent machine is not necessary to communicate with a human in order to carry out useful tasks. A machine that can correctly interpret statements in a subset of natural language suffices. This is technology we could be using right now, only nobody has the, let's say, political will to develop it because it's seen as parochial, despite the fact that its capabilities are beyond the capabilities of modern systems.
As to the other kind of informal specifications, by examples, or by demonstration, what you say, that it won't scale etc, is not true. I mentioned earlier one-shot learning of recursive hypotheses without an example of the base case [3]. To clarify, the reason why this is an important capability is that recursive programs can be much more compact than non-recursive ones and still represent large sets of instances, even infinite sets of instances. In fact, for some program learning problems, only recursive solutions can be complete solutions (this is the case for arithmetic and grammar learning for example). The ability to learn such solutions from a single example I think should conclusively address concerns about scaling.
To be fair, this is a capability that was only recently achieved by Inductive Logic Programming (ILP) systems, a form of logic program synthesis (i.e. synthesis where the target language is a logic programming language, usually but not necessarily Prolog). The Gulwani survey I linked mentions this recent advance only in passing. But ILP systems in general have excellent sample complexity and can routinely generalise robustly from a handful of examples (in the single digits), and have been doing so since the 1990s.
The cost of searching a large hypothesis space is, indeed, an issue. There are ways around it, however. Here I have to toot my own horn and mention that my doctoral research is exactly about logic program learning without search. I'd go as far as to say that what's really been holding program synthesis back is the ubiquity of search-based approaches, and that program synthesis will be solved conclusively only when we can construct arbitrary programs without searching. But you can file that under "original research" (I mean, that's literally a description of my job). Anyway, no AGI is needed for any of this.
So, I actually understand your earlier skepticism, along the lines of "if all this is true, why haven't I heard of it?" (well, you said "why do you still have a job", but it's more or less the same thing). The answer remains: because it's not the trendy stuff. Trends drive both industry and research directions. Right now the trend is deep learning, so you won't hear about different approaches to program synthesis, or even about program synthesis in general. There's nothing to be done about it. Personally, I just grin and bear it.
_________________
[1] https://en.wikipedia.org/wiki/SHRDLU
[2] For example, in the demonstration conversation quoted by wikipedia, note the "support supports support" sentence.
[3] This is already growing too long so I won't include examples, but let me know if you're curious for some.
To "replace programmers", an organization would need to have a way of specifying to the system a high level program behavior, and to confirm that an output from the system satisfies that high level behavior. I think for specifications of any complexity, producing and checking them would look like programming just of a different sort.
The reality is that we never attribute the original source because we copy-paste it, change it up a bit, and make it our own. Literally everybody does this.
I still care about licensing and proper attribution, but the reality is that a snippet of code is not something so easy to attribute. Should we attribute all kinds of ideas, even the very small ones? How quickly is an idea copied, altered & reused? Can we attribute all the thoughts humans have?
He tried to write a quine in Ruby, and Copilot ended up conjuring up a copyright claim comment and fake licensing terms. https://twitter.com/mametter/status/1410459840309125121
(I'm no John Carmack, but still.)
I think I'm gonna give "AI" a few more years.
"'Sweary comments' is not grammatically correct English." -> GPT-2 ->
"Sweary comments" is not grammatically correct English. The "sweary" meaning is used as slang by those in charge, especially the ones who get a free ride through the media, for the sake of a quick buck, or the sake of getting a better job with a promotion. The word is commonly used in a social context and comes from an actual "sweary" comment. The statement or phrase, "You're too shy!" is derived from this phrase: "I wish he were as outgoing as you."
Am I right? I cannot believe GitHub would do this.