
Release The Full Model! #16

Open
superjayman opened this issue Feb 15, 2019 · 179 comments
Labels
question Further information is requested

Comments

@superjayman

I understand your concerns, but I still think it's better to release the full model now and let people poke at its abilities and discover potential issues more quickly.

@WuTheFWasThat
Collaborator

Thanks for raising the issue! People have expressed similar sentiment internally and we take that argument seriously. Would love to see people start investigations with the small model and we will be re-evaluating our release of the larger models in the future.

@WuTheFWasThat
Collaborator

Actually, it seems more correct to leave this issue open :)

@WuTheFWasThat WuTheFWasThat reopened this Feb 15, 2019
@yzho0907

Please release models that support more languages.

@gabefair

Better safe than sorry. If the experts want caution, the least we can do is respect their judgement.

@Franck-Dernoncourt

https://blog.openai.com/better-language-models/:

We will further publicly discuss this [model release] strategy in six months. If you’d like to discuss large language models and their implications, please email us at: languagequestions@openai.com.

@roschler

Will you be releasing the English speaking unicorns to the public?

@WuTheFWasThat
Collaborator

We don't know enough about unicorns to say they aren't dangerous. We will release a unicorn fetus for the scientific community to study for now, and re-evaluate later.

@superjayman
Author

It's a pity. Let me remind you of the name 'OpenAI': not so open, is it?

@yzho0907

@superjayman I agree. Openness and sharing are the core of open innovation; releasing everything does no harm and only speeds up improvement.

@bnealey

bnealey commented Feb 15, 2019

Thanks for exercising caution and pointing out that you did. Seems cool.

Curious about focus. Can haz enlightenburger?

@marca-development

Can you train this on a list of translated sentences (from English to Japanese, for example) and use it as an AI language translator?

@Tophness

Tophness commented Feb 15, 2019

Isn't this exactly what OpenAI was not supposed to be about? Being closed source and subject to the whims of PR teams and the private incentives of a small number of people?
I've made my own natural language generator that only says things that make sense (and asks itself questions based on its own answers), and it does the same thing.
I also got it to believe in a god and break down from anxiety over "should" questions, in a really insightful way that would be helpful to a lot of people. It turns out that if you don't internally ask or answer "should" questions at all, it's really hard to get into a social anxiety loop, and you can watch it all break down like: "what if they think [this] > what should I think about the human thinking [this] about me? > idk > I haven't talked to the human because I was thinking about this > what if they think [this] now? is this good or bad? should I care?"
I wouldn't have known that if I had stopped working on it like you guys did.
It's like you're trying to answer the trolley problem as if it were some kind of moral dilemma. Almost none of them are. It's an engineering problem. The velocity and mass of the trolley are not unknown, and there are nine different ways you can stop it using physics, but if you stand there deliberating over whether to pull the lever, you're fucked either way.

Even for fake news, this would be a good tool. If you're going to believe something just because it's possible to say it in the English language, you're a fucking idiot. Check the source. Check peer reviews.
If anything, a random blog post generating fake news like this will point out how stupid people are for believing it. And it's much easier to do that than it is with mainstream media.

@fallenartist

Maybe, as in POI, they are still teaching their child to be kind?

@chenyangh

chenyangh commented Feb 15, 2019

I respect the decision: "with great power comes great responsibility". But I suggest releasing the 345M model. The reasons are twofold: it is much better than the 117M but not nearly as good as the 1.5B model, and it has a similar number of parameters to the BERT-large-uncased model, which makes it a good candidate for comparison.

@yzho0907

Maybe the reason this repo exists is the same reason the team should release everything and be 'open', but they are the ones who make the decision anyway. I just hope it turns out well for both the team and us.

@max-frai

There are a lot of things that could be misused in the wrong hands. But just imagine the positive impact of your technology. What you fear is inevitable anyway.

@dackdel

dackdel commented Feb 15, 2019

https://news.ycombinator.com/item?id=19168712 Please read the horrendous comments. Well, don't read all of them, it gets depressing. But Jesus Christ, you are OPEN ai. Do we really have to spell that out for you? O P E N

@sciencemanx

Help I need this to help write my 9th grade essays

@jensstark

Okay. A nonprofit writes an interesting product, which a for-profit could probably recreate and patent. Or am I wrong there?

Why release a teaser only? A shrunk, non-trainable thing, just there to show off?

I admit it: I am suitably impressed, but also seriously annoyed.

"Open" in name only.

@lahwran

lahwran commented Feb 16, 2019

It's worth remembering that OpenAI has in fact been pretty good about releasing the code to their stuff. They've been much more open than DeepMind, which I think was the concern that led to their creation. This seems comparable to responsible disclosure in software security: when an open source group finds a bug in widely-deployed, un-updateable software, e.g. something used in routers, that could be used for large-scale spamming, they'll start work on ways to mitigate it before announcing what the vulnerability is. If someone who works for a FOSS company were to find a really efficient design for building denial-of-service software, it'd be a similar story: look for DoS mitigations first.

I'd say it's a comparable situation to the latter: OpenAI is worried that they've built a generally useful tool that could make a category of DoS attack much, much worse, and they don't currently see anything preventing that from happening.

I've been thinking that it might be good to get something like GPT-2 1.5B into the hands of Google and Facebook and a few other major forum operators, maybe Reddit, under a contract to use it for improving moderation. (Edit to clarify: just giving them early access so they can use it to build safeguards against things like it.) It seems like GPT-2 is good enough to take a serious crack at implementing xkcd's suggestion from nearly ten years ago: who cares if it's a human or a machine? The real question is whether it's malicious content. That proposal as it stands wouldn't help much with fake news, because people lying is a different problem than people doing a denial-of-service via vitriol, but it would make a big impact on a major source of the problem. Or perhaps the AI teams with enough resources could get together and talk about how to use this level of NLP performance to build other types of linguistic DoS mitigations.

I am, for my own curiosity, quite irritated that it's not being released, but I agree that the performance is reasonably worthy of the concern. I just don't see not releasing it as being that useful unless the time until someone replicates it is spent building mitigations for the world that will exist once someone else has a copy.

@WuTheFWasThat I do think y'all could probably release the training code safely, though. It seems to me that the dataset and something like $40k worth of compute in the trained model are the real interesting things here.

@4R7I5T

4R7I5T commented Feb 16, 2019

I think you guys are scared of nothing. Please release the whole model.

It's not like 20,000 people have pulled this repo, so it's really hard to use this 'maliciously'.

Besides, there are other alternatives that have produced similar (or better) results than this. cakechat, for example, when fed the Reddit corpus (the same one that spooked you), will give you some crazy output. But just like when you tell a young kid 'it's just a movie' or 'just a game': this is just a computer program. It's not some sci-fi novel come to life.

@schwittlick

We want the red pill!

@iurimatias

The resources needed to train the full model are beyond the average person and small companies, which could use it for potentially very interesting non-malicious applications. However, the large organizations and state actors most likely to use it for malicious purposes typically already have easy access to the resources needed to replicate the full model.

Therefore, by not releasing the full model you are ensuring that this sort of AI tech remains in the hands of the powerful organizations and state actors most likely to misuse it, while at the same time unintentionally tricking the general public into thinking this tech is not "really" available yet. Releasing the full model and leveling the playing field is the right thing to do here. Please release the full model.

@superjayman
Author

superjayman commented Feb 16, 2019

So how many other innovations are you going to keep closed? Say next week you have an even bigger breakthrough: will the full model then seem superseded and less harmful, so you decide to release it? See, it doesn't make sense. How do you put a limit on unknown capabilities?

@gabefair

gabefair commented Feb 16, 2019

Everyone here could benefit from Nick Bostrom's The Unfinished Fable of the Sparrows as presented in his 2014 book about this subject, Superintelligence: Paths, Dangers, Strategies. Dr. Bostrom is Director of the Future of Humanity Institute at the University of Oxford.
https://youtu.be/7rRJ9Ep1Wzs

@bhack

bhack commented Feb 16, 2019

I think you could at least soon open a challenge, like Google's fake audio detection challenge, and then release the full model after the community has a detection baseline.

@TechSupportGo

Well, here is what I predict will happen very soon, and why. The thing your software can do will be replicated and released to the whole world within months, maybe even weeks. It will grow just like deepfakes, and college students will be using it to write their finals in the fall. The media has blasted the fact that you have a new toy and you refuse to share. Now that people know what kind of coverage they can expect for a fully released version, they will not care about consequences. They will get the publicity and the feedback they need to make it even better.

From that point forward, all the phone apps, DIY personal assistant devices, and automated blog post generators will say "powered by [insert company]". Yes, that same company name will be associated with the fake Amazon reviews, but when it comes to business and economics, bad publicity is still publicity. The upside is that instead of being "encouraged" to address these issues, the government and other agencies will be forced to.

This train is coming, and I am afraid that you putting pennies on the track is not going to stop it. Heck, my 15-year-old uses Python in ways that would never have occurred to me. Honestly, I personally couldn't pull this off without a team, but I am sure there are investors out there who see dollar signs in being first. I am sure you have gotten some very interesting emails reinforcing that sentiment.

If I were in your position, I would reconsider my decision not to release the full project, or at least set a date. People tend to be more productive when they are up against the clock. 90 days would certainly be enough time for these big companies to prepare, and more than enough time for governments to educate their patrons about the swarm of "fake news" headed their way. I read that last sentence and spat my drink out, hahaha. Anyway, read my post in your meeting Monday morning and re-evaluate your decision. Great job, by the way. It must be awesome to see the results first-hand.
//This post was written by a human.//

@bladedsupernova

bladedsupernova commented Jul 27, 2019

So basically, the adjacent dot in the image is a very close next word, to be used instead of other possible words?

195 seems to be ĉ...
What is ĉ?

Like: I was eating a hamburger/nugget/apple/etc.

@MrKrzYch00

Refer to encoder.json

@bladedsupernova

bladedsupernova commented Jul 27, 2019

What about encoder.json? What is in that?

@MrKrzYch00

It's the encoder/decoder for GPT-2. It translates words (fully or partially) into tokens, and back again after the output is created, I think. However, I don't know its full relation to vocab.bpe.

@bladedsupernova

vocab.bpe holds the subword pieces that BPE discovered; those are the units used for encoding.
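For what it's worth, characters like 'ĉ' and 'Ġ' come from GPT-2's byte-to-unicode table: the token strings in encoder.json are built from a mapping (`bytes_to_unicode` in the repo's encoder.py) that lets printable bytes stand for themselves and shifts whitespace and control bytes to code points above 255. A minimal sketch, adapted from that function:

```python
def bytes_to_unicode():
    """Map each of the 256 byte values to a printable unicode character.

    Printable bytes keep their own character; every other byte (control
    characters, space, etc.) is shifted past code point 255 so the BPE
    vocab never contains raw whitespace or control characters.
    """
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("\xa1"), ord("\xac") + 1))
          + list(range(ord("\xae"), ord("\xff") + 1)))
    cs = bs[:]
    n = 0
    for b in range(2 ** 8):
        if b not in bs:
            bs.append(b)
            cs.append(2 ** 8 + n)  # shift unprintable bytes past 255
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

table = bytes_to_unicode()
print(table[9])   # tab (byte 9) is stored as 'ĉ'
print(table[32])  # space (byte 32) is stored as 'Ġ'
```

So 'ĉ' stands for a tab byte, and the 'Ġ' you see at the start of many tokens marks a leading space.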

@MrKrzYch00

MrKrzYch00 commented Jul 27, 2019

    import json
    import os

    # From the repo's encoder.py: load the token-to-id table and the BPE merge rules.
    with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:
        encoder = json.load(f)
    with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding="utf-8") as f:
        bpe_data = f.read()

Both are used.

Anyway, here: https://www.tensorflow.org/guide/graph_viz

EDIT: OK, I put in the full path to the metadata graphs. Here is the main result:
[image]

And I think this part consists of my 3 TOP_P ranges:
[image]

So with that, you can visualize exactly what happens in TensorFlow, at least.

@pardhu002

Has anyone tried asking the GPT-2 model what happens when such a potent resource is released to the public?

@Byte1122

Byte1122 commented Aug 1, 2019

I see a lot of people are interested. We need two things:

  1. Knowledge
  2. Funds

1: As far as I can see, there is enough of it here.
2: This we can do with cryptocurrency funding. We can form a foundation for a truly open AI and release it. Maybe we can make it work with a blockchain approach?

What is the problem with releasing or mirroring the project? Funds? We can raise them. Knowledge? We have it.

@FurkanGozukara

> Has anyone tried asking the GPT-2 model what happens when such a potent resource is released to the public?

Nothing would happen.

It is already available to those who have computation power (money).

@bladedsupernova

Everyone already has their own one in their noggin, called GPT-3000, last I checked. And anyone can hire teams to write fake internet news. Perfect hits and no false moves.

@bladedsupernova

bladedsupernova commented Aug 2, 2019

My point was that fake news isn't why OpenAI holds the algorithm back. We can fool humans far better ourselves, yet we are still here, aren't we? I can write trash, too. I actually know why they didn't release it, but am too kind to say.

@MrKrzYch00

MrKrzYch00 commented Aug 23, 2019

First impressions of the 774M model: a 6GB GPU can generate a sample in the same amount of time; however, the number of samples in a batch needs to be reduced from 5 to 2. So overall, the time required to generate a similar number of samples is 2x (in the case of 4, 2x2) or 3x (in the case of 6, 3x2).

The generated text seems to be better quality. I haven't tested it enough yet, but I can already tell there is a difference. It can still produce some nonsense, but to a lesser extent, and it seems more creative: I could pick out something sensible more often. What I'm curious about is whether it confuses people in dialogue the way the previous model did, and how it handles completely weird situations, which remains to be tested (although I have already started, and it still mixes certain things up quite a bit).

And yes, as in the paper, I can feel the reading comprehension being better; it sticks to the story better, I think? However, for now there is this weird tendency for a character to keep explaining/repeating things over and over, though in a creative way, unlike the previous medium model (to be tested further).

@glencoe2004

Now we're never going to get it as the project has been archived, which is a shame :(

@bladedsupernova

Where has it been stuffed away? Point me there. I can still comment and download the zip.

@glencoe2004

glencoe2004 commented Sep 30, 2019

@bladedsupernova the readme:
"Status: Archive (code is provided as-is, no updates expected)"

So yeah, you'll be able to download the code but it's unlikely we'll ever see the full model, which is a shame.

@bladedsupernova

bladedsupernova commented Sep 30, 2019

Nvidia made an 8B-parameter model, 5 times more powerful. I think GitHub has it available, maybe: it is the Megatron language model by Nvidia.

At least we can try the 774M.

@FurkanGozukara

Let's put pressure on Nvidia to release the trained model for experimentation.

NVIDIA/Megatron-LM#11

@bladedsupernova

HAHAHAHAHAHAHAHAHAHAHAHAHAHAHAHA

@anubi

anubi commented Oct 7, 2019

If you wanted to build malware or whatever using GPT-2, you certainly could; the half-size model creates human-readable text that 90% of the time is consistent enough to be believably written by the "average" person (considering the "average" person can't even get their vs. there right...). If you had the money, or are a state actor, you could just hire people to write troll posts on social media. Consider that with a public cloud provider charging $2.50 per GPU-hour, running the full model to generate text in an efficient manner (on a cheap vCPU it takes 2.5 minutes of CPU time to make a single paragraph of text) is going to get expensive.

I can't wait for the full model, but looking back on the press release for this project, I think the decision not to release the full thing was purely a marketing move. The full version will be cool, but it won't be 'groundbreaking'. Releasing a "half as powerful" version, though, is a great marketing tool. I got hyped for the full release, but in all honesty I think it will feel like going from 90% comprehensible to 92.5% comprehensible. You're definitely going to see diminishing returns with this kind of model.

This sets a bad precedent for an "open" project, though. OpenAI isn't "half-OpenAI".

@EvgenijM86

I have no problem with you not releasing your model. But I do have a problem with you calling yourself Open AI, because it is misleading. If this practice continues you should consider renaming yourself to Closed AI.

@bhack

bhack commented Nov 5, 2019

Can we close this issue now?

@AlphaGit

AlphaGit commented Nov 6, 2019

The full model has been released

@bladedsupernova

THANK GOD. YOU GUYS WILL FINALLY SHUT UP.

lol

@TechSupportGo

Well, this was a bit anticlimactic. After we got our stuff in the trunk, we walked down to the parking lot, where we got a quick ride back to the airport. So, at the very least we had a ride to get home, but I don't really think this was the right method of getting home.
I think this is the point where I have to get off this blog post, and I have to stop talking about this. I don't know how to handle this anymore. I don't have any answers and I'm not going to say I'm going to go out and take a walk. I guess I'll just sit in my office, and cry for a few minutes, until I pass out. I think I need to go back to sleep. There is something wrong with me. I'm not the same. My life is completely and totally turned around. Yours truly, gpt2.

@MrKrzYch00

The full model won't work on 6GB GPUs: not enough memory, so don't buy one if you plan to use the 1558M model. 774M will work with 6GB, though.
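A back-of-the-envelope sketch of why, assuming float32 weights at 4 bytes per parameter (an estimate: real usage adds activations and the per-token attention cache on top of this floor):

```python
# Rough lower bound on GPU memory: the raw float32 weights alone.
# Runtime memory is higher still because of activations and the
# attention cache, so these numbers are optimistic floors.
BYTES_PER_PARAM = 4  # float32

def weight_gib(n_params):
    """GiB needed just to hold n_params float32 weights."""
    return n_params * BYTES_PER_PARAM / 2 ** 30

for name, n_params in [("117M", 117e6), ("345M", 345e6),
                       ("774M", 774e6), ("1558M", 1558e6)]:
    print(f"{name}: ~{weight_gib(n_params):.1f} GiB of weights")
```

The 1558M model comes out around 5.8 GiB of weights alone, which leaves a 6GB card essentially no headroom for activations; 774M sits near 2.9 GiB, which is why it still fits.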

@IveJ

IveJ commented Nov 9, 2019 via email

@aabeshkarmacharya

OOM error with an 8GB GPU too. 774M works with 8GB in some cases but gives OOM in others.

@MrKrzYch00

@ElvisJames Interesting. I didn't get OOM on a 6GB GPU with 774M; however, I'm using a 2nd GPU that reports 5750MB free, and I run at most 2 samples at once (more will result in OOM, yes). Are you sure you can't squeeze the full model onto an 8GB GPU when the system isn't using it at all and you run one sample? Or even limit the output to 128 tokens?

@bladedsupernova

Full AGI release!:
https://aidreams.co.uk/forum/index.php?topic=14490.0
