The example looks very good. Do you have more images to share? I think more examples would be nice to show off more of what it can handle. Different room types, interiors etc.
Also in that regards: I'm curious about what it can't handle. Any situations where it borks?
Excellent suggestion. Will find time tomorrow to add a `/gallery` page. Created an issue to track: https://github.com/fill3d/fill/issues/1 . Best first issue :D
Amazing! The inserted objects are renders of textured 3D models and not generated by a diffusion model + ControlNet? Is there a fixed set of textured 3D models available or are they generated on the fly based on the prompt?
That's correct! Right now, we're using the BlenderKit catalog, but we can expand beyond it. When you type a prompt and search though, that's actually doing a multi-modal search (so you can ask for a 'red painting' and it'll actually find a red painting), so it's far more accurate than a regular keyword search. AI everywhere!
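For the curious, the gist of a multi-modal search like that can be sketched in a few lines: embed the text query and the catalog's asset images into a shared vector space, then rank by cosine similarity. Everything below (the catalog, the embeddings, `embed_text`) is made up purely for illustration; a real system would use a joint text-image model such as CLIP.

```python
import numpy as np

# Hypothetical catalog: each 3D asset has a precomputed image embedding.
# These 4-d vectors are fake; real ones would come from an image encoder.
catalog = {
    "red painting": np.array([0.9, 0.1, 0.0, 0.1]),
    "blue sofa":    np.array([0.1, 0.8, 0.3, 0.0]),
    "wooden table": np.array([0.0, 0.2, 0.9, 0.2]),
}

def embed_text(prompt: str) -> np.ndarray:
    # Stand-in for a text encoder; a real implementation would call the
    # model's text tower. Here we fake an embedding near "red painting".
    fake = {"red painting": np.array([0.88, 0.12, 0.05, 0.08])}
    return fake[prompt]

def search(prompt: str) -> str:
    """Return the catalog item whose embedding is closest to the prompt."""
    q = embed_text(prompt)
    q = q / np.linalg.norm(q)
    def score(v: np.ndarray) -> float:
        return float(q @ (v / np.linalg.norm(v)))  # cosine similarity
    return max(catalog, key=lambda name: score(catalog[name]))

print(search("red painting"))  # → red painting
```

The point is that the query never has to match the asset's name; the match happens in embedding space, which is why "red painting" can retrieve a painting that actually looks red.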
Super cool! Layout estimation for deprojection is a GNARLY problem especially because people love white textureless walls.
Tried some on fill3D from a dataset we had before (happy to share more), and yup: https://imgur.com/a/Ut2GwZ0
Tough Tough Tough!
fxn.ai looks super cool too, I might try it out!
Would love to get hands on that dataset, how can I reach you? Or, shoot me a note at [email protected]
My use case for this would be for decorating my apartment.
I’ve got a big empty studio with a bed and couch I’ve already purchased but trying to figure out what to fill in for all the other gaps. Coffee table, media console, tv or UST projector, bar or bookshelf or desk.
Would be nice if there was a way to populate it with items/products that can be purchased and aren’t purely conceptual.
Yup, this is actually a roadmap feature. Because we generate in 3D, users can bring their own 3D models and add them to the catalog. And if you add something like Object Capture from Apple (https://developer.apple.com/augmented-reality/object-capture...), you could literally scan your couch, upload it to Fill 3D, place, and generate.
Exciting times ahead.
I’ve not tried it yet, but came across this site the other day which meets your use case: https://aihomedesign.com
(No affiliation!)
Have real estate companies considered leaving a house unfurnished and letting potential buyers put on AR goggles to see what it would look like with their furniture?
Or could just use a phone/tablet as a "viewport". I know it wouldn't be as immersive but the barrier to have it adopted would be a lot lower.
This level of realism seems impossible in AR as of today, if path tracing a single frame takes a minute or more.
What are you thinking is your business model? I'm a sysadmin at a small MLS, trying to figure out where we'd integrate it. At $2/stage it's something we'd probably have to have you bill the Realtor directly for (I don't think we do any pass-through billing), but could maybe include a couple stages per month per Realtor. I could see a fun use-case where consumers would be able to do their own staging, but there are probably few if any Realtors that will be willing to pay $2/stage for consumers to do that.
Would love to have a proper convo on this. With bulk pricing, I can reduce the price by quite a lot. Eventually, the goal is to have users be able to stage themselves in your property website or MLS. Please shoot me a note at [email protected] !
Will it work with decks and porches?
I have images of decks and porches that need staging for the construction company's web site.
I tried the demo, it seems to be buggy and it seems to only allow you to choose existing items from a predefined db.
What bugs did you encounter? And yes, because we're using actual 3D models, there's a fixed set of models (right now, just under 300). Because the priority is ultra-realism, the current state-of-the-art for 3D model diffusion won't cut it (see OpenAI Point-e https://github.com/openai/point-e).
So you only project the background into a 3D model, and the foreground is not generated, but rather composed of 3D models?
The bug I saw: after uploading a background image, on the right side I only saw a Generate and a Reset button, nothing else. I clicked "Generate", expecting it to ask me to input a prompt, but it started to render, and the result was the same background I uploaded.
Yes that's correct. And for using it, you have to draw rectangles before you can add a prompt, similar to Photoshop's generative fill UI. Check out the video on the landing page. Lmk if you face further issues, and sorry about the lacking instructions (I'm not a great webdev).
Maybe show a warning when no rectangles are drawn; I kinda wasted one credit by rendering an empty background.
Pretty awesome for a first Show HN. Multimodal search is very fascinating. I am using an SDXL + LoRA model over here: https://news.ycombinator.com/item?id=37696033
Thank you! The audience is definitely highly technical, so this has been a very productive thread.
Now create a bunch of perspectives, and NeRF or Gaussian splat that, and you've got a fully immersive 3D scene that is better than any rendering.
Why is it better than any rendering?
In this case it’s likely not. The advantage of Gaussian splats is that it allows you to bake in advanced lighting effects for a static scene. If you already have detailed renders there are plenty of existing approaches that perform plenty well and can be far more optimized.
Cos it's immersive (and interactive). Check out this realtime demo of 3DGS in Unity by Aras P (co-founder of Unity): https://www.youtube.com/watch?v=0vS3yh908TU&ab_channel=ArasP...
Are they just saying a 3D scene is better than a 2D rendering? I can't help but think a realtime 3D render could be just as good and probably better.
It looks like a cloud-only app. If it doesn't run entirely locally, it's useless to me. Shipping my data to an external data processor is a security risk I'm not allowed to take.
That's fine. Path tracing in a browser is pretty impractical today anyway. Check back in a few years, when WebGPU is much more mature.
Is there any way to remove objects from an initial image, so that then it can be utilized for staging?
Not right now, but that's a great roadmap feature. It should be trivial with today's model (object detection + inpainting). Created an issue: https://github.com/fill3d/fill/issues/2
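As a sketch of that detection + inpainting idea (not Fill 3D's actual code; `detector` and `inpaint` in the comments are hypothetical stand-ins), the glue step is turning the detector's bounding boxes into a binary mask for the inpainting model:

```python
import numpy as np

def boxes_to_mask(shape, boxes, pad=8):
    """Turn detector bounding boxes into a binary inpainting mask.

    `shape` is (height, width); each box is (x0, y0, x1, y1) in pixels.
    A small `pad` dilates the mask so object shadows and edges get
    inpainted away along with the object itself.
    """
    h, w = shape
    mask = np.zeros((h, w), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        mask[max(y0 - pad, 0):min(y1 + pad, h),
             max(x0 - pad, 0):min(x1 + pad, w)] = 255
    return mask

# A real pipeline would feed this mask to an inpainting model, roughly:
#   boxes = detector(image)                              # any object detector
#   clean = inpaint(image, boxes_to_mask(image.shape[:2], boxes))

mask = boxes_to_mask((100, 100), [(20, 30, 60, 70)])
print(mask.sum() // 255)  # → 3136 masked pixels
```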
Love the project, great work! Can you think about adding some ethical clauses to your license? Something to allow people to use it for good, wholesome purposes, but to avoid letting it be used by scammers faking Airbnb listings, for example.
If someone is willing to scam people on Airbnb, I'm pretty sure they're willing to break a software license.
So that's a good reason to provide people who have no capacity to create fake images an instant way to do so, while riding on the back of things the owner has no idea how they would actually create if they were asked to do so? Sweet. Let's all steal other people's property, charge for APIs and then take $15 a hit to let scammers use it.
Yes, they'll break a software license and use garbage that uses garbage that uses garbage. Way to draw the line.
I'll be honest, I don't really get your reply. I was merely saying that adding a clause in the license is pretty pointless. The hypothetical user has already decided to break one (or more) law(s), they wouldn't even think twice to break a software license (probably won't even read it).
Your comment sounds like a criticism of the project in general rather than of the pointlessness of adding a clause to the license. Personally, I think this is pretty novel, better than the hundreds of stable-diffusion-as-a-service things that have popped up lately.
> while riding on the back of things the owner has no idea how they would actually create if they were asked to do so
I mean, everyone builds on top of things they couldn't recreate. If you're a software developer, chances are you couldn't recreate your favorite language's runtime/compiler/whatever, you couldn't recreate your OS, you couldn't recreate the hardware that's running your software. I don't get this criticism at all.
Wouldn't that qualify as a crime already? That sounds like fraud to me.
"someone took the bed out"
FWIW this isn't my problem with this project. It's that the writer doesn't know what they're doing and represents a new type of post-code/post-crypto monkey that just links together APIs in clever ways and tries to charge maximum $ for it by selling it to people (monkeys?) who think it's magic.
People like this will make a lot of money, and eventually do something that injures you and your family personally. So it's best to attack them and slander them early and often.
> virtual staging in real estate media

If you can make this work with exteriors, landscaping design is huge. Maybe start with something simple like desert landscaping (which is really just rocks, turf, pavers, maybe small palm trees).
What did you use to create the screencast at https://www.fill3d.ai/?
I'm curious about that too. Recently, I've seen many screencasts in the same style, and I hate them. The constant movement of the recorded area is quite distracting.
Could you speak more to the "deprojection" step? What is that?
Fill 3D takes a different approach than diffusion, in that it tries to build an actual 3D scene (kinda like a clone) of what's in the image you upload. In some sense, that's actually the most fundamental representation of what's in your image (or said another way, your image is just a projection of that original scene).
So it works by trying to estimate a 3D 'room' that matches your image. Everything from the geometry, to the light fixtures, to the windows. It's heavily inspired by how humans (weird to contrast 'human' vs. AI work) do image/video compositing.
TL;DR: Image in, 3D scene out.
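To make the "image in, 3D scene out" idea concrete, here's the geometric core in toy form. This is not Fill 3D's actual method (full layout estimation is much harder, as the thread notes); it only shows, under a simple pinhole-camera assumption with made-up intrinsics, how a pixel plus a depth becomes a 3D point, and projects back:

```python
import numpy as np

def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) at the given depth into camera-space XYZ."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.array([x, y, depth])

def project(p, fx, fy, cx, cy):
    """Project a camera-space point back to pixel coordinates."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)

# Round-trip a pixel through 3D and back (intrinsics chosen arbitrarily).
p = unproject(400, 300, 2.5, fx=600, fy=600, cx=320, cy=240)
u, v = project(p, fx=600, fy=600, cx=320, cy=240)
print(round(u), round(v))  # → 400 300
```

The hard part of deprojection is everything this sketch assumes away: estimating the depths, the camera intrinsics, the wall/floor planes, and the light sources from a single photo.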
Could you elaborate on how that's done technically? I'm curious how you estimate the 3D room. Are you using ML based estimation like LayoutNet? How about the lighting?
You realise this is the role of entire teams at certain companies, right? If you automate enough parts, you'd be able to automate the work of 30 people per company doing this. Not the first to work this out either.
https://investor.wayfair.com/news/news-details/2023/Wayfair-...
Decorify from Wayfair is also using diffusion, same as the other folks who have built similar things in the market (InteriorAI is probably leading product here). We'll see where this goes :D
Can this be used to replace objects in a scene? In your demo example you place a bed, but what if I want to replace my bed with yours?
I like it, but you should add some free tier to test it out.
Nice! Like your landing page.
How well does it work on non-room images?
Depends on the image. Right now, the very first stage (deprojecting the image to 3D) makes assumptions about the image having the structure of a room: large empty floor plan; roughly polygonal geometry.
For different kinds of images, it's a question of using other cues to build a 3D structure that's very close to the original image. And no, monocular depth estimation isn't enough (happy to nerd out about why) ;)
Very cool! The challenge now is filling spaces with different lighting, e.g. sunlight entering through a window in a mostly dark room while a lamp illuminates a wall.
I think this isn't too difficult of a problem. Technically, the objects that get added could be emissive. It could even be a switch, having an added light be on or off.
Wow, nice. I hope you charge realtors a fat price for this
Really can't generate the objects I need to place. A few that don't work: 1. terrarium, 2. fish tank, 3. bunk bed.
Between Fill3D's architecture that 'path traces to render ultra-realistic results' and fxn.ai transparent deployment capability... I gotta say this is super impressive work. I can use both in a current project, and will be investigating.
> Right now, you need an image of an empty room
I needed an image of an empty room recently. I just took a photo of my very not empty room, ran it through a canny algorithm, painted out the objects with black, and then used stable diffusion with canny controlnet to generate an empty room. Worked pretty well. Did not look that much like the original room, but it was certainly good enough to check furniture placement etc.
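That workflow is easy to sketch. The paint-out step is plain array masking; the Canny + ControlNet calls are left as comments since they need the actual models on a GPU (the checkpoint names are the commonly used public ones, an assumption on my part, not something specific to this commenter's setup):

```python
import numpy as np

def paint_out(image: np.ndarray, boxes) -> np.ndarray:
    """Black out rectangular regions so the edge map ignores the furniture."""
    out = image.copy()
    for x0, y0, x1, y1 in boxes:
        out[y0:y1, x0:x1] = 0
    return out

# Remaining steps, roughly (requires opencv-python, diffusers, and a GPU):
#   edges = cv2.Canny(paint_out(img, furniture_boxes), 100, 200)
#   controlnet = ControlNetModel.from_pretrained(
#       "lllyasviel/sd-controlnet-canny")
#   pipe = StableDiffusionControlNetPipeline.from_pretrained(
#       "runwayml/stable-diffusion-v1-5", controlnet=controlnet)
#   empty_room = pipe("an empty room, bare walls", image=edges).images[0]

room = np.full((4, 4, 3), 200, dtype=np.uint8)  # tiny dummy "photo"
print(paint_out(room, [(1, 1, 3, 3)])[2, 2].tolist())  # → [0, 0, 0]
```

As the commenter notes, the result won't match the original room exactly, since the generator only sees the surviving edges, but it's good enough for checking furniture placement.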
This kind of stuff is the future of film making.
Imagine adding "yourself" into a scene like this, moving around as you were/are from a video you just created of yourself. As in: film yourself walking around your bedroom with your phone. Then use an app like this to add you and your movement (cropped from the video) to a different background scene.
Goodbye, Hollywood elites!
I couldn't agree more! You should check out the amazing work from the folks at Luma Labs (https://lumalabs.ai/). They're a loose inspiration for this project.
This is an excellent example of a full pipeline from blender <> luma tools <> 3D in ShapesXR (who are also doing amazing work atm)
https://twitter.com/GabRoXR/status/1706691466460836333?t=3z7...
This is what I’m working on at https://skyglass.com. You should check it out!
Hey, I'd love to chat with you about how you power these on-device AI features (like background replacement). Function is building infrastructure for both server-side and on-device AI inference.
The goal is for devs like you to bring your original Python code, and we'll generate a library that is cached and runs on-device. See this demo: https://demos.natml.ai/@natml/blazepalm-landmark (wave your hands)
> This kind of stuff is the future of film making.
> Imagine adding "yourself" into a scene like this, moving around as you were/are from a video you just created of yourself. As in: film yourself walking around your bedroom with your phone. Then use an app like this to add you and your movement (cropped from the video) to a different background scene.
> Goodbye, Hollywood elites!
As someone working in this arena, statements like this make me chuckle.
Don't get me wrong-- I think this is really cool. I get that people are excited about new tech, and technical people always overestimate the value of technical advancements in creative workflows, but no: people being able to place a perfect hyper-realistic replica of themselves in a film wouldn't kill the film industry any more than RPG Maker + generative AI would kill the games industry. I'd wager it probably would not even leave a dent.
Firstly, there would need to be a film to begin with, and that requires a lot. A whole hell of a lot.
Secondly, characters matter. A lot. Especially main characters. Do you replace the name of the main character with your name in stories you tell? How about doing a global search-and-replace in the ebooks you read? It's not like we don't have the technical capability. I could see this being a novelty feature in some action movies, especially superhero movies, and more likely in games and porn, but one of the biggest draws a movie has is who stars in it-- and if you look at the rest of the human population, you'll notice that we're not choosing representative samples. The fact that it's someone else with a personality and back story and motives and strengths and flaws-- a character-- is a pretty important part of stories. Their appearance matters, too. Most people don't even like staring at themselves for a few minutes, let alone for an entire feature length film. In most situations, I think it would be distracting as hell. Sure, people might find it amusing to see themselves in Top Gun Maverick, but would they want to see themselves getting bullied by an IRS agent in Everything Everywhere All at Once? Getting a box cutter held to their throat in Emily the Criminal? Would replacing Jon Hamm's appearance with their own really make watching Madmen better? Do most people want to see themselves beat to a pulp in Fight Club? I'd wager that few would.
Thirdly, most people aren't particularly interested in putting in the thought and effort to customize their phones: I'm pretty sure they're less interested in putting thought and effort into customizing their passive entertainment. They just want to hit play and have a nice little escape.
So, no. As long as people will continue to seek entertainment for the reasons they've always sought it, this is not going to fundamentally change the art of storytelling anytime soon.
That's an awfully brown colored pink bed in the demo :)
The tech itself looks amazing though, well done.
Love the "No need for nasty YAMLs or Dockerfiles" copy on the Function website. Plus ça change, plus c'est la même chose. HTMX, SQLite, Postgres are hip. Building giant supercomputers is back in, fuck the edge. Even starting to see a new XML wave.
Today I watched a video about gravel-bike touring where some young whippersnapper was getting mad excited about the idea of putting a rack and panniers on the back of their bike, just like in the good old days. What a world we live in. I'm 100% old af.
Very brave to show us that ugly brown bed generated from the prompt "pink bed".
I gasped. This is what will make it trivial to simply highlight a person's swimwear and tell the AI to remove it.
Have you never used stable diffusion?
Today, as in right now, with fewer than 5 relatively not-horrible photographs, you can create a realistic AI version of anyone and have them do, or wear, anything you'd like. Animation included. From your home computer.
Or just inpaint the clothes away from any image.
Look, far from being morally offended by that, I can relate. In '92 - like 6th grade for me - a friend got one of those hand-held scanners for a Mac LE that let you drag it very slowly across the page and get a 300 DPI image into Photoshop v1. And we completely proceeded - as 12 year olds - to paste the faces of girls from our yearbook onto GIF files of porn actresses. That we downloaded at 2400 baud from BBS's.
That's what was going on in 1992.
I'm a little alarmed at the general laziness / lack of initiative of these kids today, TBH. but whatever.
This is a story of some 12-year-old kids doing it to each other in Spain. Another reason why kids should be off social media until they get a driver's license and parents own all data before they turn 18.
> Another reason why kids should be off social media until they get a driver's license and parents own all data before they turn 18.
This has nothing to do with social media. The images have circulated through WhatsApp and Telegram channels. And if they hadn’t, they would have through email or MMSes.
Growing up in the age of generative AI is at least as big a sea change as growing up in the age of social media, or the internet, etc.
I am impressed by the tech, but appalled by the possibilities.
Where I live, it is already common practice for real estate 'agents' to photoshop the properties listed for sale to make them look fully renovated and furnished. When in reality the house is empty and in very bad shape.
This tech will make it even harder to judge a property without actually viewing it in real life.
I think we can no longer stop tech like this from being used in ads (because that's effectively what property listings are nowadays). The only solution I think is policies/laws that prevent real-estate marketplaces from showing fake pictures.
That all said, I think the author can make big money from realtors by selling this tech as a subscription model.
I think we already have laws around misrepresenting things for sale... As far as furnishings: that's definitely spelled out in the contracts for what is included.
I'm sure it varies area to area, but the biggest thing I see in our area is things like adding sunsets in the windows or behind the property photos, but we wouldn't necessarily know if a Realtor had photoshopped out mold or water damage or the like.
Just like with '* Serving suggestion' pictures on food packaging, can't they just do '* Decoration suggestion' to shield themselves from pictorial misrepresentation charges?
They've had staging photoshop forever.
True, virtual staging is a very established product in the real estate media market.