IMDB and LinkedIn -> Whitetruffle are great examples of the challenges and nuances here. If IMDB has done all this work to structure and normalize "facts," why shouldn't there be business value for them in protecting that, including being able to prevent others from just scraping and reusing it? You may be right that someone could write it all down manually, but perhaps that burden of work is enough to deter people from leveraging IMDB in an automated fashion.
Re Whitetruffle: this reminds me of issues we faced with Facebook Connect. The original idea behind Facebook Connect was that your profile was your one source of truth. As you "connected" your Facebook identity to other services, it was supposed to be a live connection: if you changed anything on Facebook (your profile picture, etc.), it should be updated everywhere. This meant sites couldn't just store a copy of your profile data; they had to query it live from Facebook. This was a policy to enforce, though, not something you could protect in the code.

On LinkedIn, if you changed your profile, maybe added a detail or removed a position, you could argue that a user would expect the same to happen on sites they shared their LinkedIn profile with. So enforcing either explicit entry, or not storing the LinkedIn profile data, might have been what they were trying to do conceptually. Though obviously they would also have been concerned from a business-value standpoint if it were too easy for another jobs business to suck in the entire LinkedIn userbase of data.