"one or more very large sources of pirated ebooks"
I would have liked to be a fly on the wall in the meeting that decided to go get pirated material to use as training data.
Mgr - "Okay, guys, we have this ginormous potential waiting on training data. Where can we get that ? Ideas ?"
Mkting - "Well, we could strike deals with the Project Gutenberg website, they've got plenty of free books. I'm sure they'd be willing to help."
Mgr - "How much would that cost ?"
Mkting - "It's free for the customer, but we'd need a deal where we can get stuff in bulk. Shouldn't cost more than a couple thousand."
Mgr - "How long would that take ?"
Mkting - "I guess a month or two to negotiate the deal and have a contract written up."
Mgr - "Too long. We need to move forward now. Any other ideas ?"
Dev - "Well, I know this site where we can get just about everything. All I'd need to do is write a script to automate the downloads."
Mgr - "What about the contract ?"
Dev - "Um, well, there isn't any. It's BitTorrent-like, you just go choose and it drops in."
Mgr - "And we can get recent stuff, no problem ?"
Dev - "Well yeah. Pirates love recent stuff."
Mgr - "Pirated ? So no contract and no money ?"
Dev - "Nope. And it's untraceable."
Mgr - "Go for it !"