
"Most corporate data in the cloud is in AWS"
But that doesn't necessarily make it easily accessible.
Some of it is sitting in ext4 and NTFS filesystems on EBS volumes. Sharing that means exposing it via something like NFS or Samba, on a server-by-server basis.
Some of it is sitting in MySQL, Postgres, and similar databases. You can expose it by creating credentials for remote access, or by setting up replication to a third party.
Many of those systems are on private VPCs with only private address space. So you have to set up VPNs if you want third parties to access it over the network.
In other words: the fact that the data exists "in the cloud" doesn't necessarily make it easily accessible to warehousing or ML services. The integration you have to set up will actually be very similar to allowing remote access to your on-prem systems.
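
To make the private address space point concrete, here is a minimal Python sketch that checks whether a host's address falls in one of the RFC 1918 private ranges, which is what you would typically see inside a private VPC. The function name and the sample addresses are hypothetical, chosen only for illustration; a real inventory would come from your own infrastructure.

```python
import ipaddress

# RFC 1918 private IPv4 ranges. Addresses in these ranges are not routable
# on the public internet, so a third party cannot reach them directly.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def needs_private_connectivity(host_ip: str) -> bool:
    """Return True if host_ip is in an RFC 1918 range.

    Hypothetical helper: a True result means a third party would need
    something like a VPN or peering to reach the host over the network.
    """
    addr = ipaddress.ip_address(host_ip)
    return any(addr in net for net in RFC1918_NETWORKS)


if __name__ == "__main__":
    # Sample addresses are made up for illustration.
    for ip in ["10.0.12.34", "172.31.5.9", "192.168.1.20", "8.8.8.8"]:
        print(ip, needs_private_connectivity(ip))
```

Any host for which this returns True is in the same position as a machine on an office LAN: no outside service can connect to it until you build the network path yourself.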