Imagine a baby. Before it opens its eyes for the first time, it is connected to a machine that completely controls its sensory input, much like The Matrix.
Instead of being fed images of a physical world, the only input it ever receives is plain white text displayed in front of its "eyes" on a black background. This is the only external input it will have for its entire life. It can respond and interact somehow through muscle actions; no need to get hung up on the details.
Words stand in for social concepts and ideas. How could it ever understand "Christmas"?
There is a major difference between being able to calculate the statistically most likely reply to a question and being able to understand the question.
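To make that difference concrete, here is a minimal sketch of what "calculating the statistically correct reply" can mean at its simplest: a toy bigram model that always picks the word that most often followed the previous word in its training text. The tiny corpus and function names are invented for illustration; real language models are vastly more sophisticated, but the point stands that nothing in this kind of machinery requires knowing what any word means.

```python
from collections import defaultdict, Counter

# Tiny made-up training corpus (illustrative only).
corpus = (
    "christmas is a holiday . "
    "christmas is in december . "
    "christmas is a holiday in december ."
).split()

# Count how often each word follows each other word.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def most_likely_next(word):
    """Return the statistically most frequent follower of `word`."""
    return followers[word].most_common(1)[0][0]

print(most_likely_next("christmas"))  # -> "is"
print(most_likely_next("is"))         # -> "a"
```

The model "knows" that "is" tends to follow "christmas" purely from co-occurrence counts; it has no representation of holidays, winter, or family at all.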
We may already have all of the "technology" required for sentient AI; it just needs considerably more processing power, which will come with time. The problem is that the way we "teach" AIs means we are potentially training the next in line for the throne to interact with the world without any understanding of it.
Imagine the difference between an entity in charge of all "law enforcement drones" learning "conflict resolution" from Facebook posts versus court rulings versus lived experience. You might argue that court rulings are a better training set than interactions with the typical person, but how do you grasp the basis for a ruling on torture if you have never felt pain?