There definitely is a trade-off between memory size and how quickly it can be accessed.
IIRC, IBM z/Arch processors (AFAIK they are internally similar to POWER) have their clock limited to around 5 GHz so that an L1 cache lookup costs only one cycle (a design requirement).
For example, the z14 has a 5.2 GHz clock rate and 128 KB each of L1 instruction and L1 data cache per core.
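
To put the 5.2 GHz figure in perspective, here is a quick back-of-the-envelope calculation of how much time one cycle gives you. This is only a sketch; the function name `cycle_time_ns` is made up for illustration.

```python
def cycle_time_ns(clock_ghz: float) -> float:
    """Duration of one clock cycle in nanoseconds (1 GHz -> 1 ns)."""
    return 1.0 / clock_ghz


z14_cycle = cycle_time_ns(5.2)
print(f"{z14_cycle:.3f} ns per cycle")  # prints "0.192 ns per cycle"
```

So a one-cycle L1 hit would have to complete the whole lookup in roughly 0.19 ns, which is why cache size and clock rate constrain each other.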
Do lookups in large tables ever (practically, not theoretically) take one clock cycle?
If there's a large lookup table, it would have to come from memory, which might mean cache and memory hierarchy delays, right?
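
A toy model of that point, as a rough sketch. Only the 128 KiB L1 size comes from the z14 figure mentioned earlier, and the one-cycle L1 latency is the unverified claim from this thread. Every other capacity and latency, and the names `LEVELS` and `lookup_cost_cycles`, are placeholders for illustration, not measurements of any real CPU.

```python
# (level name, capacity in bytes, load latency in cycles)
# HYPOTHETICAL numbers for illustration; substitute values for a real CPU.
LEVELS = [
    ("L1", 128 * 1024, 1),           # size from the z14 figure; 1 cycle is the claim
    ("L2", 4 * 1024 * 1024, 10),     # placeholder
    ("L3", 128 * 1024 * 1024, 50),   # placeholder
    ("DRAM", float("inf"), 300),     # placeholder
]


def lookup_cost_cycles(table_bytes):
    """Return (level, cycles) for the smallest level that holds the whole table."""
    for name, capacity, cycles in LEVELS:
        if table_bytes <= capacity:
            return name, cycles
    raise ValueError("unreachable: DRAM has no capacity limit")


for size in (64 * 1024, 1024 * 1024, 1024 ** 3):
    level, cycles = lookup_cost_cycles(size)
    print(f"{size:>12} bytes -> {level}, ~{cycles} cycles per lookup")
```

This is the best case, where the entire table is already hot in the level it fits into. Real lookups also depend on access pattern, associativity, TLB misses, and what else is competing for the cache, so a table that nominally fits in L1 can still miss.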