With the increasing number of vehicles on the road, sensing citywide traffic is becoming ever more important, as it greatly benefits both government policy-making and people’s everyday decision-making. Currently, traffic speed and volume information are mostly derived from GPS trajectory data and volume sensor records, respectively. Unfortunately, both suffer from a severe missing-data problem: speed can be absent for arbitrary road segments and time slots, while volume is recorded only by a limited number of volume sensors. To model citywide traffic, in this paper we propose a neural memorization and generalization approach to infer missing speed and volume across the whole city. It consists mainly of a memorization module for speed inference and a generalization module for volume inference. Considering the temporal closeness and periodicity properties of traffic, the memorization module uses a neural multi-head self-attention architecture to memorize intrinsic correlations from historical traffic information. The generalization module adopts a neural key-value attention architecture to generalize extrinsic dependencies among volume sensors by exploiting road contexts. We conduct extensive experiments on two real-world datasets from two cities, Guiyang and Jinan, and the results consistently demonstrate the advantages of our approach. We have also developed a real-time cloud-based system, CityTraffic, which provides citywide traffic speed and volume information as well as fine-grained vehicle pollutant emissions for Guiyang, China.
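To illustrate the general idea behind key-value attention for volume inference, the following is a minimal sketch of generic scaled dot-product key-value attention, not the paper's generalization module. It assumes (our assumption) that each road segment is described by a fixed-length numeric road-context vector, that the keys are the context vectors of sensor-equipped segments, that the values are the volumes those sensors recorded, and that the query is the context vector of an unmonitored segment. The function names `softmax`, `dot`, and `infer_volume` are hypothetical, and the learned projection layers a neural model would normally apply are omitted.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def infer_volume(query_context: Sequence[float],
                 sensor_contexts: Sequence[Sequence[float]],
                 sensor_volumes: Sequence[float]) -> float:
    """Hypothetical key-value attention for one unmonitored road segment.

    Assumption: keys are sensor road-context vectors, values are the
    volumes those sensors recorded, and the query is the context vector
    of the segment whose volume is missing. No learned projections.
    """
    d = len(query_context)
    # Scaled dot-product similarity between the query segment and each sensor key
    scores = [dot(query_context, k) / math.sqrt(d) for k in sensor_contexts]
    # Attention weights are non-negative and sum to 1
    weights = softmax(scores)
    # Inferred volume is the attention-weighted average of sensor volumes
    return sum(w * v for w, v in zip(weights, sensor_volumes))


if __name__ == "__main__":
    # Toy example: the query segment's context matches the first sensor's context
    contexts = [[1.0, 0.0], [0.0, 1.0]]
    volumes = [100.0, 20.0]
    print(infer_volume([1.0, 0.0], contexts, volumes))
```

Because the output is a convex combination of recorded sensor volumes, the inferred value always lies between the smallest and largest observed volume, and it leans toward sensors whose road contexts resemble the query segment's.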