If there are B units in the input layer and B units in each of the Cout convolution layers, and each unit of an output layer is connected to all units in the input layer, the number of weights is B^2 per layer, so B^2*Cout in total.
In CNNs, each output unit connects to only K input units, and the weights are the same for all output units in the same convolution layer, so there are only K*Cout weights in total. It is as if each of the neurons in one convolution layer is searching, in its own part of the input space, for a feature that is specific to that layer, and the response of a unit is maximal (relative to the overall size of its inputs) when the activity in its corresponding input units most closely resembles that feature. For instance, if the weights of one convolution layer are (0 1 0) (K=3), the best-matching input pattern is an isolated spike, i.e. a very sharp and short change between contiguous input units.
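To make the counting and the "feature search" idea concrete, here is a small sketch in plain Python. The sizes B=100, K=3, Cout=8, the test signal, and the helper names are my own choices for illustration, and the second kernel (-1 2 -1) is not from the answer above; I only added it as a comparison, since it responds to changes and ignores flat regions.

```python
def dense_weight_count(b, c_out):
    """Weights when every output unit sees all b inputs, for c_out layers."""
    return b * b * c_out


def conv_weight_count(k, c_out):
    """Weights when each layer shares one kernel of size k across all units."""
    return k * c_out


def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (no padding, no kernel flip)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]


# Illustrative sizes (assumptions, not values from the question).
B, K, C_OUT = 100, 3, 8
print(dense_weight_count(B, C_OUT))  # 80000
print(conv_weight_count(K, C_OUT))   # 24

# A toy input: one isolated spike, then a flat plateau of ones.
signal = [0, 0, 1, 0, 0, 1, 1, 1, 0]

# (0 1 0) just reads out the centre input of each window.
print(conv1d_valid(signal, [0, 1, 0]))    # [0, 1, 0, 0, 1, 1, 1]

# (-1 2 -1), added for comparison: largest on the spike, zero inside the plateau.
print(conv1d_valid(signal, [-1, 2, -1]))  # [-1, 2, -1, -1, 1, 0, 1]
```

Note that the same three weights are reused at every position, which is why the weight count does not depend on B. Also note that without any normalisation of the input, (0 1 0) responds equally to the spike and to the plateau, so the "spike detector" reading only holds when you compare inputs of the same overall size.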
Hope this is correct and makes sense, ask again if not, or perhaps someone else can give a better answer.