We consider the problem of multicasting data from a source to receivers that already possess arbitrary subsets of the data a priori as side information. Fountain codes, which are an ideal solution to the standard multicasting problem without side information, have also been proposed in several recent independent studies as a potential approach to the side information problem. In this setting, we formulate and study an optimization problem over degree distributions to minimize the overhead necessary for complete decoding, and prove that: (i) degree distributions converging to the standard soliton distribution cannot exploit side information to reduce the overhead necessary for complete decoding; (ii) an asymptotic shifted soliton distribution achieves an overhead that is within a constant factor (< 2) of the optimal overhead; (iii) no degree distribution achieves asymptotically optimal overhead when any nontrivial constant fraction of the data is available as side information. While (iii) is discouraging, this limitation can be sidestepped by using systematic versions, in which intermediate symbols are first generated from the source symbols and the fountain code is then applied to those intermediate symbols. One important implication is that systematic versions are, in a sense, indispensable for achieving asymptotic rate optimality in the side information problem.
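For concreteness, the two degree distributions named above can be sketched as follows. The ideal soliton distribution on k symbols is standard: it assigns probability 1/k to degree 1 and 1/(d(d-1)) to each degree d = 2, ..., k. The shifted variant below is only an illustrative assumption, not necessarily the asymptotic definition analysed in this work: it takes the ideal soliton distribution on the unknown symbols and rescales each degree by k / (k - known), so that an encoded symbol still touches roughly a soliton-distributed number of unknown symbols once the known ones are removed. The function names `ideal_soliton` and `shifted_soliton` are hypothetical.

```python
from fractions import Fraction


def ideal_soliton(k):
    """Return {degree: probability} for the ideal soliton distribution on k symbols."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    dist = {1: Fraction(1, k)}
    for d in range(2, k + 1):
        dist[d] = Fraction(1, d * (d - 1))
    return dist


def shifted_soliton(k, known):
    """Illustrative shifted soliton (an assumption, not the paper's exact definition).

    Starts from the ideal soliton on the k - known unknown symbols and
    rescales every degree by k / (k - known), capping the result at k.
    """
    unknown = k - known
    if known < 0 or unknown <= 0:
        raise ValueError("need 0 <= known < k")
    scale = Fraction(k, unknown)
    shifted = {}
    for d, p in ideal_soliton(unknown).items():
        j = min(k, round(d * scale))  # rescaled degree, capped at k
        shifted[j] = shifted.get(j, 0) + p  # merge mass if degrees collide
    return shifted
```

With no side information (`known = 0`) the shifted distribution reduces to the ideal soliton distribution; as the known fraction grows, probability mass moves to higher degrees, which is the qualitative behaviour that result (ii) relies on.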