We study the problem of dynamic spectrum sensing and access in cognitive radio systems as a partially observed Markov decision process (POMDP). A group of cognitive users cooperatively tries to exploit vacancies in a set of primary (licensed) channels whose occupancies evolve as a Markov process. We first consider the scenario in which the cognitive users know the distribution of the signals they receive from the primary users, and we obtain a greedy channel selection and access policy that maximizes the instantaneous reward while satisfying a constraint on the probability of interfering with licensed transmissions. We also derive an analytical universal upper bound on the performance of the optimal policy. We then consider the more practical scenario in which the distribution of the signal from the primary is characterized by an unknown random parameter. We develop an algorithm that learns this random parameter while still guaranteeing the constraint on the interference probability. Finally, we demonstrate the performance gains of all our schemes through simulations.
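To make the setting concrete, the following is a minimal sketch of a greedy channel selection and access rule of the kind described above. All names, parameters, and the specific threshold rule (`zeta`, `p01`, `p10`, the Gaussian-style likelihoods in the usage example) are illustrative assumptions, not the paper's exact policy: each channel's occupancy is modeled as a two-state Markov chain, the user tracks a belief that each channel is free, senses the channel with the highest predicted belief, and accesses it only if the posterior free-probability exceeds `1 - zeta`, so the interference probability stays below `zeta`.

```python
import numpy as np

def belief_predict(b, p01, p10):
    """One-step Markov prediction of each channel's free-probability.

    b   : array of current beliefs P(channel is free)
    p01 : P(busy -> free) transition probability (assumed known)
    p10 : P(free -> busy) transition probability (assumed known)
    """
    return b * (1.0 - p10) + (1.0 - b) * p01

def greedy_access(b, zeta, likelihood_free, likelihood_busy, observation):
    """Greedily sense the channel with the highest predicted belief and
    decide whether to access it under an interference constraint zeta.

    likelihood_free/busy : densities of the sensed signal under the two
                           hypotheses (illustrative; the paper's model may
                           differ). Returns (channel, posterior, access).
    """
    i = int(np.argmax(b))                 # greedy channel selection
    lf = likelihood_free(observation)     # signal density if channel free
    lb = likelihood_busy(observation)     # signal density if channel busy
    # Bayesian posterior that channel i is free given the sensed signal
    post = b[i] * lf / (b[i] * lf + (1.0 - b[i]) * lb)
    access = post >= 1.0 - zeta           # interference-probability check
    return i, post, access
```

A one-step usage example with hypothetical unit-variance Gaussian likelihoods (noise-only when free, mean-3 signal when busy): after `belief_predict`, the user senses the channel with the highest belief and accesses only when the posterior clears the `1 - zeta` threshold.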