Markov networks are extensively used to model complex sequential, spatial,
and relational interactions in fields as diverse as image processing, natural
language analysis, and bioinformatics. However, inference and learning in
general Markov networks are intractable. In this paper, we focus on learning
a large subclass of such models (called associative Markov networks) that
are tractable or closely approximable. This subclass contains networks of
discrete variables with K labels each and clique potentials that favor the
same labels for all variables in the clique. Such networks capture the
"guilt by association" pattern of reasoning present in many domains, in
which connected ("associated") variables tend to have the same label. Our
approach exploits a linear programming relaxation for the task of finding the
best joint assignment in such networks, which provides an approximate
quadratic program (QP) for the problem of learning a margin-maximizing Markov
network. We show that for associative Markov networks over binary-valued
variables, this approximate QP is guaranteed to return an optimal
parameterization for Markov networks of arbitrary topology. For the
non-binary case, optimality is not guaranteed, but the relaxation produces
good solutions in practice. Experimental results with hypertext and newswire
classification show significant advantages over standard approaches.
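To make the inference step concrete, the sketch below sets up the standard LP relaxation of MAP inference for a tiny binary associative Markov network: one marginal variable per node-label pair, one per edge-label pair, agreement constraints tying edge variables to their endpoints, and nonnegative edge rewards for shared labels. The network (three nodes in a chain), the potentials, and all variable names are illustrative assumptions, not from the paper; SciPy's `linprog` stands in for a generic LP solver. For binary associative networks, as the abstract states, the relaxation is tight, so the solver returns an integral assignment.

```python
# Hypothetical example: LP relaxation of MAP inference in a small
# binary associative Markov network (3-node chain, 2 labels).
# All potentials below are made-up illustrative numbers.
from scipy.optimize import linprog

N, K = 3, 2                       # nodes, labels
edges = [(0, 1), (1, 2)]          # chain topology
theta = [[2.0, 0.0],              # node log-potentials theta[i][k]
         [0.0, 0.0],
         [0.0, 0.5]]
lam = [[1.0, 1.0],                # edge rewards lam[e][k] >= 0 for
       [1.0, 1.0]]                # both endpoints taking label k

n_vars = N * K + len(edges) * K   # mu_i(k) then mu_e(k)
node = lambda i, k: i * K + k
edge = lambda e, k: N * K + e * K + k

# Maximize total potential -> minimize its negation.
c = [0.0] * n_vars
for i in range(N):
    for k in range(K):
        c[node(i, k)] = -theta[i][k]
for e in range(len(edges)):
    for k in range(K):
        c[edge(e, k)] = -lam[e][k]

# Equality: each node's marginals sum to 1.
A_eq, b_eq = [], []
for i in range(N):
    row = [0.0] * n_vars
    for k in range(K):
        row[node(i, k)] = 1.0
    A_eq.append(row)
    b_eq.append(1.0)

# Inequality: mu_e(k) <= mu_i(k) for both endpoints i of edge e.
A_ub, b_ub = [], []
for e, (i, j) in enumerate(edges):
    for k in range(K):
        for endpoint in (i, j):
            row = [0.0] * n_vars
            row[edge(e, k)] = 1.0
            row[node(endpoint, k)] = -1.0
            A_ub.append(row)
            b_ub.append(0.0)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, 1.0)] * n_vars, method="highs")
labels = [max(range(K), key=lambda k: res.x[node(i, k)]) for i in range(N)]
print(labels, -res.fun)   # MAP labeling and its objective value
```

Here the associative rewards pull all three nodes toward the label favored by the strongest node potential, and because the variables are binary the LP vertex solution is integral (no rounding needed); with more than two labels the same relaxation can return fractional solutions that must be rounded.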