Sequence Search Introduction


Entrez is a data retrieval system developed by the National Center for Biotechnology Information (NCBI) that provides integrated access to a wide range of data domains, including literature, nucleotide and protein sequences, complete genomes, three-dimensional structures, and more. Beyond the records that exactly match a query, Entrez’s search features can retrieve related records within the same data domain that the query alone might miss, as well as associated records in other data domains. These features let us assemble previously disparate pieces of information about a topic of interest. Using Entrez effectively requires an understanding of the available data domains, the variety of data sources and types within each domain, and Entrez’s advanced search features. This tutorial uses corn (Zea mays) alpha-amylase to demonstrate the wide variety of information that we can rapidly gather for a single gene. The record counts in the search results will change over time as the databases grow, but the same techniques apply to any topic of interest.

The search goals are to

• separate the wheat from the chaff: identify a representative, well-annotated mRNA or protein sequence record
• retrieve associated literature
• identify conserved domains within the protein
• identify similar proteins
• find a resolved three-dimensional structure for the protein or, in its absence, identify structures with homologous sequence
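The same kinds of searches can also be issued programmatically through NCBI's Entrez Programming Utilities (E-utilities): ESearch finds the records matching a query, and ELink follows links from a record to related records in the same database or to associated records in other databases. The sketch below is an illustration, not part of the original walkthrough. It only builds request URLs and fetches nothing. The query string, the helper names `esearch_url` and `elink_url`, and the placeholder UID are assumptions made for this example.

```python
from urllib.parse import urlencode

# Base URL of NCBI's Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL listing up to `retmax` UIDs in `db` that match `term`."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, uid: str) -> str:
    """Return an ELink URL for records in `db` linked to record `uid` in `dbfrom`.

    With dbfrom == db the links point to related records in the same data
    domain; with different databases they point to associated records
    across data domains.
    """
    params = {"dbfrom": dbfrom, "db": db, "id": uid}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


# Illustrative (assumed) query for corn alpha-amylase protein records.
search = esearch_url("protein", '"Zea mays"[Organism] AND alpha-amylase')
print(search)

# "UID_PLACEHOLDER" is hypothetical; in practice it would be a UID taken
# from the ESearch results. The targets roughly match the goals above:
# literature, similar proteins, and three-dimensional structures.
for target in ("pubmed", "protein", "structure"):
    print(elink_url("protein", target, "UID_PLACEHOLDER"))
```

Passing any of these URLs to `urllib.request.urlopen` would return an XML response from NCBI. Building the URLs separately from fetching them keeps the query logic easy to inspect before any network traffic is sent.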