FAQ - Frequently Asked Questions
- What does PubGene do?
- What is text mining?
- What are "Literature Networks"?
- How do I find out what a gene (or protein) "does"?
- Does PubGene also deal with DNA and protein sequences?
- Can PubGene find genes using keywords?
- How do I find a connection between specific genes?
- What does the coloring of "nodes" in network images mean?
- What are the numbers on the "edges" of the network?
- How do I pull up articles from a network image?
- Does PubGene have commercial products?
- Does PubGene perform customized projects?
What does PubGene do?
PubGene helps you retrieve information on genes and proteins. The underlying structure of PubGene can be viewed as a "gene-centric" database. Gene and protein names are cross-referenced to each other and to terms that are relevant to understanding their biological function, their importance in disease and their relationship to chemical substances. The result is a "literature network" that organizes information in a form that is easy to navigate. No single researcher can be expected to stay informed about everything that is happening in genetics. Let PubGene help you find connections and speed up the discovery process (see Jenssen et al. 2001).
What is text mining?
Generally, text mining is the automated retrieval of information from text, using algorithms developed for the intended use. PubGene mines the abstracts of 25 million PubMed articles for co-citations of genes or proteins and displays them as "Literature Networks". In these networks each node represents a gene or protein, and each connecting line is labeled with the number of articles in which that gene or protein pair is co-cited. Clicking the number on a line retrieves the list of articles.
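Co-citation counting can be sketched in a few lines of Python. This is a minimal illustration, not PubGene's implementation. It assumes gene names have already been recognized in each abstract, and the article IDs and gene lists are made-up toy data. The sketch stores the article IDs for each pair rather than only a count, so that an edge can link back to its list of articles.

```python
from collections import defaultdict
from itertools import combinations


def co_citation_network(articles):
    """Map each unordered gene pair to the sorted list of article IDs
    in which both genes are mentioned."""
    edges = defaultdict(set)
    for pmid, genes in articles.items():
        # Every unordered pair of distinct genes in one abstract is one co-citation.
        for a, b in combinations(sorted(set(genes)), 2):
            edges[(a, b)].add(pmid)
    return {pair: sorted(ids) for pair, ids in edges.items()}


# Hypothetical toy input: article ID -> gene names recognized in its abstract.
articles = {
    101: ["TP53", "MDM2"],
    102: ["TP53", "MDM2", "CDKN1A"],
    103: ["TP53", "CDKN1A"],
}
network = co_citation_network(articles)
for (a, b), pmids in sorted(network.items()):
    # The edge label is the number of co-citing articles.
    print(a, b, len(pmids), pmids)
```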
What are "Literature Networks"?
Networks viewed in the Network Browser show how gene or protein names are co-cited in the literature. That is, names that appear in the same text form a pair and are said to be neighbors in the network. When a gene or protein is studied, there is a good chance that its name (or a synonym for that name) will appear in articles together with the names of other genes or proteins. As a result, most genes that have been studied are connected to each other, either directly or indirectly, in a Literature Network. Connections in the literature are often a strong indicator of biological interaction. Networks can also be annotated with Gene Ontology (GO), Medical Subject Heading (MeSH) or chemical keywords. By organizing genes and proteins into networks, PubGene helps you visualize and navigate information to gain understanding.
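Synonym handling matters because the same gene often appears under several names. The following minimal Python sketch assumes a small, hypothetical synonym dictionary; real dictionaries are far larger and must handle ambiguous names. It treats two genes as neighbors when they occur in the same text. In the toy data, MDM2 and CDKN1A are never mentioned together, yet they are connected indirectly through TP53.

```python
import re
from collections import defaultdict

# Hypothetical synonym dictionary: every alias maps to one canonical symbol.
# Matching is case-sensitive and whole-word only, for simplicity.
SYNONYMS = {
    "TP53": "TP53", "p53": "TP53",
    "MDM2": "MDM2", "HDM2": "MDM2",
    "CDKN1A": "CDKN1A", "p21": "CDKN1A", "WAF1": "CDKN1A",
}


def genes_in_text(text):
    """Return the canonical symbols whose names or synonyms occur in the text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    return {SYNONYMS[t] for t in tokens if t in SYNONYMS}


def neighbors(texts):
    """Two genes become neighbors if they are mentioned in the same text."""
    adj = defaultdict(set)
    for text in texts:
        found = genes_in_text(text)
        for gene in found:
            adj[gene] |= found - {gene}
    return adj


texts = [
    "HDM2 binds p53 and promotes its degradation.",
    "p53 induces WAF1 expression after DNA damage.",
]
adj = neighbors(texts)
print(sorted(adj["TP53"]))  # ['CDKN1A', 'MDM2']
print(sorted(adj["MDM2"]))  # ['TP53']
```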
How do I find out what a gene (or protein) "does"?
Finding out what a gene or protein does will always require some effort, and experts are generally specialists on the function and interactions of a relatively small number of genes. PubGene extracts information associated with individual genes and proteins from MEDLINE records and their metadata. In this way, genes and proteins are linked to "keywords" such as Gene Ontology (GO) terms, Medical Subject Headings (MeSH), and drugs, toxins and other chemicals or compounds (Chem). The options within the BioAssociations tool let you view lists of such terms associated with individual genes or proteins. Using these options, you can find out what is known about the processes, function, cellular compartmentalization and pathologies associated with a gene or protein, and about its behavior in response to chemical treatment.
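Linking genes to keywords can be pictured as a simple tally over annotated records. This Python sketch is illustrative only. The record layout and the toy data are hypothetical, and the code simply counts how many records connect each gene to each MeSH, GO or Chem term. Raw counts like these favor very common terms, so association lists are better ranked by a significance measure.

```python
from collections import Counter, defaultdict


def keyword_profiles(records):
    """For each gene, count how many records link it to each keyword."""
    profiles = defaultdict(Counter)
    for rec in records:
        # Deduplicate within a record so each record counts at most once per term.
        for gene in set(rec["genes"]):
            profiles[gene].update(set(rec["keywords"]))
    return profiles


# Hypothetical MEDLINE-like records: genes found in the text plus annotation terms.
records = [
    {"genes": ["TP53"], "keywords": ["Apoptosis", "Neoplasms"]},
    {"genes": ["TP53", "CDKN1A"], "keywords": ["Cell Cycle", "Apoptosis"]},
    {"genes": ["CDKN1A"], "keywords": ["Cell Cycle", "Doxorubicin"]},
]
profiles = keyword_profiles(records)
print(profiles["TP53"].most_common())
```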
Does PubGene also deal with DNA and protein sequences?
Many genes identified by sequencing efforts have yet to be described in terms of the function of their products. In such cases you may be able to make an educated guess about the function of the gene of interest by looking at similar genes. DNA and protein sequences are also useful for experiments aimed at validating your results, e.g. for designing RT-PCR primers. Generally, to find similar genes you look for genes with similar structure (that is, similar nucleotide or amino acid sequence). PubGene can draw networks based on sequence similarity; this function is available under Advanced Options in the Network Browser. PubGene sequence networks are based on pairwise alignment of reference sequences in a pre-compiled database. By viewing a sequence network you can quickly see whether other genes with similar sequences have been described in terms of their biological or medical importance. You can also search sequence databases with the PubGene Sequence Homology tool, which aligns a DNA or protein sequence against the entries in one of several databases using the Smith-Waterman algorithm. PubGene employs the powerful PARALIGN method. This makes it possible to return highly accurate and sensitive Smith-Waterman alignments in a matter of minutes, at speeds comparable to the fast (but less sensitive) BLAST and FASTA methods.
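The Smith-Waterman recurrence itself is short. The sketch below is a plain textbook version with a linear gap penalty and made-up scores (match +2, mismatch -1, gap -2), running in O(mn) time. It is not PARALIGN, whose speed comes from a parallelized algorithm. Real protein searches would also use a substitution matrix such as BLOSUM62 and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best local score, aligned part of a, aligned part of b).

    Textbook Smith-Waterman with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(x, y):
        return match if x == y else mismatch

    # Fill the score matrix; scores are floored at 0 so alignments stay local.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("AAGCTT", "GCT"))  # (6, 'GCT', 'GCT')
```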
Can PubGene find genes using keywords?
PubGene tools are designed to let you find gene-to-gene (protein-to-protein) connections as well as keyword-to-gene (protein) connections. A "keyword" submitted to the PubGene Network Browser will show a literature (or sequence) network of genes associated with that keyword. BioAssociations can generate lists of keywords associated with a gene (protein). BioAssociations can also do the opposite: find a list of genes (proteins) relevant to a keyword. In both cases the entries in the list are ranked by the significance of the association.
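The FAQ does not say which statistic BioAssociations uses to rank associations. One common choice for asking "is this keyword over-represented among the articles about this gene?" is a one-sided hypergeometric (Fisher-type) test, and the sketch below assumes that test. All counts and the names GENE_A and GENE_B are invented. In this example GENE_B has more raw co-occurrences, but GENE_A ranks higher because its overlap is far above what chance predicts. For realistic corpus sizes (millions of records), the exact sum becomes slow; `scipy.stats.hypergeom.sf` or a log-space computation would be the practical choice.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    top = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return top / comb(N, n)


def rank_genes_for_keyword(total_docs, keyword_docs, gene_docs, overlap):
    """Rank genes by how unlikely their overlap with a keyword is under chance.

    total_docs:   number of records in the corpus
    keyword_docs: records annotated with the keyword
    gene_docs:    gene -> records mentioning the gene
    overlap:      gene -> records mentioning the gene AND annotated with the keyword
    """
    scored = []
    for gene, n in gene_docs.items():
        p = hypergeom_tail(overlap.get(gene, 0), total_docs, keyword_docs, n)
        scored.append((p, gene))
    return sorted(scored)  # smallest p-value (most significant) first


ranking = rank_genes_for_keyword(
    total_docs=1000, keyword_docs=50,
    gene_docs={"GENE_A": 20, "GENE_B": 200},   # hypothetical genes
    overlap={"GENE_A": 10, "GENE_B": 12},
)
for p, gene in ranking:
    print(gene, f"{p:.2e}")
```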
How do I find a connection between specific genes?
Some genes are not co-cited (mentioned together) in any MEDLINE record, but you might still be able to infer a relationship between them via an intermediary. For this purpose, PubGene includes the Shortest Path option in the Network Browser. By "crawling" through cross-referenced genes, Shortest Path allows you to find indirect connections between genes. Shortest Path is available under Advanced Options in the Select Network Type pull-down menu.
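The FAQ does not describe how Shortest Path is computed. For an unweighted network, breadth-first search is the standard way to find a path with the fewest intermediate genes, and the sketch below assumes that approach. The genes A to E and their edges are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search for the path with the fewest co-citation hops.

    Returns the list of nodes from start to goal, or None if unconnected."""
    if start == goal:
        return [start]
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in previous:
                previous[nb] = node
                if nb == goal:
                    # Walk the predecessor links back to the start.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(nb)
    return None


# Hypothetical co-citation graph; undirected, so edges are stored both ways.
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}, "E": set()}
print(shortest_path(graph, "A", "D"))  # ['A', 'B', 'C', 'D']
print(shortest_path(graph, "A", "E"))  # None
```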
What does the coloring of "nodes" in network images mean?
Each PubGene network is drawn in response to a specific query. The color scheme in the network reflects "distance" from the submitted query term. If a single gene term is the query, its "node" in the network will be colored bright red. Neighbors of the query gene (genes co-cited with it) are colored a darker red, and neighbors of neighbors are colored black. If two genes are submitted in the same query, both will be shown in bright red. Under default settings, a keyword query draws a network in which up to 10 nodes are colored bright red; these are the 10 genes most significantly associated with the keyword.
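The distance-based coloring can be expressed as a multi-source breadth-first search from the query genes. The scheme below follows the description above. Coloring nodes farther than two hops black is an assumption, because the FAQ only covers neighbors of neighbors. The graph is hypothetical.

```python
from collections import deque


def node_colors(graph, query_genes):
    """Color each reachable node by its hop distance from the nearest query gene."""
    dist = {g: 0 for g in query_genes}
    queue = deque(query_genes)
    while queue:  # multi-source breadth-first search
        node = queue.popleft()
        for nb in graph.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    # Distance 0 = query, 1 = neighbor; 2 or more is treated as black (assumption).
    palette = {0: "bright red", 1: "dark red"}
    return {node: palette.get(d, "black") for node, d in dist.items()}


# Hypothetical network: A - B - C - D
graph = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
print(node_colors(graph, ["A"]))
print(node_colors(graph, ["A", "D"]))
```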
What are the numbers on the "edges" of the network?
Networks produced by the Network Browser have numbers on the "edges" between nodes (that is, the lines connecting genes to each other or genes to keywords). These numbers indicate the "strength" of the association between the nodes. Under default settings, they correspond to the number of MEDLINE records in which the connected genes are co-cited. Under Advanced Options you can switch the graph drawing mode to Probabilistic. Values shown on network edges will then be decimals that can be read as p-values. Another option is to draw networks of sequence similarity, in which edges are labeled with e-values.
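The FAQ does not say exactly how PubGene computes its e-values. For local alignment scores, the standard framework is Karlin-Altschul statistics. In that framework, the expected number of chance alignments scoring at least S is E = K * m * n * exp(-lambda * S), where m and n are the lengths of the two sequences (or of the query and the database). The sketch assumes this model. The lambda and K values are typical BLAST-style figures used only for illustration, since the real values depend on the scoring matrix and gap costs. Smaller e-values indicate similarities that are less likely to arise by chance.

```python
import math


def evalue(score, m, n, K, lam):
    """Karlin-Altschul expected number of chance alignments scoring >= score."""
    return K * m * n * math.exp(-lam * score)


def pvalue_from_evalue(e):
    """Probability of at least one chance alignment, assuming Poisson-distributed hits."""
    return 1.0 - math.exp(-e)


# Illustrative parameters only: lambda and K depend on the scoring scheme.
LAMBDA, K = 0.267, 0.041
query_len, db_len = 300, 50_000_000
for s in (40, 60, 80):
    e = evalue(s, query_len, db_len, K, LAMBDA)
    print(s, f"E={e:.3g}", f"P={pvalue_from_evalue(e):.3g}")
```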
How do I pull up articles from a network image?
Click on a network edge. PubGene will open the Literature tool and present the corresponding list of MEDLINE records. Clicking the PubMed ID of an entry opens a window showing the record with the query terms highlighted.
Does PubGene have commercial products?
PubGene offers products by subscription. Visit http://www.pubgene.com to learn more.
Does PubGene perform customized projects?
PubGene is currently involved in a number of collaborative projects to meet the special needs of organizations. Visit www.pubgene.com to learn more.