Help page: Find PubGene gene symbols in the PubGene nomenclature database
This tool allows you to identify the correct "Primary Symbol" to use for your gene of interest. The search term can be a combination of words, such as "estrogen receptor", or a gene symbol, such as "TP53". The search term does not have to match the capitalization of the symbol; only the Primary Symbol itself is case sensitive.
Primary symbols for human genes in PubGene are normally those that are defined by HUGO. Mouse genes are assigned primary symbols based on the Jackson Laboratory nomenclature guide. Rat genes are assigned primary symbols according to the standard rat nomenclature rules. Yeast genes have primary symbols defined by SGD (Saccharomyces Genome Database).
Explanation of input fields and parameter settings:
Input data
- Organism
  - use this field to select the organism relevant for your data. Possible organisms are:
    - human (Homo sapiens)
    - All organisms
    - mouse (Mus musculus)
    - rat (Rattus norvegicus)
    - cow (Bos taurus)
    - pig (Sus scrofa)
    - dog (Canis familiaris)
    - chicken (Gallus gallus)
    - zebrafish (Danio rerio)
    - fugu (Takifugu rubripes)
    - fly (Drosophila melanogaster)
    - worm (Caenorhabditis elegans)
    - rice (Oryza sativa)
    - arabidopsis (Arabidopsis thaliana)
    - yeast (Saccharomyces cerevisiae)
    - yeast (Schizosaccharomyces pombe)
    - e.coli (Escherichia coli)
    - anthrax (Bacillus anthracis)
    - Bacillus cereus
    - Streptococcus pneumoniae
    - Staphylococcus aureus
    - hiv (Human immunodeficiency virus 1)
- Search expression
  - use this field to enter the search expression. The expression may be any string (a symbol, a name, or part of either) and may include regular expression characters.
- Search fields
  - use this field to select the data fields in which to search for the primary ID.
- Case mode
  - use this field to select whether the search should be case sensitive.
Explanation of search results
The result of a search is displayed as a list of all matching "lines" in the nomenclature database. The first column displays the "Primary Symbol" corresponding to the match and the second column displays details about the match.
After identifying the correct "Primary Symbol" from your search, you can retrieve its literature neighborhood network directly by clicking the hyperlinked Primary Symbol.
Nomenclature Note
The PubGene tools use case-sensitive primary symbols to identify genes and proteins. The correct syntax depends on the organism and on the biological entity. In general, the identifier (primary symbol) for a gene is not the same as the identifier for the corresponding protein(s), although in many cases the two are similar. Moreover, the identifier for a gene, a protein, or both may differ across organisms; often, however, the difference is only in capitalization (lower versus upper case).
PubGene tools use an automatic lookup to find the correct primary symbol (identifier) for an input query term. This allows the user to enter a gene or protein alias and/or a symbol with non-standard capitalization. For a given input term type (gene or protein), the lookup tries to find the best match for the query term as follows:
- As primary symbol: Does the input string correspond to a primary symbol?
- As case translation of a primary symbol: Does the input string correspond to a primary symbol when disregarding capitalization?
- As alias symbol: Does the input string correspond to an alias symbol?
- As case translation of an alias symbol: Does the input string correspond to an alias symbol when disregarding capitalization?
- As primary symbol of the counterpart: If the query term type is gene, does the input string correspond to the primary symbol of the corresponding protein (and vice versa)?
- As an Affymetrix probeset ID.
- As a UniGene cluster ID: Does the input string match a UniGene cluster ID of the selected organism? Note that the UniGene ID must include the two-letter organism code and the period (dot) between the organism code and the number.
- As an IMAGE clone ID; only all-numeric input strings may match.
- As a GenBank Accession number.
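The lookup can be pictured as a cascade that stops at the first rule that matches. The Python sketch below assumes the checks are tried in the order listed; the `Nomenclature` tables, the `resolve` function and the `unigene_code` parameter are hypothetical stand-ins for PubGene's internal data, and every step except the two format checks is reduced to a plain dictionary lookup.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass
class Nomenclature:
    """Hypothetical lookup tables for one organism and one term type (gene or protein)."""
    primary: set[str]                  # primary symbols of this term type
    aliases: dict[str, str]            # alias symbol -> primary symbol
    # Primary symbol of the other term type (protein when resolving genes,
    # gene when resolving proteins) -> primary symbol of this term type.
    counterpart: dict[str, str] = field(default_factory=dict)
    affymetrix: dict[str, str] = field(default_factory=dict)    # probe set ID -> symbol
    unigene: dict[str, str] = field(default_factory=dict)       # cluster ID -> symbol
    image_clones: dict[str, str] = field(default_factory=dict)  # IMAGE clone ID -> symbol
    genbank: dict[str, str] = field(default_factory=dict)       # accession -> symbol


def _match_ignoring_case(term: str, keys: Iterable[str]) -> Optional[str]:
    """Return the first key equal to term when capitalization is ignored."""
    folded = term.casefold()
    for key in keys:
        if key.casefold() == folded:
            return key
    return None


def resolve(term: str, db: Nomenclature, unigene_code: str) -> Optional[tuple[str, str]]:
    """Return (primary symbol, rule that matched), or None if no rule matches."""
    if term in db.primary:
        return term, "primary symbol"
    key = _match_ignoring_case(term, db.primary)
    if key is not None:
        return key, "case translation of primary symbol"
    if term in db.aliases:
        return db.aliases[term], "alias symbol"
    key = _match_ignoring_case(term, db.aliases)
    if key is not None:
        return db.aliases[key], "case translation of alias symbol"
    if term in db.counterpart:
        return db.counterpart[term], "primary symbol of counterpart"
    if term in db.affymetrix:
        return db.affymetrix[term], "Affymetrix probe set ID"
    # A UniGene ID must carry the organism code and the period, e.g. "Hs." + digits.
    if re.fullmatch(re.escape(unigene_code) + r"\.\d+", term) and term in db.unigene:
        return db.unigene[term], "UniGene cluster ID"
    # Only all-numeric strings can match an IMAGE clone ID.
    if term.isdigit() and term in db.image_clones:
        return db.image_clones[term], "IMAGE clone ID"
    if term in db.genbank:
        return db.genbank[term], "GenBank accession number"
    return None
```

Called as `resolve("tp53", db, "Hs")` on a table whose primary symbols include `TP53`, it returns `("TP53", "case translation of primary symbol")`. The case-insensitive steps simply return the first match found; the sketch does not try to rank several candidates that differ only in capitalization.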
As gene identifiers, PubGene generally uses the official gene symbols defined by the nomenclature committee(s) for the various organisms.
As protein identifiers, PubGene generally uses the corresponding Swiss-Prot identifier without the _ORGANISM suffix (for example, P53_HUMAN becomes P53).
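Removing the organism part of a Swiss-Prot entry name is a simple string operation. Here is a minimal sketch, assuming entry names have the usual `PROTEIN_ORGANISM` form with the organism mnemonic after the last underscore. The function name `protein_identifier` is hypothetical:

```python
def protein_identifier(entry_name: str) -> str:
    """Drop the trailing _ORGANISM mnemonic from a Swiss-Prot entry name."""
    protein, sep, _organism = entry_name.rpartition("_")
    # With no underscore there is no organism suffix; return the input unchanged.
    return protein if sep else entry_name
```

`protein_identifier("P53_HUMAN")` and `protein_identifier("P53_MOUSE")` both return `"P53"`. This is one way the same protein identifier can arise in several organisms.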
When PubGene creates association networks or associates Chemical & Compound, MeSH and GO terms with genes and proteins, it uses all known gene and protein aliases and then combines the information found for all aliases of each gene and protein.
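The alias-merging step can be pictured as a union over all names of a gene. The sketch below illustrates the idea under assumed inputs and does not reproduce PubGene's actual procedure. `aliases` maps each primary symbol to its known aliases, and `term_hits` maps a name (primary symbol or alias) to the set of annotation terms found for it. Both structures, the `merge_by_alias` function and all data are hypothetical.

```python
from collections import defaultdict


def merge_by_alias(aliases: dict[str, list[str]],
                   term_hits: dict[str, set[str]]) -> dict[str, set[str]]:
    """Combine the annotation terms found for a gene's primary symbol and all its aliases."""
    merged: dict[str, set[str]] = defaultdict(set)
    for primary, names in aliases.items():
        for name in [primary, *names]:
            # Names with no recorded hits contribute nothing.
            merged[primary] |= term_hits.get(name, set())
    return dict(merged)
```

A real system needs a policy for aliases shared by several genes. This sketch simply credits every gene that lists the alias.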