
Help


Overview
The BLAST (Basic Local Alignment Search Tool) tool compares input sequences to PlantGenIE sequence databases to identify homologous sequence matches. 

Basic Usage

Simply paste your sequence (with or without a FASTA header) into the Query Sequence input text box. Alternatively, you can retrieve a transcript sequence by entering a gene ID into the Load example text box, or you can upload a sequence file (less than 100 MB) using the upload file function. After using one of these input options, select the desired dataset from the list of available BLAST databases. Finally, click the BLAST! button at the bottom of the page.

PlantGenIE BLAST uses the standard NCBI BLAST default options. However, users can change the following advanced options:

Option: Description
Scoring matrix: Substitution matrix that determines the cost of each possible residue mismatch between query and target sequence. See BLAST substitution matrices for more information.
Filtering: Whether to remove low complexity regions from the query sequence.
E-value cutoff: The maximum expectation value of retained alignments.
Query genetic code: Genetic code to be used in blastx translation of the query.
DB genetic code: Genetic code to be used in tblastn and tblastx translation of the datasets.
Frame shift penalty: Out-of-frame gapping (blastx, tblastn only) [Integer] default = 0.
Number of results: The maximum number of results to return.

BLAST results

The BLAST Results page reloads automatically until the search results have been retrieved. BLAST results are organized into a table containing Query ID, Hit ID, Average bit score (top), Average e-value (lowest), Average identity (av. similarity) and Links. Clicking a BLAST result displays the corresponding region of identified homology in the GBrowse tool.

Data

The BLAST tool uses public genome assemblies, early release de novo assemblies from UPSC and data from Phytozome (http://www.phytozome.net/) and Plaza.

Implementation

PlantGenIE BLAST search is implemented using NCBI BLAST (v2.2.26) and a backend PostgreSQL Chado database. We use PHP, JavaScript, XSL, Perl, d3.js and Drupal libraries to extend the open-source GMOD Bioinformatic Software Bench server and provide a graphical user interface.


The NCBI BLAST family of programs includes:
blastp
Compares an amino acid query sequence against a protein sequence database
blastn
Compares a nucleotide query sequence against a nucleotide sequence database
blastx
Compares a nucleotide query sequence translated in all reading frames against a protein sequence database
tblastn
Compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames
tblastx
Compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
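The choice between these programs follows from the type of the query and of the database. A minimal sketch of that mapping is given below; the function name choose_program and the translate_both flag are illustrative assumptions and are not part of PlantGenIE or NCBI BLAST.

```python
# Map (query type, database type) to the BLAST program described above.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("nucleotide", "nucleotide"): "blastn",
    ("nucleotide", "protein"): "blastx",
    ("protein", "nucleotide"): "tblastn",
}


def choose_program(query_type, db_type, translate_both=False):
    """Return the BLAST program name for a query/database combination.

    translate_both=True selects tblastx, which compares six-frame
    translations of a nucleotide query and a nucleotide database.
    (Hypothetical helper, for illustration only.)
    """
    if translate_both:
        if (query_type, db_type) != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    try:
        return BLAST_PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type}, {db_type}")
```

For example, choose_program("protein", "nucleotide") selects tblastn, because the nucleotide database has to be translated before it can be compared with a protein query.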

Query sequence
The query sequence to be used for a BLAST search should be pasted into the 'Sequence' text area. It accepts a number of different types of input and automatically determines the format of the input. To allow this automatic detection, certain conventions are required for the input of identifiers (e.g., accessions or gi's). Accepted input types are FASTA, bare sequence, or a sequence file; these are described in 1) to 3) below.

1.) FASTA
A sequence in FASTA format begins with a single-line description, followed by lines of sequence data. The description line (defline) is distinguished from the sequence data by a greater-than (">") symbol at the beginning. It is recommended that all lines of text be shorter than 80 characters in length. An example sequence in FASTA format is:

>lcl|MA_1 len=89935 
TGTGTACTCTTGTGATTGTGTTTCTCTCAGTGATCCTATCTATGTTATTGTTGTCTAGTAAATTGAAAGTAACCTAATAA 
TAGTAGAAACTTTAACACTACAAATGCTTACTAGGTCCAAGAAGAGAATAAGGGTGGAGACCATGGAGGCTTCGACCAAG 
GAGGCTTCAACAAAGGAGGTTACCAAGGAGGCCAGAGAGGAGGATATGGAAGAGGAAGAGGAAGAGGATATGATGGAGGA 
GGAAGACCACCTACCTTTAATGGTGGTGAGATAGGCCACTTGTCACGATTTTGTGCCAAGCCGCATGCACCGTGTGGGTA 
TTTCCCCAACTTCGACCATGTCACCGAGGATTTCCCAAAATTATTGAAAAAATGTGAAGAAAAAAAGGGGCATTGCAACA 
TGGTGACTGCTAAGTTGATGTACGAGTGGTAACCCAAGGAGGCACCCATATGAGAGTGAAACTAGAACAGGGAGAAGGTT 
CAAGGAAGAATATAGAAGGAAACATTAGAAAATCATCCCAATAACCTCCTAAGTTTGACAGTGTGCATCATGATTAGTTA 
		

Blank lines are not allowed in the middle of FASTA input.
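The FASTA conventions above can be checked before submitting a query. The following is a minimal sketch, assuming the input is available as a single string; parse_fasta is a hypothetical helper and does not reflect how PlantGenIE itself parses input.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples.

    Raises ValueError for blank lines inside the input or for sequence
    data that is not preceded by a '>' description line.
    """
    records = []
    description, chunks = None, []
    # Leading and trailing blank lines are tolerated; inner ones are not.
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            raise ValueError("blank lines are not allowed in the middle of FASTA input")
        if line.startswith(">"):
            # A new description line closes the previous record, if any.
            if description is not None:
                records.append((description, "".join(chunks)))
            description, chunks = line[1:].strip(), []
        else:
            if description is None:
                raise ValueError("FASTA input must begin with a '>' description line")
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records
```

Sequence lines belonging to one record are joined into a single string, so the 80-character line recommendation only affects how the input is laid out, not the parsed result.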
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue). The nucleic acid codes supported are:

		A  adenosine          C  cytidine             G  guanine
		T  thymidine          N  A/G/C/T (any)        U  uridine 
		K  G/T (keto)         S  G/C (strong)         Y  T/C (pyrimidine) 
		M  A/C (amino)        W  A/T (weak)           R  G/A (purine)        
		B  G/T/C              D  G/A/T                H  A/C/T      
		V  G/C/A              -  gap of indeterminate length
		

For those programs that use amino acid query sequences (BLASTP and TBLASTN), the accepted amino acid codes are:

		A  alanine               P  proline       
		B  aspartate/asparagine  Q  glutamine      
		C  cysteine              R  arginine      
		D  aspartate             S  serine      
		E  glutamate             T  threonine      
		F  phenylalanine         U  selenocysteine      
		G  glycine               V  valine        
		H  histidine             W  tryptophan        
		I  isoleucine            Y  tyrosine
		K  lysine                Z  glutamate/glutamine
		L  leucine               X  any
		M  methionine            *  translation stop
		N  asparagine            -  gap of indeterminate length
		

NOTE:
¹ The degenerate nucleotide codes are treated as mismatches in nucleotide alignments. Too many such degenerate codes within an input nucleotide query will cause PopGenIE BLAST to reject the input. For protein queries, too many nucleotide-like codes (A, C, G, T, N) may cause a similar rejection.
² In protein queries, U is replaced by X before the search, since it is not specified in any scoring matrix.
³ BLAST will not accept "-" in the query. To represent gaps, use a string of N or X instead.
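The clean-up steps described above (removing digits, upper-casing, replacing gaps, and mapping U to X in protein queries) can be applied to a query before submission. This is a minimal sketch under those stated rules; normalize_query is a hypothetical helper, and it removes digits rather than replacing them with letter codes, which is only one of the two options mentioned in the text.

```python
import re


def normalize_query(sequence, protein=False):
    """Prepare a bare query sequence following the conventions above."""
    # Drop whitespace and numerical digits (e.g. position numbers).
    cleaned = re.sub(r"[\s\d]", "", sequence)
    # Lower-case letters are accepted and mapped to upper case.
    cleaned = cleaned.upper()
    # Unknown residues are N for nucleic acids and X for amino acids.
    unknown = "X" if protein else "N"
    # "-" is not accepted in the query, so gaps are written as N or X.
    cleaned = cleaned.replace("-", unknown)
    if protein:
        # U (selenocysteine) is not in any scoring matrix; use X instead.
        cleaned = cleaned.replace("U", "X")
    return cleaned
```

For a nucleotide query, U (uridine) is left untouched, because the replacement of U by X applies to protein queries only.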

2.) Bare Sequence
This may be just lines of sequence data, without the FASTA definition line, e.g.:

GTGTACTCTTGTGATTGTGTTTCTCTCAGTGATCCTATCTATGTTATTGTTGTCTAGTAAATTGAAAGTAACCTAATAA 
TAGTAGAAACTTTAACACTACAAATGCTTACTAGGTCCAAGAAGAGAATAAGGGTGGAGACCATGGAGGCTTCGACCAAG 
GAGGCTTCAACAAAGGAGGTTACCAAGGAGGCCAGAGAGGAGGATATGGAAGAGGAAGAGGAAGAGGATATGATGGAGGA 
GGAAGACCACCTACCTTTAATGGTGGTGAGATAGGCCACTTGTCACGATTTTGTGCCAAGCCGCATGCACCGTGTGGGTA 
        

Blank lines are not allowed in the middle of bare sequence input.

3.) Sequence file
This function allows users to upload a text file containing queries in the formats outlined above. Long sequences should be uploaded through this option to avoid possible browser buffer size limits. For more information about BLAST, please see the extensive documentation provided by the NCBI (BLAST docs).

GBrowse is an open-source genome annotation viewer.

Using search
To direct GBrowse to a particular region of the chromosome, type a gene name, a short sequence (minimum of 15 bp), or a nucleotide range in the Landmark or Region box located near the top left of the page and click on the Search button. GBrowse will recognize a standard gene name as well as its synonyms. For example, in the PopGenIE GBrowser you can search for a gene by typing Potri.001G266400. To find a nucleotide range, type accession:xx..yy, where accession is the accession of the genome you are looking at (this will usually already be in the search box from the last thing you navigated to), and xx and yy indicate the nucleotide positions of the start and end of the range you want.
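The range syntax can be illustrated with a small parser. This is only a sketch of the accession:xx..yy format described above; parse_region and the accession name "Chr01" in the usage note are assumptions made for the example and are not taken from GBrowse.

```python
import re

# accession, a colon, then start and end positions separated by "..".
REGION_PATTERN = re.compile(
    r"^(?P<accession>[^:\s]+):(?P<start>\d+)\.\.(?P<end>\d+)$"
)


def parse_region(text):
    """Split a range such as 'accession:xx..yy' into its three parts."""
    match = REGION_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a nucleotide range: {text!r}")
    return match["accession"], int(match["start"]), int(match["end"])
```

With a hypothetical accession, parse_region("Chr01:1000..2000") returns the accession together with the integer start and end positions.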

"Rubber band"
selection A convenient way to move along the genome is to click and drag in either the overview or region panels. Gbrowse will reload the image to encompass selected area when you release the mouse button.

Sliding handles
The area shown in the Details panel is highlighted by a box in the Overview and Region panels. You can grab the box and slide it left or right within limits (it can't slide over the whole genome).

Scroll/Zoom
Once you get to a particular location, you can fine-tune the view with the Scroll/Zoom buttons to move along the chromosome or change the magnification.


exImage provides an intuitive pictographic view of expression data across a diverse range of spruce datasets. It can therefore be used to monitor differential expression levels across multiple organs and tissues.
How to use exImage?
Please enter a gene ID (e.g. Potri.001G266400) in the input text area and click the "GO" button. The samples will be coloured according to the gene expression values.

Absolute/Relative
exImage uses VST (Variance-Stabilizing Transformation) units for absolute values; relative values have no unit.
Absolute: absolute expression values obtained by aligning RNA-Seq reads to the gene models and calculating VST.
Relative: expression log fold changes where the average expression of the 22 samples has been subtracted from the absolute expression of every sample.
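The relationship between the two modes can be sketched as follows: the relative value of a sample is its absolute value minus the average over all samples. The function name and the sample names in the usage note are illustrative assumptions.

```python
def relative_expression(absolute):
    """Convert absolute (VST) values for one gene into relative values.

    absolute: dict mapping sample name -> absolute expression value.
    Returns a dict with the mean across samples subtracted from each value.
    """
    mean = sum(absolute.values()) / len(absolute)
    return {sample: value - mean for sample, value in absolute.items()}
```

For instance, with three hypothetical samples at 6.0, 4.0 and 2.0 the average is 4.0, so the relative values are 2.0, 0.0 and -2.0.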

The purpose of this tool is to plot gene expression levels for each gene.
How to use exPlot?
Please type multiple gene IDs (e.g. Potri.001G266400) into the input text area, separated by commas, spaces, tabs or new lines, and click the "Search" button. The tool will then plot log 2 gene expression values for the input genes and the default set of samples.
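The accepted separators can be illustrated with a short splitting routine. This is a sketch of the input convention only, not the tool's own code; split_gene_ids is a hypothetical name.

```python
import re


def split_gene_ids(text):
    """Split pasted text into gene IDs on commas, spaces, tabs or new lines."""
    # Runs of separators count as one, and empty tokens are discarded.
    return [token for token in re.split(r"[,\s]+", text) if token]
```

Mixed separators in the same input are handled together, so a list copied from a spreadsheet column or from a comma-separated line gives the same result.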

Using the plot
Hovering over gene IDs on the right panel will outline their respective expression profiles. Hovering over the expression profiles will bring up information on the gene/sample at that particular hover point.

Plot control options
Using the filtering and control buttons (Additional control icon and pencil icon), the plots can be further customized by choosing the element displayed on the category axis (X axis). Both the image and the data can also be exported.

Gene Ontology (GO)
The Gene Ontology is a way to structure gene product properties using a controlled vocabulary (ontology) designed for use in formal grammars and various other computational semantics applications (data mining, natural language processing, etc.). It covers three domains (namespaces): biological process, molecular function and cellular component. Each term has parent-to-child relationships to other terms in the gene ontology, so that following these relationships up to the top nodes forms a directed acyclic graph (referred to here as the GO tree).

Observations: Not all species are officially annotated, and there are slimmed (filtered) versions of the gene ontology available, like PlantSlim, where only GO terms related directly to plants are considered. In the case of higher plants, the best annotated organism is Arabidopsis. Apart from GO, there are other ontology graphs that we use in PopGenIE (but not for enrichment studies): the plant ontology (PO) structures information that refers to plant organs and development, and the environment ontology (ENVO) contains useful details about the experiment type (treatment, soil, infection, etc.). There are other functional annotation sets available for gene products that we do not use at the moment, like KEGG, which annotates whether a gene product is involved in catalyzing a certain biochemical reaction.

Figure 1: An example of how the parent relationships available to an input GO term can be followed up through its parent terms to form a tree.

What relationships count for enrichment?
The GO was not created with the singular purpose of enrichment studies, therefore a part of its structural relationships cannot be used for enrichment. This tool uses only the ‘is_a’ and ‘part_of’ relationships, and filters away the ‘part_of’ relationships that cross namespaces.
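The relationship filter can be sketched as a traversal that collects the ancestors of a term while following only ‘is_a’ links and ‘part_of’ links that stay within one namespace. The data layout, the function name and the toy term names below are assumptions made for illustration; they are not the tool's internal representation or real GO identifiers.

```python
ALLOWED_RELATIONSHIPS = ("is_a", "part_of")

# Toy ontology (hypothetical terms): each term has a namespace and a list
# of (relationship, parent term) pairs.
TOY_TERMS = {
    "root_bp": {"namespace": "biological_process", "parents": []},
    "root_cc": {"namespace": "cellular_component", "parents": []},
    "parent": {
        "namespace": "biological_process",
        "parents": [("is_a", "root_bp")],
    },
    "child": {
        "namespace": "biological_process",
        "parents": [
            ("is_a", "parent"),
            ("part_of", "root_cc"),    # crosses namespaces: filtered away
            ("regulates", "root_bp"),  # not used for enrichment
        ],
    },
}


def enrichment_ancestors(term_id, terms):
    """Return the ancestors of term_id reachable through usable links."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        for relationship, parent_id in terms[current]["parents"]:
            if relationship not in ALLOWED_RELATIONSHIPS:
                continue
            crosses = terms[parent_id]["namespace"] != terms[current]["namespace"]
            if relationship == "part_of" and crosses:
                continue
            if parent_id not in seen:
                seen.add(parent_id)
                stack.append(parent_id)
    return seen
```

In the toy data, the ancestors of "child" are "parent" and "root_bp"; the cross-namespace ‘part_of’ link to "root_cc" is ignored and "root_bp" is reached through ‘is_a’ only.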

Enrichment test
For the enrichment test the user provides a list of genes (set S). The purpose of a general functional enrichment test is to answer the question: “Is the input set S enriched in a certain function?” Some of the genes in set S are annotated to the function f being tested (set a), but other genes (set b) are annotated to the same function and are not in set S. Also, some genes in S are annotated to other functions (set d), while the remaining genes (set c) have no annotation for the tested function f and are outside the set S. In simple terms, using the picture below, an enrichment test studies how significant the proportion of set a relative to set b is, compared with the proportion of set d relative to set c (called the background of the test). Currently we perform the functional enrichment test only on GO terms, so in this case ‘a specific function’ means ‘a specific GO term’. We use Fisher’s exact test to compute the significance level (p-value).


Figure 2: GO Enrichment test.
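Using the four set sizes a, b, c and d defined above, the p-value can be computed from the hypergeometric distribution. The sketch below assumes a one-sided test for over-representation and treats set d as all genes in S that are not annotated to f; the text does not state which variant of Fisher's exact test the tool uses, so both points are assumptions.

```python
from math import comb


def enrichment_p_value(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    a: genes in S annotated to the tested function f
    b: genes outside S annotated to f
    c: genes outside S not annotated to f
    d: genes in S not annotated to f
    Returns the probability of seeing a or more annotated genes in S
    if the genes of S were drawn at random from all genes.
    """
    total = a + b + c + d
    annotated = a + b
    in_set = a + d
    upper = min(annotated, in_set)
    # Sum hypergeometric probabilities for every overlap at least as large as a.
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, in_set - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(total, in_set)
```

As a small worked case, a = 3, b = 1, c = 3, d = 1 gives (16 + 1) / 70, approximately 0.243, which would not be considered significant.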

Multiple testing corrections
P-values are only statistically valid if a single test is made. When we test the input gene set S against every available functional set (GO set), the significance threshold of 0.05 is usually too lenient. The p-values are also affected by several other factors (overlapping annotations, sets of different sizes, etc.). Whether to apply a particular correction is a matter of controversy. We implement 3 types of correction:
1.) Bonferroni correction. This correction uses the number of tests to correct the p-values. It is considered the most conservative correction.
2.) Bonferroni-Holm. A step-down variant of the Bonferroni correction that is considered similarly restrictive but copes better with dependency among tests.
3.) False Discovery Rate (FDR), implemented using the Benjamini-Hochberg algorithm. A less strict correction than Bonferroni, and one which is considered most accurate when the tests are not independent.
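The three corrections can be sketched as functions that turn a list of raw p-values into adjusted p-values. These follow the standard textbook definitions of each procedure; whether the tool's own implementation matches them in every detail is an assumption.

```python
def bonferroni(p_values):
    """Multiply every p-value by the number of tests (capped at 1)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def holm(p_values):
    """Bonferroni-Holm step-down adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, ...
        value = min(1.0, p_values[index] * (m - rank))
        # Adjusted values must not decrease along the sorted order.
        running_max = max(running_max, value)
        adjusted[index] = running_max
    return adjusted


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg FDR adjustment."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling each by m / rank.
    for rank in range(m - 1, -1, -1):
        index = order[rank]
        value = p_values[index] * m / (rank + 1)
        running_min = min(running_min, value)
        adjusted[index] = running_min
    return adjusted
```

A corrected p-value is then compared with the chosen threshold (for example 0.05) in the same way as an uncorrected one.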

Modular Enrichment Analysis (MEA)
This enrichment test makes the enrichment background set relative to the parent terms that have additional annotation. It is useful for cases when we need to know if a child term is enriched relative to its immediate parents. The parent terms used for the background can be limited using a maximum level cap. Alternatively, the parents are investigated upwards through the GO tree until new annotations are found.

Annotation details
We used the Populus GO annotation list from Phytozome. A large majority of these annotations are transferred to Populus from Arabidopsis, mainly by means of sequence similarity. The annotations are inherited upward through the GO tree. As previously mentioned, for computing the annotation inheritance only the ‘is_a’ and ‘part_of’ relationships are considered, and we also filter away the ‘part_of’ relationships that cross namespaces.

Please go to the Gene Search tool and paste your gene IDs or any search terms related to Populus trichocarpa annotations. You can use the "Select Displayed Annotation" button to enable or disable different columns.

This tool maps the current gene list onto the Populus chromosomes. It is useful for selecting genes situated in certain chromosomal regions. To select genes, drag a rectangle across the desired regions and the genes will be added to an export list. You can then load the exported list into the gene list and add it to other lists, or use it separately in various analysis tools. The tool also generates a publication-ready plot of the chromosomal mapping of your gene set.

exNorthern
Digital Northern heat map representing the library distribution of ESTs representing gene models within PopulusDB, as described in this paper (http://www.biomedcentral.com/1471-2164/9/589/). It is based on EST frequencies in different libraries (Sterky et al. 2004). For the OPLS-generated leaf gene list, it shows that the greatest prevalence of genes was found in the shoot meristem, young leaf and apical shoot libraries. The new RIA (Rich Internet Application) based Digital Northern tool was developed using Adobe Flash technology. Its main goal is to improve simplicity, efficiency and real-time interaction for the user.
How to use?
Type some poplar gene IDs into the text input box on the right, separated by commas, newlines or tabs. Then adjust the drop-down lists (Plot gene names?, Plot dendrogram?, Clustering method ...), the color picker (Background colour) or the slider values (Max genes for colour, Min gene frequency) as required. The heatmap updates automatically; otherwise, click the Submit button. Finally, you can download the PDF file by clicking the Download PDF link.

cDNA libraries
Cambial zone (A + B)   Populus tremula x tremuloides T89. Bark was peeled, and tissue scraped from both exposed surfaces with a scalpel. Sample includes developing xylem, cambial zone and mature phloem.
 
Active cambium (UB)   Populus tremula. Stem samples were collected from 3 different trees growing south of Umeå on July 10th 2001. 30-micrometer sections were obtained by cryosectioning and used for RNA preparation.
 
Dormant cambium (UA)   Populus tremula. Stem samples were collected from 3 different trees growing south of Umeå on October 5th 2001. 30-micrometer sections were obtained by cryosectioning and used for RNA preparation.
 
Tension wood (G)   Populus tremula x tremuloides T89. Wood scrapings of a tree inclined for 3 weeks in the greenhouse. Tissues should mainly contain wood cells that are actively forming secondary cell wall and a G-layer.
 
Wood cell death (X)   Populus tremula x tremuloides T89. Sample was taken from stem and included xylem cells that started secondary cell wall formation but mainly those where cell wall was fully developed. The sample also included cells that had died.
 
Young leaves (C)   Populus tremula x tremuloides T89. Library described in Larsson S, Bjorkbacka H, Forsman C, Samuelsson G, Olsson O. (1997). Molecular cloning and biochemical characterization of carbonic anhydrase from Populus tremula x tremuloides. Plant Molecular Biology 34: 583-592. Trees were cultured in a greenhouse in fertilized peat under natural light supplemented with metal halogen lamps at a PPF of 150 microE. Photoperiod 18 hrs, 20/15 °C. Trees were watered daily and fertilized once a week.
 
Senescing leaves (I)   Populus tremula. Library described in Bhalerao et al. (2003) Gene expression in autumn leaves. Plant Physiol. 131: 430-442. Sample collected from one wild tree on the Umeå University campus. Sampled September 14th 1999 (a few days before visible leaf senescence was observed) at 11.00. Mid-ribs were removed.
 
Cold stressed leaves (L)   Populus tremula x tremuloides T89. Greenhouse plants were transferred to 5°C. Fully developed leaves were sampled 3 and 4 days after transfer and pooled.
 
Dormant buds (Q)   Populus tremula. The same tree as in the senescing leaves library. Dormant buds were collected in February.
 
Petioles (P)   Populus tremula. Petioles were collected from several individuals growing in long-day conditions and stressed in different ways, then pooled. Stress treatments were 1) Mechanical stress: A tree was hit every second for 20 hours, resulting in trembling of the whole tree 2) Nutrient stress: A tree was planted in perlite and grown without nutrients for two weeks 3) Biotic stress: A tree infested with (ticks) 4) Cold stress: A tree was exposed to 5°C under short day conditions and sampled after 15, 10, 20 and 37 days.
 
Virus/fungus-infected leaves (Y)   Populus tremula. Leaves from different stages infected either with 1) Poplar Mosaic Virus or 2) Venturia tremulae were sampled and pooled. Healthy non-infected leaves were sampled and used as the driver pool in a partial subtraction step of the cDNA synthesis. Sequences Y001-Y004 are from the virus-infected and Y005-Y024 from the fungus-infected partial subtractive library.
 
Flower buds (F)   Populus trichocarpa. Library described in Rottmann, W.H. et al. (2000). Diverse effects of overexpression of AFY and PTLF, a poplar (Populus) homolog of LEAFY/FLORICAULA, in transgenic poplar and Arabidopsis. Plant J. 22: 235-246. Immature female inflorescence tissue was collected in mid to late May from wild trees growing in the vicinity of Corvallis, Oregon. Reproductive buds were dissected to remove the young bud scales and the entire inflorescences were collected.
 
Female catkins (M)   Populus trichocarpa. Flushing catkins were collected in early spring (around March 1) from wild trees growing in the vicinity of Corvallis, Oregon.
 
Male catkins (V)   Populus trichocarpa. Flushing catkins were collected in early spring (around March 1) from wild trees growing in the vicinity of Corvallis, Oregon.
 
Apical shoot (K)   Populus tremula x tremuloides T89. 150 apical shoots (top 3 mm, biggest leaf ca. 5 mm, weight ca 4 mg) from 3-month-old greenhouse-grown plants were collected and pooled.
 
Shoot meristem (T)   Populus tremula x tremuloides T89. Shoot apexes were dissected under a microscope.
 
Bark (N)   Populus tremula x tremuloides T89. Long-day treated plants (about 3 m). Bark was sampled from under the "crown" and 75 cm downwards. The sample was peeled off with a "potato peeler", buds were avoided and the cells were inspected in the microscope.
 
Roots (R)   Populus tremula x tremuloides T89. Plants grown in agar under sterile conditions. The whole root system (primary roots) up to 0.5-1 cm from the stem was used. Roots were still white.
 
Imbibed seeds (S)   Populus tremula. Seeds from a seed lot were imbibed and samples were taken 1) right after imbibition and 2) after 24 hours and pooled.
 

eXHeatmap
This tool generates a heatmap plot, useful for clustering and for analyzing the expression of genes relative to each other. The network analysis tool (PopNet) is a useful alternative to clustering, while the expression plotting tool (exPlot) can be a useful alternative for plotting expression profiles. This tool uses the current gene list and sample list available in the Master Menu, so if those lists are empty, users must first fill them using the dedicated tools.

Clustering with the heatmap
The genes are clustered based on the chosen distance function, and the result of the clustering is shown as a dendrogram, which can be placed on either the x or the y axis. The color scale indicates how far the actual expression values are from the local consensus. Distance functions quantify how similar the expression of two genes/samples is. For more accurate estimators of gene expression similarity, use the PopNet tool. Based on the all-pair distance estimates, the genes are clustered together using a chosen variant of the hierarchical clustering algorithm. The sample information is selectable from the command panel. Clicking on the heatmap itself opens a publication-ready PDF, or you can export the heatmap data from the command panel and import it into your favorite plotting program.
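The two steps described above, computing all-pair distances and then merging the closest clusters, can be sketched in a few lines. The distance function (one minus the Pearson correlation) and the average-linkage rule used here are assumptions chosen for the example; the text does not state which distance functions or clustering variants the tool offers.

```python
from math import sqrt


def pearson_distance(x, y):
    """1 - Pearson correlation; 0 means identically shaped profiles.

    Profiles with zero variance are not handled in this sketch.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(var_x * var_y)


def hierarchical_clusters(profiles, distance=pearson_distance):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping gene ID -> list of expression values.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of gene IDs.
    """
    def linkage(c1, c2):
        # Average of all pairwise distances between the two clusters.
        pairs = [(g1, g2) for g1 in c1 for g2 in c2]
        return sum(distance(profiles[g1], profiles[g2]) for g1, g2 in pairs) / len(pairs)

    clusters = [(gene,) for gene in profiles]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge it.
        dist, i, j = min(
            (linkage(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges
```

The order of the merges corresponds to the branching of the dendrogram: genes merged early, at small distances, sit on neighbouring leaves.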

Best tips to try before you contact us!
We have found that many apparent problems with tools in PopGenIE can result from previous results that have been cached. Before reporting a bug/problem, we request that you first clear your browser cache, quit the browser, clear the cache again when you re-open the browser, and then check whether the problem remains.

How can I find old versions of PopGenIE?
http://v2.popgenie.org
http://v1.popgenie.org

How can I convert Populus trichocarpa version 1 gene ids into version 2 gene ids?
Please go to http://v2.popgenie.org/flashbulktools and choose the ID conversion option. Then select the Populus Genome from the drop-down menu. Finally, paste your old gene IDs and click the GO button.

How can I convert Populus trichocarpa version 2 gene ids into version 3 gene ids?
Please go to the Gene Search tool and paste your old gene IDs.

Login | Site Map | © 2021 PlantGenIE.org.| All our tools are under MIT License

