UofS Pulse Binfo Germplasm Documentation

Note

This package of modules enhances the germplasm support in Tripal Core with support for both Breeding Programs and Germplasm Genebanks.

Installation

Note

It is recommended to clear caches regularly during this installation process.

Download Package

The package is available as a single repository from Pulse Bioinformatics, University of Saskatchewan, on GitHub. The recommended way to download and install it is with git:

cd [your drupal root]/sites/all/modules

git clone https://github.com/UofS-Pulse-Binfo/kp_germplasm.git

Enable Package

The module can be enabled at “Home » Administration » Tripal » Modules” by selecting the checkbox under the “ENABLED” column (as shown in the image below) and then clicking the “Save Configuration” button at the bottom of the page.

_images/install.1.tripal_module_page.png

Alternatively, the modules can be enabled using drush:

drush pm-enable kp_germplasm
drush pm-enable rilsummary germpcollection

Note

In this step, the ontologies and controlled vocabularies required by the modules are inserted into Chado. Make sure to run any Tripal jobs created by these modules before continuing.

Set Permissions

By default, permission to use the importers in this module is not set. It can be configured at “Home » Administration » People » Permissions”.

_images/install.4.permission.png

Import Data

After the module is installed and enabled, both the Germplasm Cross Importer and the Germplasm Accession Importer should be available at “Home » Administration » Tripal » Data Loader”.

_images/install.2.cross_importer.png _images/install.3.accession_importer.png

For more information on the importers, see the Data Import section of these docs.

Note

The importers add data from your file into Chado. You then need to publish that data by going to Admin > Content > Tripal Content > Publish and selecting either “F1” for crosses or “Germplasm Accessions”.

Upgrade path from Separate Modules

This package includes modules which used to stand alone (germ_summary, tripal_germplasm_importer, germcollection). To upgrade:

  1. Take note of any existing configuration both in Tripal > Extensions and Structure > Tripal Content Types.
  2. Disable and uninstall the existing modules. This will not delete any data in chado; however, you will need to re-configure the functionality.
  3. Remove the old module directories.
  4. Clone this package and re-install the modules.
  5. Re-apply the configuration you took note of above.

Warning

You may need to re-configure after upgrading to this package so take careful note of your original configuration.

Note

The functionality from the separate modules will still be available in this package and any new functionality will be developed here. Additionally, germ_summary has been renamed to rilsummary to reflect its focus on RILs.

Data Import

Imports of Germplasm Accessions and Breeding Crosses are currently supported by this module. The accession import supports BrAPI-compliant metadata in a simple table-based format. For more information, see the following full descriptions of each importer.

Germplasm Cross Importer

Prepare a Germplasm Cross File

The Germplasm Cross Importer allows germplasm crosses to be bulk loaded into a database. The germplasm cross file must follow a specific template to be uploaded. The following columns must be included:

  1. Year: the year the cross was made in
  2. Season: the season the cross was made in
     2.1 Make sure to use the full name of the season.
  3. Cross Number: a unique identifier for the cross (e.g. 6673S)
     3.1 The cross number may already exist in the database. Double check that the cross number exactly matches the stock name in the database.
  4. Maternal Parent: the name of the maternal parent of this cross
     4.1 The maternal parent may already exist in the database. Double check that the maternal parent name exactly matches the stock name in the database.
  5. Paternal Parent: the name of the paternal parent of this cross
     5.1 The paternal parent may already exist in the database. Double check that the paternal parent name exactly matches the stock name in the database.
  6. Cross Type: the type of cross (e.g. Single Cross, Back Cross)
     6.1 The cross type can often be determined from the cross number. A capital letter tends to appear within a cross number and indicates the cross type: “S” stands for single cross, “M” for multiple cross, “T” for triple cross, and “B” for back cross. The letter may also appear in lower case or be missing.
  7. Seed Type: either the market class or seed coat colour of the progeny
  8. Cotyledon Colour: the cotyledon colour of the seed resulting from the cross
  9. Comment: a free-text comment about the cross
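
The letter codes described in the Cross Type note can be checked before filling in the Cross Type column. The following is only a minimal sketch of that rule, not code from the importer: the function name infer_cross_type is hypothetical, and it assumes that a cross number containing no known letter, or more than one distinct known letter, should be treated as undetermined.

```python
import re
from typing import Optional

# Letter codes taken from the Cross Type note in these docs.
CROSS_TYPE_CODES = {
    "S": "Single Cross",
    "M": "Multiple Cross",
    "T": "Triple Cross",
    "B": "Back Cross",
}


def infer_cross_type(cross_number: str) -> Optional[str]:
    """Return the cross type hinted at by a cross number, or None."""
    # Collect every letter in the cross number; the code may be lower case.
    letters = re.findall(r"[A-Za-z]", cross_number)
    matches = {
        CROSS_TYPE_CODES[letter.upper()]
        for letter in letters
        if letter.upper() in CROSS_TYPE_CODES
    }
    # Assumption: answer only when exactly one known code is present.
    return matches.pop() if len(matches) == 1 else None
```

For example, infer_cross_type("6673S") returns "Single Cross", while a cross number without a letter returns None and the cross type has to be filled in by hand.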

Add more columns as needed (e.g. Seed coat, Male Cotyledon Color, Female Cotyledon Color).
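
Before uploading, it can help to confirm that a file contains all of the required columns. The sketch below is not part of the module. It assumes a tab-delimited file whose first row is a header using exactly the column names listed above; neither the delimiter nor the header row is confirmed by these docs, and the function name missing_columns is hypothetical.

```python
import csv
import io

# Column names as listed in these docs; exact header spelling is an assumption.
REQUIRED_COLUMNS = [
    "Year", "Season", "Cross Number", "Maternal Parent", "Paternal Parent",
    "Cross Type", "Seed Type", "Cotyledon Colour", "Comment",
]


def missing_columns(file_obj, delimiter="\t"):
    """Return the required column names absent from the header row."""
    reader = csv.reader(file_obj, delimiter=delimiter)
    # An empty file yields an empty header, so every column is reported.
    header = [name.strip() for name in next(reader, [])]
    return [name for name in REQUIRED_COLUMNS if name not in header]


if __name__ == "__main__":
    sample = io.StringIO("Year\tSeason\tCross Number\n2019\tSummer\t6673S\n")
    print(missing_columns(sample))
```

Extra columns such as Seed coat are ignored by this check, so they can be added without affecting the result.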

Prefix and Organism

An organism must be selected from the dropdown menu before upload. The prefix text box is optional and its default value is ‘GERM’. The uniquename for each germplasm will be ‘GERM’ followed by its stock id, but the user can supply a different prefix to replace ‘GERM’.
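
As an illustration of the naming rule, the sketch below builds a uniquename from a prefix and a stock id. It assumes the two parts are joined directly with no separator, which these docs do not state explicitly; build_uniquename is a hypothetical helper, not a function provided by the module.

```python
def build_uniquename(stock_id: int, prefix: str = "GERM") -> str:
    """Join a prefix and a stock id into a uniquename."""
    # An empty prefix falls back to the documented default, 'GERM'.
    # Assumption: no separator between the prefix and the stock id.
    return f"{prefix or 'GERM'}{stock_id}"
```

Under that assumption, stock id 1234 with the default prefix gives GERM1234.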

_images/cross.2.prefix_organism.png

Bulk load germplasm crosses

As Chado is the data store for Tripal, this module saves germplasm crosses in five Chado tables: cv, cvterm, stock, stockprop, and stock_relationship.

  • required controlled vocabularies (CVs) and CV terms will be checked before data loading
  • germplasm crosses will be loaded into table stock
  • properties for each germplasm will be loaded into table stockprop
  • relationships with parents for each germplasm will be loaded into table stock_relationship
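
To make the table mapping above concrete, the sketch below groups one row of a cross file by the Chado table it would end up in. It is a simplified illustration rather than the importer's actual logic: the dictionary layout and the function name cross_to_records are hypothetical, storing every non-parent column as a property is an assumption, and the relationship type names are borrowed from the RIL section of these docs.

```python
# Assumption: every column other than the cross number and the two parents
# is stored as a property of the cross.
PROPERTY_COLUMNS = [
    "Year", "Season", "Cross Type", "Seed Type", "Cotyledon Colour", "Comment",
]


def cross_to_records(row: dict) -> dict:
    """Group one cross row by the Chado table it would be loaded into."""
    cross = row["Cross Number"]
    return {
        # One stock record per cross.
        "stock": {"name": cross},
        # One stockprop record per non-empty property column.
        "stockprop": [
            {"stock": cross, "type": column, "value": row[column]}
            for column in PROPERTY_COLUMNS
            if row.get(column)
        ],
        # Two stock_relationship records linking the parents to the cross.
        "stock_relationship": [
            {"subject": row["Maternal Parent"],
             "type": "is_maternal_parent_of", "object": cross},
            {"subject": row["Paternal Parent"],
             "type": "is_paternal_parent_of", "object": cross},
        ],
    }
```

In the real tables the type of each property and relationship is a cvterm id rather than a string, which is why the required CVs and CV terms are checked first.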

Germplasm Accession Importer

File Upload

Format requirements for upload files are shown in the importer UI.

_images/accession.1.file_format.png

Note

For column 2 (External Database), the name in the file must already exist in the Chado db table of your database.

For column 12 (Pedigree), it is recommended to save pedigree information in the format maternal-parent-name/paternal-parent-name.

For column 13 (Synonyms), multiple synonyms are allowed but must be separated by semi-colons (e.g. syn1;syn2;syn3).
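
The Pedigree and Synonyms formats can be checked with a few lines of code before upload. This is a minimal sketch with hypothetical helper names. It assumes that the first “/” in a pedigree separates the maternal parent from the paternal parent, which will not hold if a parent name itself contains a slash.

```python
def parse_pedigree(pedigree: str):
    """Split 'maternal/paternal' into a (maternal, paternal) tuple."""
    # Assumption: the first slash is the separator between the parents.
    maternal, separator, paternal = pedigree.partition("/")
    if not separator or not maternal.strip() or not paternal.strip():
        raise ValueError(f"Pedigree is not in maternal/paternal form: {pedigree!r}")
    return maternal.strip(), paternal.strip()


def parse_synonyms(field: str):
    """Split a semicolon-separated synonym field, dropping empty entries."""
    return [name.strip() for name in field.split(";") if name.strip()]
```

For instance, parse_synonyms("syn1;syn2;syn3") yields a list of three synonyms, and a pedigree with no slash raises a ValueError so that the row can be corrected.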

Select Genus

The genus of the accessions in the file must be selected from the dropdown menu before upload. All accessions in one file must belong to the same genus and match this selection.
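
A quick pre-upload check can confirm that every accession matches the selected genus. The sketch below is illustrative only: it assumes the rows have already been read into dictionaries, the key name "Germplasm Genus" is a placeholder for whichever column holds the genus in your file, and the comparison is assumed to be an exact, case-sensitive match.

```python
def rows_with_wrong_genus(rows, selected_genus, genus_key="Germplasm Genus"):
    """Return 1-based row numbers whose genus differs from the selection."""
    # genus_key is a hypothetical column name; adjust it to your file.
    return [
        number
        for number, row in enumerate(rows, start=1)
        if row.get(genus_key, "").strip() != selected_genus
    ]
```

An empty result means the whole file is consistent with the selected genus.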

_images/accession.2.genus.png

Bulk load germplasm accessions

As Chado is the data store for Tripal, this importer saves germplasm accessions in several Chado tables: cv, cvterm, stock, stockprop, db, dbxref, synonym and stock_synonym.

In general, accession information is saved in the database as follows:

  • required controlled vocabularies (CVs) and CV terms will be checked before data loading
  • the organism of an accession is determined by the germplasm genus, species and subtaxa (optional)
  • germplasm accessions will be loaded into tables stock, dbxref, and db
  • properties will be loaded into table stockprop
  • synonyms will be loaded into tables synonym and stock_synonym

A diagram:

_images/accession.3.ER_diagram.jpg

Note

This module uses a specific set of controlled vocabulary terms to identify metadata. Where possible we have used community standard ontologies but in some cases the terms needed were not available. All terms used are compatible with standardized Tripal Content Types.

Germplasm Collection

Provides functionality for supporting germplasm collections (e.g. diversity panels and recombinant inbred lines) including the following:

  • Germplasm Collection content type is created automatically.
  • Fields:
    • Germplasm List (co_010__germplasm): Listing of all germplasm in a collection.
    • Project-Related Germplasm Collection (local__project_germcollection): to link germplasm collections with projects.

Functionality

Germplasm Collection Pages

This module creates a germplasm collection content type which you can configure at Admin > Structure > Tripal Content Types > Germplasm Collection. By default, there will be a name, identifier, type of collection and germplasm list. The germplasm list will show all chado.stock records linked to a given chado.stockcollection record via the chado.stockcollection_stock table in an ajax-paged list.

_images/germplasm-collection.1.png

Warning

There is currently no way to link a Germplasm collection with germplasm (chado.stock) through the user interface. Instead, you will need to add records to chado.stockcollection_stock manually to create the link.
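
One way to add those records is with a short script. The sketch below is a hedged illustration, not a tool shipped with this module. It runs against an in-memory SQLite stand-in containing only the columns it touches, whereas a real Chado database is PostgreSQL, has additional required columns, and would need a PostgreSQL driver with that driver's placeholder style. It also assumes uniquename is enough to identify a stock, although Chado only guarantees uniqueness together with organism and type. The names link_germplasm and demo_connection are hypothetical.

```python
import sqlite3

# Look up both ids by uniquename and insert the linking record.
LINK_SQL = (
    "INSERT INTO chado.stockcollection_stock (stockcollection_id, stock_id) "
    "SELECT c.stockcollection_id, s.stock_id "
    "FROM chado.stockcollection c, chado.stock s "
    "WHERE c.uniquename = ? AND s.uniquename = ?"
)


def link_germplasm(conn, collection_uniquename, stock_uniquenames):
    """Link each stock to the collection; return how many links were added."""
    before = conn.total_changes
    for uniquename in stock_uniquenames:
        # Unknown uniquenames match no rows and are silently skipped.
        conn.execute(LINK_SQL, (collection_uniquename, uniquename))
    conn.commit()
    return conn.total_changes - before


def demo_connection():
    """Build an in-memory stand-in with simplified Chado tables."""
    conn = sqlite3.connect(":memory:")
    # Attaching a second in-memory database lets 'chado.' prefixes resolve.
    conn.execute("ATTACH DATABASE ':memory:' AS chado")
    conn.executescript("""
        CREATE TABLE chado.stock (
            stock_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE chado.stockcollection (
            stockcollection_id INTEGER PRIMARY KEY, uniquename TEXT);
        CREATE TABLE chado.stockcollection_stock (
            stockcollection_stock_id INTEGER PRIMARY KEY,
            stockcollection_id INTEGER, stock_id INTEGER);
        INSERT INTO chado.stock (uniquename) VALUES ('GERM1'), ('GERM2');
        INSERT INTO chado.stockcollection (uniquename) VALUES ('panel-1');
    """)
    return conn
```

Whatever tool is used, back up the database first and check the returned count against the number of germplasm you expected to link.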

Recombinant Inbred Line (RIL) Summary

Provides functionality for summarizing Recombinant Inbred Lines (RILs) including the following:

  1. Tabular matrix which summarizes how many RILs are available for each species combination. This is particularly helpful if you have a cultivated and associated wild species for a single genus.
  2. Listing of all RILs for a specific species combination including information about the number of F2 families for each F-generation.
  3. ChadoField for RIL pages which summarizes information about the number of F2 families for each F-generation.

Functionality

The RIL summary matrix can be found at [mytripalsite.com]/germplasm/summary/[genus]. This is what it looks like for a fake Tripalus example.

_images/rilsummary.png

When you click on any of the cells in the RIL summary matrix you are taken to the following listing:

_images/rillist.png

The details for a given RIL can be summarized on the RIL Tripal Content Page using the field provided by this module.

_images/rilfield.png

Adding RILs to the summary

  1. Create a Recombinant Inbred Line with the name of your RIL population (e.g. TR-01).
  2. Create a germplasm line (type doesn’t matter; suggested Generated Germplasm (Breeding Line)) with the name of the original cross giving rise to the RIL population (e.g. 1234S) and add a relationship: TR-01 is_selection_of 1234S.
  3. Create parents for the breeding cross (type does not matter) and relate them using the is_maternal_parent_of and is_paternal_parent_of relationship types (e.g. CDC FRED is_maternal_parent_of 1234S and AABC is_paternal_parent_of 1234S).
  4. Each subline for a RIL (e.g. TR-01-123) should be of type stock_type:F2.
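
The relationships created in steps 2 and 3 can be summarized as subject, type, object triples. The sketch below only restates those steps as data for checking your own records; ril_relationships is a hypothetical helper and does not write anything to Chado.

```python
def ril_relationships(ril, cross, maternal, paternal):
    """List the (subject, type, object) relationships for one RIL population."""
    return [
        # Step 2: the RIL population is a selection of the original cross.
        (ril, "is_selection_of", cross),
        # Step 3: both parents point at the original cross.
        (maternal, "is_maternal_parent_of", cross),
        (paternal, "is_paternal_parent_of", cross),
    ]


if __name__ == "__main__":
    for triple in ril_relationships("TR-01", "1234S", "CDC FRED", "AABC"):
        print(" ".join(triple))
```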

Adding the summary to RIL pages

  1. Go to Admin > Structure > Tripal Content Types > Recombinant Inbred Lines > Manage Fields.
  2. Add a new field where the type is Germplasm RIL Summary.
  3. Make sure it is not disabled on the Manage Display tab.