Metagenomics of Antarctic Lakes: a Model for Defining Microbial Biogeochemical Processes in the Cold

Federal Geographic Data Committee (FGDC) Metadata:


Identification_Information:
Citation:
Citation_Information:
Originator: Unknown
Publication_Date: Unknown
Title: Metagenomics of Antarctic Lakes: a Model for Defining Microbial Biogeochemical Processes in the Cold
Description:
Abstract:
Metadata record for data from ASAC Project 2899. See the link below for public details on this project.

We conducted a genomic analysis of Archaea and Bacteria collected from lakes in the Vestfold Hills, Antarctica. This provided a new level of understanding about the life forms inhabiting these cold lakes. Linked to knowledge of meteorological, geological, chemical and physical data that have been collected over years of previous research, the new genomic data will generate a complete understanding of how the microorganisms have evolved, how they have transformed the Antarctic environment, and how they presently interact with it. Deriving an integrated understanding of microbial ecology is essential for determining ways of preserving the health of the world's ecosystems.

The data are available for download as an Excel spreadsheet and a Word document from the URL given below.

The GPS coordinates where samples were collected are as follows (these are UTM (Universal Transverse Mercator) coordinates, from zone 44D):
Ace Lake: 44D 0384881 (easting), 2401821 (northing)
Deep Lake: 44D 0385351 (easting), 2391772 (northing)
Organic Lake: 44D 0384928 (easting), 2403550 (northing)

The fields in this dataset are:
Water temperature - degrees Celsius
Specific conductivity - microsiemens per centimetre
Conductivity - microsiemens per centimetre
Salinity - parts per thousand
Dissolved oxygen % - %
Dissolved oxygen concentration - milligrams per litre
Dissolved oxygen charge - a unitless engineering value. The recommended reading is 50 plus or minus 25. A low reading generally means the membrane needs to be replaced, and a high reading means the probe needs to be reconditioned.
PressureA (a depth reading of the Sonde) - pounds-force per square inch absolute
Water depth - metres
pH
pHmV (the pH millivolt reading that the probe outputs to the Sonde) - millivolts
Turbidity - nephelometric turbidity units
BP (barometric air pressure) - pounds per square inch (psi)

Taken from the 2008-2009 Progress Report:

Progress against objectives:
New lake and ocean samples, including additional opportunistic samples from Heard Island, were obtained in October to December 2008. All samples from 2006 onward are being processed. This includes DNA (metagenomics) and protein (proteomics). A great deal of bioinformatic analysis has been performed on the metagenome data. Metaproteomics has also proceeded well. Details of some of the progress are as follows.

In the reporting period 1,064,488 Sanger sequencing reads were produced, with 967,410 passing quality control, which at an average of 700 bp provided 677 Mb of sequence data. The reads were produced in batches for each sample. We generated assembly statistics and phylogenetic profiles after the completion of each batch. Sample diversity then guided the sequence allocation for each sample.

A number of pragmatic software tools have been created to perform the analyses. As an example, for one sample the whole sample assembly was characterised by read depth, GC content, di-nucleotide frequency (Tetra) and tri-nucleotide frequency (Tetra) on a per scaffold basis. These intrinsic properties then formed vectors in a feature space on which a self-organising map clustering analysis was performed. The cluster that comprised the most abundant species was isolated and its genes annotated. This represented 9 contigs with a total of 1.7 Mb and 1683 predicted genes. For this sample, proteins were extracted and metaproteomics performed, resulting in a total of 3970 confident peptide matches that provided identities for 504 proteins (at least 2 peptide matches per protein), representing about 30% coverage.
In comparison, a total of 170 proteins were identified against the non-redundant database. In other metaproteomic analyses, samples from 4 lake depths provided a total of 7,925 peptides, allowing the identification of 1015 proteins against the NCBI non-redundant protein database (matches to annotated metagenome data have not yet been performed).

To test detection limits and the accuracy of identifications using a metaproteomics approach, a simulated mixed community study was performed using S. alaskensis and E. coli. This has shown that cell numbers, protein abundance and cell volumes all affect the ability to detect proteins of individual microorganisms within a population. The type and size of the database that the metaproteomic dataset is searched against (non-redundant versus S. alaskensis + E. coli protein database) also resulted in differences in protein detection. The work has been useful for optimising the parameters used for metaproteomics of the Antarctic samples.

An interesting eukaryotic virus that dominates the biomass of one of the samples is being analysed, with the present work focusing on classifying and characterising it. Transmission electron microscopy of the water sample revealed virus-like particles of approximately 150 nm, but it was unclear from morphology whether they represented a single virus type or several. Two complementary metagenomic assembly approaches are being used to produce the most complete assembly possible of the large viral sequences.

The first assembly strategy follows a conventional metagenomic workflow consisting of assembly of the whole metagenomic dataset followed by taxonomic binning of the constructs. An initial assembly has been constructed after determining the optimum acceptable degree of error. A high degree of assembly was evident, with the largest scaffold spanning 108 kb at 6X coverage.
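The per-scaffold properties that this abstract names for composition-based binning (read depth, GC content and nucleotide frequencies) can be illustrated with a minimal Python sketch. It is not the project's software: the function names, the choice of k = 4 and the example scaffold are assumptions made for illustration, and the clustering step itself (a self-organising map in the progress report) is not shown.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases."""
    counts = Counter(seq.upper())
    acgt = sum(counts[b] for b in BASES)
    return (counts["G"] + counts["C"]) / acgt if acgt else 0.0


def kmer_frequencies(seq: str, k: int = 4) -> list[float]:
    """Relative frequency of every k-mer over ACGT, in lexicographic order.

    Windows containing ambiguous bases (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if set(seq[i:i + k]) <= set(BASES)
    )
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product(BASES, repeat=k))
    return [counts[m] / total if total else 0.0 for m in all_kmers]


def feature_vector(seq: str, read_depth: float, k: int = 4) -> list[float]:
    """One row of the feature space: [read depth, GC content, k-mer frequencies]."""
    return [read_depth, gc_content(seq)] + kmer_frequencies(seq, k)


# Hypothetical scaffold and read depth, for illustration only.
example = feature_vector("ACGTACGTGGCCAATTGGCC", read_depth=6.0)
print(len(example))  # 2 + 4**4 values per scaffold
```

Each scaffold becomes one fixed-length vector, so any clustering method that works on numeric vectors could then be applied to the set of scaffolds.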
A BLASTx search of the five largest contigs (greater than 10 kb) produced two alignments to Major Capsid Protein (MCP) genes: one to the short MCP gene of Chrysochromulina ericina virus (28% identity) and the other to the full MCP gene of Phaeocystis pouchetii virus (76% identity). Sequence flanking the full MCP gene corresponds to conserved hypothetical protein sequences from Ostreococcus virus 5 (45% identity) and Paramecium sp. Chlorella virus AR158 (39% identity). These large, deeply assembling contigs will be used to 'tune' the parameters to improve assembly of the entire metagenome.

A preliminary attempt to bin the scaffolds from the initial assembly using tetranucleotide frequencies has not completely resolved them into clear taxonomic clusters. A multi-dimensional binning approach that includes sequence coverage, GC content and nucleotide frequencies, along with identification of marker genes, is being developed and will be applied once an optimum whole metagenomic assembly has been completed.

Although the presence of conserved genes is a promising sign of accurate assembly, validation of the scaffolds by comparison to sequenced virus genomes is uninformative, as viruses are poorly represented in the public databases and extremely diverse. Instead, a second assembly strategy is underway that will conservatively extract and compile the viral sequence. The reads assigned in an initial MEGAN analysis to the large dsDNA viral clade were used in a preliminary round of assembly. This first assembly will be used as a reference to recruit more overlapping fragments, which will be combined in another round of assembly that extends the construct from the high-confidence 'seeds'. Cycles of recruitment and assembly will continue until the assembly reaches an end point. This is a new method of assembly that can potentially be used to extract and produce confident assemblies of other species with no sequenced representatives.
Comparison between this virus-specific assembly and the conventional metagenomic assembly will allow evaluation of the fidelity of both processes.
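The cycles of recruitment and assembly described above amount to iterating until no further reads can be added. The following Python sketch is a toy illustration of that loop only, under loud assumptions: reads are plain strings, "overlap" is approximated as sharing at least one k-mer with the sequences recruited so far, and no real assembler or the MEGAN assignment step is involved. The function names and example reads are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def recruit_until_stable(seeds, pool, k=8, max_cycles=50):
    """Grow a read set from high-confidence seeds until no new reads are recruited.

    Assumption: a read is "recruited" if it shares at least one k-mer with the
    current reference. A real workflow would re-assemble after each cycle; here
    the reference is simply the k-mer set of everything recruited so far.
    """
    recruited = list(seeds)
    reference = set()
    for read in recruited:
        reference |= kmers(read, k)
    remaining = [read for read in pool if read not in recruited]

    cycles = 0
    while remaining and cycles < max_cycles:
        hits = [read for read in remaining if kmers(read, k) & reference]
        if not hits:
            break  # end point: nothing new overlaps the current reference
        cycles += 1
        recruited.extend(hits)
        for read in hits:
            reference |= kmers(read, k)
        remaining = [read for read in remaining if read not in hits]
    return recruited, cycles


# Hypothetical reads: the second pool read only overlaps the seed indirectly,
# through the read recruited in the first cycle.
seed_reads = ["AAAACCCC"]
read_pool = ["GGGGTTTT", "CCCCGGGG", "ATATATAT"]
recruited_reads, n_cycles = recruit_until_stable(seed_reads, read_pool, k=4)
print(recruited_reads, n_cycles)
```

In this toy case the unrelated read is never recruited, which mirrors the conservative intent of the strategy: the construct only grows through reads that connect back to the seeds.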
Purpose:
Not Available
Supplemental_Information:
REFERENCE: Allen, M., Lauro, F.M., Williams, T.J., Burg, D., Siddiqui, K.S., De Francisci, D., Chong, K.W.Y., Pilak, O., Chew, H.H., De Maere, M.Z., Ting, L., Katrib, M., Ng, C., Sowers, K.R., Galperin, M.Y., Anderson, I.J., Ivanova, N., Dalin, E., Martinez, M., Lapidus, A., Hauser, L., Land, M., Thomas, T. and Cavicchioli, R. 2009 The genome sequence of the psychrophilic archaeon, Methanococcoides burtonii: the role of genome evolution in cold-adaptation. International Society of Microbial Ecology Journal in press
Supplemental_Information:
REFERENCE: Cavicchioli, R. and Lauro, F. 2009 Effects of climate change on polar microbes. Microbiology Australia 30 2
Time_Period_of_Content:
Time_Period_Information:
Range_of_Dates/Times:
Beginning_Date: 20061001
Ending_Date: 20090331
Currentness_Reference: Unknown
Status:
Progress: Complete
Maintenance_and_Update_Frequency: As needed
Spatial_Domain:
Description_of_Geographic_Extent:
Bounding_Coordinates:
West_Bounding_Coordinate: 77.0
East_Bounding_Coordinate: 79.0
North_Bounding_Coordinate: -67.0
South_Bounding_Coordinate: -68.0
Spatial_Domain:
Description_of_Geographic_Extent:
Bounding_Coordinates:
West_Bounding_Coordinate: 73.51
East_Bounding_Coordinate: 73.59
North_Bounding_Coordinate: -53.11
South_Bounding_Coordinate: -53.19
Keywords:
Theme:
Theme_Keyword_Thesaurus: GCMD SCIENCE PARAMETERS
Theme_Keyword_Thesaurus: GCMD PLATFORM
Theme_Keyword_Thesaurus: ANCILLARY KEYWORDS
Theme_Keyword_Thesaurus: ISO TOPIC CATEGORY
Theme_Keyword_Thesaurus: DATA SET LANGUAGE
Theme_Keyword: EARTH SCIENCE > TERRESTRIAL HYDROSPHERE > SURFACE WATER > LAKES
Theme_Keyword: EARTH SCIENCE > BIOLOGICAL CLASSIFICATION > BACTERIA/ARCHAEA
Theme_Keyword: EARTH SCIENCE > BIOSPHERE > AQUATIC ECOSYSTEMS > LAKES
Theme_Keyword: FIELD SURVEYS
Theme_Keyword: Metagenomic
Theme_Keyword: Lakes
Theme_Keyword: Vestfold Hills
Theme_Keyword: Archaea
Theme_Keyword: Bacteria
Theme_Keyword: BIOTA
Theme_Keyword: INLAND WATERS
Theme_Keyword: ENGLISH
Place:
Place_Keyword_Thesaurus: GCMD
Place_Keyword: CONTINENT > ANTARCTICA > VESTFOLD HILLS
Place_Keyword: GEOGRAPHIC REGION > POLAR
Place_Keyword: OCEAN > SOUTHERN OCEAN > HEARD AND MCDONALD ISLANDS
Access_Constraints: These data are not yet publicly available.
Use_Constraints:
This data set conforms to the PICCCBY Attribution License (http://creativecommons.org/licenses/by/3.0/). Please follow instructions listed in the citation reference provided at http://data.aad.gov.au/aadc/metadata/citation.cfm?entry_id=ASAC_2899 when using these data.
Point_of_Contact:
Contact_Information:
Contact_Person_Primary:
Contact_Person: RICK CAVICCHIOLI
Contact_Position: INVESTIGATOR
Contact_Position: TECHNICAL CONTACT
Contact_Position: DIF AUTHOR
Contact_Address:
Address_Type: Mailing and Physical Address
Address: School Of Microbiology and Immunology
Address: University Of New South Wales
City: Sydney
State_or_Province: NSW
Postal_Code: 2052
Country: Australia
Contact_Voice_Telephone: +61 2 9385 3516
Contact_Electronic_Mail_Address: r.cavicchioli@unsw.edu.au
Data_Quality_Information:
Attribute_Accuracy:
Attribute_Accuracy_Report:
The data from Ace Lake were collected using a YSI Sonde 6600 Multifunction Probe on 20 December 2006. Biomass was collected between 20 and 25 December. In addition to the Ace Lake samples, surface water samples for filtering were collected from Organic and Deep lakes. The biomass samples were collected using a Millipore Filter System with three 293 mm Filter Disk Holders fitted with sequential 3.0, 0.8 and 0.1 micrometre filters. For concentrated filtrate samples we used a Millipore Pellicon-2 Tangential Flow Filtration System with a Pellicon II Ultrafiltration Biomax 50kD Cassette. The sediment was sampled with an Ekman sediment grab.

Taken from the 2008-2009 Progress Report:

Variations to work plan or objectives:
Samples were opportunistically obtained from Heard Island. This will broaden the scope of the project in a positive way.

Difficulties affecting project:
The logistics of the season were challenging, both because of the impact of the medical incident at Davis in October and because of the unusually overcast and windy weather experienced at Davis during our field period (5 periods with wind exceeding 100 km/h). Despite the challenges, our field work was remarkably successful.
Logical_Consistency_Report:
Not Available
Completeness_Report:
Not Available
Lineage:
Process_Step:
Process_Description:
Not Available
Process_Date: Unknown
Spatial_Reference_Information:
Distribution_Information:
Distributor:
Contact_Information:
Contact_Organization_Primary:
Contact_Organization: AU/AADC > Australian Antarctic Data Centre, Australia
Contact_Person: DATA OFFICER AADC
Contact_Position: DATA CENTER CONTACT
Contact_Address:
Address_Type: Mailing and Physical Address
Address: Australian Antarctic Division
Address: 203 Channel Highway
City: Kingston
State_or_Province: Tasmania
Postal_Code: 7050
Country: Australia
Contact_Voice_Telephone: +61 3 6232 3244
Contact_Facsimile_Telephone: +61 3 6232 3351
Contact_Electronic_Mail_Address: metadata@aad.gov.au
Resource_Description: ASAC_2899
Distribution_Liability:
Not Available
Standard_Order_Process:
Digital_Form:
Digital_Transfer_Information:
Format_Name: Excel, Word
Transfer_Size: 65 kb
Digital_Transfer_Option:
Online_Option:
Computer_Contact_Information:
Network_Address:
Network_Resource_Name:
http://data.aad.gov.au
Access_Instructions:
DATA CENTER URL
Digital_Transfer_Option:
Online_Option:
Computer_Contact_Information:
Network_Address:
Network_Resource_Name:
http://data.aad.gov.au/aadc/portal/download_file.c...
Access_Instructions:
Download point for the dataset.
Digital_Transfer_Option:
Online_Option:
Computer_Contact_Information:
Network_Address:
Network_Resource_Name:
https://secure3.aad.gov.au/proms/public/projects/r...
Access_Instructions:
Public information for ASAC project 2899
Digital_Transfer_Option:
Online_Option:
Computer_Contact_Information:
Network_Address:
Network_Resource_Name:
http://data.aad.gov.au/aadc/metadata/citation.cfm?...
Access_Instructions:
Citation reference for this metadata record and dataset
Fees: Free
Metadata_Reference_Information:
Metadata_Date: 20070425
Metadata_Review_Date: 20140829
Metadata_Contact:
Contact_Information:
Contact_Person_Primary:
Contact_Person: RICK CAVICCHIOLI
Contact_Position: INVESTIGATOR
Contact_Position: TECHNICAL CONTACT
Contact_Position: DIF AUTHOR
Contact_Address:
Address_Type: Mailing and Physical Address
Address: School Of Microbiology and Immunology
Address: University Of New South Wales
City: Sydney
State_or_Province: NSW
Postal_Code: 2052
Country: Australia
Contact_Voice_Telephone: +61 2 9385 3516
Contact_Electronic_Mail_Address: r.cavicchioli@unsw.edu.au
Metadata_Standard_Name: FGDC Content Standards for Digital Geospatial Metadata
Metadata_Standard_Version: FGDC-STD-001-1998
Metadata_Time_Convention: local time