Schema for TOGA vs. hg38 - TOGA annotations using human/hg38 as reference
Database: mm10
Primary Table: HLTOGAannotvHg38v1
Data last updated: 2022-06-20
Big Bed File Download: /gbdb/mm10/TOGAvHg38v1/HLTOGAannotVsHg38v1.bb
Item Count: 54,570
The data is stored in the binary BigBed format.
Format description: TOGA predicted gene model
field | example | description
chrom | chr1 | Reference sequence chromosome or scaffold
chromStart | 130391493 | Start position in chromosome
chromEnd | 130462669 | End position in chromosome
name | ENST00000367064.CD55.8 | Name or ID of item, ideally both human readable and unique
score | 1000 | Score (0-1000)
strand | - | + or - for strand
thickStart | 130391493 | Start of where display should be thick (start codon)
thickEnd | 130462669 | End of where display should be thick (stop codon)
itemRgb | 255,160,120 | RGB value (use R,G,B string in input file)
blockCount | 9 | Number of blocks
blockSizes | 68,81,171,192,86,100,192,186,100, | Comma separated list of block sizes
chromStarts | 0,56834,57593,60890,65327,66593,68088,70328,71076, | Start positions relative to chromStart
ref_trans_id | ENST00000367064.CD55 | Reference transcript ID
ref_region | chr1:207321677-207360966 | Transcript region in the reference
query_region | chr1:130391493-130462669 | Region in the query
chain_score | 0.9961345195770264 | Chain orthology probability score
chain_synteny | 379 | Chain synteny log10 value
chain_flank | 0.1631 | Chain flank feature
chain_gl_cds_fract | 0.033950094004687945 | Chain global CDS fraction value
chain_loc_cds_fract | 0.1017703132092601 | Chain local CDS fraction value
chain_exon_cov | 0.9781849912739965 | Chain exon coverage value
chain_intron_cov | 0.2695986266655767 | Chain intron coverage value
status | Uncertain Loss | Gene loss classification
perc_intact_ign_M | 0.8350785340314136 | % intact ignoring missing
perc_intact_int_M | 0.8350785340314136 | % intact considering missing as intact
intact_codon_prop | 0.9607329842931938 | % intact codons
ouf_prop | 0.0 | % out of chain
mid_intact | 0 | Is middle 80% intact
mid_pres | 1 | Is middle 80% fully present
prot_alignment | ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ... | HTML-formatted protein alignment
svg_line | none SP ENST00000367064.CD55.8 aa | SVG inactivating mutations visualization
ref_link | ENST00000367064 | Reference transcript link
inact_mut_html_table | 7 1 SSM ('gt', 'gc')->aa YES SSM_1 9 0 Deleted exon - NO DEL_1 | HTML-formatted inactivating mutations table
exon_ali_html | Exon number: 1 Exon region: chr1:130462669-130462569 Nucleotide percent identity: 41.53 ... | HTML-formatted exon alignment
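The blockSizes and chromStarts fields follow the usual BED12 layout, in which each block start is an offset from chromStart. As a minimal sketch (the function name `block_intervals` is hypothetical, and coordinates are assumed to be 0-based and half-open as in the BED convention), the absolute block coordinates for the example row can be recovered like this:

```python
from typing import List, Tuple


def block_intervals(chrom_start: int, block_sizes: str, chrom_starts: str) -> List[Tuple[int, int]]:
    """Return absolute (start, end) pairs for every block of a BED12-style record.

    block_sizes and chrom_starts are the comma-separated strings stored in
    the blockSizes and chromStarts fields (a trailing comma is tolerated).
    """
    sizes = [int(x) for x in block_sizes.split(",") if x]
    offsets = [int(x) for x in chrom_starts.split(",") if x]
    if len(sizes) != len(offsets):
        raise ValueError("blockSizes and chromStarts differ in length")
    # Each offset is relative to chromStart, so add it back to get genome coordinates.
    return [(chrom_start + off, chrom_start + off + size) for off, size in zip(offsets, sizes)]


# Values copied from the example row of the schema table.
blocks = block_intervals(
    130391493,
    "68,81,171,192,86,100,192,186,100,",
    "0,56834,57593,60890,65327,66593,68088,70328,71076,",
)
print(len(blocks), blocks[0], blocks[-1])
```

The last block spans 130462569-130462669, which agrees with the region reported for exon 1 in the exon_ali_html example (chr1:130462669-130462569); the order is reversed because the transcript lies on the minus strand.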
Sample Rows
chrom chromStart chromEnd name score strand thickStart thickEnd itemRgb blockCount blockSizes chromStarts ref_trans_id ref_region query_region chain_score chain_synteny chain_flank chain_gl_cds_fract chain_loc_cds_fract chain_exon_cov chain_intron_cov status perc_intact_ign_M perc_intact_int_M intact_codon_prop ouf_prop mid_intact mid_pres prot_alignment svg_line ref_link inact_mut_html_table exon_ali_html
chr1 130391493 130462669 ENST00000367064.CD55.8 1000 - 130391493 130462669 255,160,120 9 68,81,171,192,86,100,192,186,100, 0,56834,57593,60890,65327,66593,68088,70328,71076, ENST00000367064.CD55 chr1:207321677-207360966 chr1:130391493-130462669 0.9961345195770264 379 0.1631 0.033950094004687945 0.1017703132092601 0.9781849912739965 0.2695986266655767 Uncertain Loss 0.8350785340314136 0.8350785340314136 0.9607329842931938 0.0 0 1 ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ... ... ENST00000367064 71SSM('gt', 'gc')->aaYESSSM_190Deleted ... Exon number: 1Exon region: chr1:130462669-130462569Nucleotide percent identity: 41.53 | BL ...
chr1 130391493 130462669 ENST00000367067.CD55.8 1000 - 130391493 130462669 0,0,200 9 68,475,278,192,86,100,192,186,100, 0,56834,57645,60890,65327,66593,68088,70328,71076, ENST00000367067.CD55 chr1:207321732-207359767 chr1:130391493-130462669 0.9963686466217041 379 0.18375 0.033950094004687945 0.14616432137993646 0.9739866908650938 0.2598640583554377 Intact 0.9655172413793104 0.9655172413793104 0.9655172413793104 0.0 1 1 ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ... ... ENST00000367067 71SSM('gt', 'gc')->caNOSSM_180SSM ... Exon number: 1Exon region: chr1:130462669-130462569Nucleotide percent identity: 41.53 | BL ...
chr1 130391493 130462669 ENST00000391921.CD55.8 1000 - 130391493 130462669 255,160,120 8 68,81,171,192,86,100,186,100, 0,56834,57593,60890,65327,66593,70328,71076, ENST00000391921.CD55 chr1:207321642-207359713 chr1:130391493-130462669 0.9963686466217041 379 0.1839 0.033950094004687945 0.0843395369950068 0.9737945492662474 0.27340001626412946 Uncertain Loss 0.8018867924528302 0.8018867924528302 0.9528301886792453 0.0 0 1 ref: M-----TVARPSVPAALPLLGELPRLLLLVLLCL-PAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPG ... ... ENST00000391921 61SSM('gt', 'gc')->aaYESSSM_180Deleted ... Exon number: 1Exon region: chr1:130462669-130462569Nucleotide percent identity: 41.53 | BL ...
chr1 130416838 130418207 ENST00000644836.CD55.254876 1000 - 130416838 130418207 130,130,130 2 86,100, 0,1269, ENST00000644836.CD55 chr1:207321747-207359876 chr1:130416838-130418207 0.6891490817070007 1 0.0 0.10508474576271186 0.10508474576271186 0.16103896103896104 0.043271594820521224 Partial missing 0.16103896103896104 1.0 1.0 0.8389610389610389 1 0 ref: MTVAR-PSVPAALPLLGELPRLLLLVLLCLPAVWGDCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDSV ... ... ENST00000644836 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130422958-130422864Nucleotide percent identity: 53.40 | BL ...
chr1 130419600 130419792 ENST00000314754.CD55.330981 1000 - 130419600 130419792 130,130,130 1 192, 0, ENST00000314754.CD55 chr1:207321700-207360501 chr1:130419600-130419792 0.639787495136261 1 0.0 0.1453444360333081 0.1453444360333081 0.14512471655328799 0.030863016319947513 Partial missing 0.14285714285714285 0.782312925170068 0.984375 0.854875283446712 0 0 ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ... ... ENST00000314754 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130421335-130421298Nucleotide percent identity: 18.00 | BL ...
chr1 130419600 130419792 ENST00000367063.CD55.330981 1000 - 130419600 130419792 130,130,130 1 192, 0, ENST00000367063.CD55 chr1:207321531-207340766 chr1:130419600-130419792 0.6739023327827454 1 0.0 0.1453444360333081 0.1453444360333081 0.14382022471910114 0.06435248518011856 Partial missing 0.14157303370786517 0.7842696629213484 0.984375 0.8561797752808988 0 0 ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ... ... ENST00000367063 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130421335-130421298Nucleotide percent identity: 18.00 | BL ...
chr1 130419600 130419792 ENST00000367064.CD55.330981 1000 - 130419600 130419792 130,130,130 1 192, 0, ENST00000367064.CD55 chr1:207321677-207360966 chr1:130419600-130419792 0.639787495136261 1 0.0 0.1453444360333081 0.1453444360333081 0.16753926701570682 0.030763781029455844 Partial missing 0.1649214659685864 0.7486910994764397 0.984375 0.8324607329842932 0 0 ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ... ... ENST00000367064 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130421335-130421298Nucleotide percent identity: 18.00 | BL ...
chr1 130419600 130419792 ENST00000367067.CD55.330981 1000 - 130419600 130419792 130,130,130 1 192, 0, ENST00000367067.CD55 chr1:207321732-207359767 chr1:130419600-130419792 0.639787495136261 1 0.0 0.1453444360333081 0.1453444360333081 0.1161524500907441 0.0311947391688771 Partial missing 0.11433756805807622 0.8257713248638838 0.984375 0.8838475499092558 0 0 ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ... ... ENST00000367067 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130421335-130421298Nucleotide percent identity: 18.00 | BL ...
chr1 130419600 130419792 ENST00000644836.CD55.330981 1000 - 130419600 130419792 130,130,130 1 192, 0, ENST00000644836.CD55 chr1:207321747-207359876 chr1:130419600-130419792 0.639787495136261 1 0.0 0.1453444360333081 0.1453444360333081 0.16623376623376623 0.030841938480030594 Partial missing 0.16363636363636364 0.7506493506493507 0.984375 0.8337662337662337 0 0 ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ... ... ENST00000644836 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130421335-130421298Nucleotide percent identity: 18.00 | BL ...
chr1 130419600 130419792 ENST00000645323.CD55.330981 1000 - 130419600 130419792 130,130,130 1 192, 0, ENST00000645323.CD55 chr1:207321642-207360336 chr1:130419600-130419792 0.639787495136261 1 0.0 0.1453444360333081 0.1453444360333081 0.14545454545454545 0.03086048545812377 Partial missing 0.1431818181818182 0.7818181818181819 0.984375 0.8545454545454545 0 0 ref: MTVARPSVPAALPLLGELPRLLLLVLLCLPAVWG--DCGLPPDVPNAQPALEGRTSFPEDTVITYKCEESFVKIPGEKDS ... ... ENST00000645323 10Missing exon-YESMIS_120Missing exon ... Exon number: 1Exon region: chr1:130421335-130421298Nucleotide percent identity: 18.00 | BL ...
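Several sample rows share the same ref_trans_id but differ in name, chain_score and status. The sketch below shows one way to keep a single row per reference transcript. Picking the row with the highest chain_score is an illustrative assumption, not a rule stated in this track description, and `best_projection` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Tuple

Row = Tuple[str, str, float, str]  # (name, ref_trans_id, chain_score, status)

# A subset of the Sample Rows above, reduced to four columns.
SAMPLE: Tuple[Row, ...] = (
    ("ENST00000367064.CD55.8", "ENST00000367064.CD55", 0.9961345195770264, "Uncertain Loss"),
    ("ENST00000367067.CD55.8", "ENST00000367067.CD55", 0.9963686466217041, "Intact"),
    ("ENST00000644836.CD55.254876", "ENST00000644836.CD55", 0.6891490817070007, "Partial missing"),
    ("ENST00000367064.CD55.330981", "ENST00000367064.CD55", 0.639787495136261, "Partial missing"),
    ("ENST00000367067.CD55.330981", "ENST00000367067.CD55", 0.639787495136261, "Partial missing"),
    ("ENST00000644836.CD55.330981", "ENST00000644836.CD55", 0.639787495136261, "Partial missing"),
)


def best_projection(rows: Iterable[Row]) -> Dict[str, Tuple[str, float, str]]:
    """Map each ref_trans_id to the (name, chain_score, status) of its top-scoring row."""
    best: Dict[str, Tuple[str, float, str]] = {}
    for name, ref_id, score, status in rows:
        # Assumption: a higher chain_score marks the preferred row.
        if ref_id not in best or score > best[ref_id][1]:
            best[ref_id] = (name, score, status)
    return best


best = best_projection(SAMPLE)
for ref_id, (name, score, status) in sorted(best.items()):
    print(ref_id, name, status)
```

For these rows the helper keeps the two `.8` items and `ENST00000644836.CD55.254876`, and drops the three `.330981` items.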
TOGA vs. hg38 (HLTOGAannotvHg38v1) Track Description
Description
TOGA (Tool to infer Orthologs from Genome Alignments) is a homology-based
method that integrates gene annotation with orthology inference and
classifies genes as intact or lost.
Methods
As input, TOGA uses a gene annotation of a reference species
(human/hg38 for mammals, chicken/galGal6 for birds) and
a whole genome alignment between the reference and query genome.
TOGA implements a novel paradigm that relies on alignments of intronic
and intergenic regions and uses machine learning to accurately distinguish
orthologs from paralogs or processed pseudogenes.
To annotate genes, CESAR 2.0 is used to determine the positions and
boundaries of coding exons of a reference transcript in the orthologous
genomic locus in the query species.
Display Conventions and Configuration
Each annotated transcript is color-coded according to its classification:
"intact": the middle 80% of the CDS (coding sequence) is present and
exhibits no gene-inactivating mutation. These transcripts likely encode
functional proteins.
"partially intact": at least 50% of the CDS is present in the query and
the middle 80% of the CDS exhibits no inactivating mutation. These
transcripts may also encode functional proteins, but the evidence is
weaker as parts of the CDS are missing, often due to assembly gaps.
"missing": <50% of the CDS is present in the query and the middle 80% of
the CDS exhibits no inactivating mutation.
"uncertain loss": there is at least 1 inactivating mutation in the middle
80% of the CDS, but the evidence is not strong enough to classify the
transcript as lost. These transcripts may or may not encode a functional
protein.
"lost": typically several inactivating mutations are present, so there is
strong evidence that the transcript is unlikely to encode a functional
protein.
Clicking on a transcript provides additional information about the orthology
classification, inactivating mutations, the protein sequence and protein/exon
alignments.
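The classification rules listed above can be read as a small decision procedure. The following is only a sketch of that reading, not TOGA's implementation: the function and parameter names are hypothetical, the 50% threshold is assumed to be inclusive, and because the text gives no numeric rule separating "uncertain loss" from "lost", the two are returned as one combined label.

```python
def classify_transcript(cds_fraction_present: float,
                        middle_80_present: bool,
                        middle_80_inactivating: int) -> str:
    """Approximate the display classes from three summary values.

    cds_fraction_present   -- fraction of the CDS found in the query (0.0 to 1.0)
    middle_80_present      -- True if the middle 80% of the CDS is fully present
    middle_80_inactivating -- number of inactivating mutations in the middle 80%
    """
    if middle_80_inactivating > 0:
        # The description does not say how many mutations turn an
        # "uncertain loss" into "lost", so this sketch does not try to decide.
        return "uncertain loss or lost"
    if middle_80_present:
        return "intact"
    if cds_fraction_present >= 0.5:  # assumption: threshold is inclusive
        return "partially intact"
    return "missing"


print(classify_transcript(1.0, True, 0))
print(classify_transcript(0.6, False, 0))
print(classify_transcript(0.3, False, 0))
print(classify_transcript(0.9, True, 1))
```

The mutation check comes first because every mutation-free class in the list requires the middle 80% of the CDS to be free of inactivating mutations.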
Credits
This data was prepared by the Michael Hiller Lab.
References
The TOGA software is available from github.com/hillerlab/TOGA
Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L et al. Integrating gene annotation with orthology inference at scale. Science. 2023 Apr 28;380(6643):eabn3107. PMID: 37104600; PMC: PMC10193443