In this free webinar, learn how to maintain statistical integrity when analyzing compositional biological data by avoiding common preprocessing mistakes. Attendees will gain insight into how to interpret ML feature importance scores using bootstrapped permutation testing to distinguish meaningful biological signals from noise and quantify uncertainty. The featured speaker will discuss how to effectively integrate generative AI tools into ranking and prioritization workflows by applying structured methodologies to ensure reproducibility and statistical robustness.
TORONTO, June 9, 2025 /PRNewswire-PRWeb/ -- Computational biology workflows increasingly rely on sophisticated statistical approaches, machine learning (ML) techniques and generative artificial intelligence (AI) agents to handle high-throughput datasets. Despite the growing role of AI in bioinformatics, these methods introduce analytical pitfalls that can undermine results, misguide interpretations and erode trust in translational outcomes.
This webinar highlights three common but often overlooked pitfalls in bioinformatics workflows, specifically focusing on compositional data handling, interpretation of ML-derived feature importance and effective use of AI agents for ranking tasks.
Pitfall 1: Mismanaging Compositional Data (RNA-seq and Beyond)
RNA sequencing generates compositional data, where the values for each sample represent parts of a whole and are constrained to a fixed total. A common mistake in analyzing this type of data is filtering out genes with low counts or low variability. While this may seem helpful, it removes part of each sample's total, breaking the constant-sum property of the dataset (known as compositional closure), and can lead to misleading statistical results.
In this webinar, the speaker will show how including a residual category — a placeholder for the filtered-out genes — helps maintain the integrity of the data and avoids skewing the analysis.
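The residual-category idea above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's implementation: the function name `filter_with_residual`, the `min_total` threshold and the toy count matrix are all assumptions made for the example.

```python
import numpy as np

def filter_with_residual(counts, min_total=10):
    """Drop low-count genes but pool their counts into one residual column.

    counts: samples x genes matrix of raw counts.
    min_total: hypothetical threshold on a gene's total count across samples.
    """
    counts = np.asarray(counts, dtype=float)
    keep = counts.sum(axis=0) >= min_total
    # Sum the filtered-out genes per sample so no part of the whole is lost.
    residual = counts[:, ~keep].sum(axis=1, keepdims=True)
    return np.hstack([counts[:, keep], residual]), keep

# Toy data: 3 samples x 4 genes (illustrative values only).
counts = np.array([[50, 3, 0, 47],
                   [80, 1, 2, 17],
                   [20, 0, 4, 76]])
filtered, keep = filter_with_residual(counts, min_total=10)
```

Because the last column of `filtered` carries the pooled counts of the removed genes, each sample's total is the same before and after filtering, which is the property that plain filtering would break.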
Another issue arises when dealing with zero values. Many standard approaches add small pseudocounts to manage these zeros, but this can introduce bias. A better solution is the PFlog1pPF transformation (proportional fitting, then log1p, then proportional fitting again), a method that handles zeros more reliably and supports accurate downstream analyses like principal component analysis (PCA) and clustering.
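The zero-handling transformation mentioned above is read here as three steps: proportional fitting (PF), a log1p transform, and PF again. That reading, and the choice of the mean sample total as the PF target, are assumptions; the announcement does not spell out the formula. A rough sketch under those assumptions, with hypothetical function names:

```python
import numpy as np

def proportional_fit(x):
    """Rescale each row (sample) so its total equals the mean row total."""
    totals = x.sum(axis=1, keepdims=True)
    target = totals.mean()
    # Avoid division by zero for empty samples; such rows stay all-zero.
    safe_totals = np.where(totals == 0, 1.0, totals)
    return x * (target / safe_totals)

def pf_log1p_pf(counts):
    """Assumed reading of the transform: PF, then log1p, then PF again."""
    x = np.asarray(counts, dtype=float)
    return proportional_fit(np.log1p(proportional_fit(x)))

# Toy samples x genes matrix containing zeros (illustrative values only).
toy_counts = np.array([[10, 0, 5, 85],
                       [0, 2, 1, 7],
                       [30, 30, 0, 40]])
transformed = pf_log1p_pf(toy_counts)
```

In this sketch no pseudocount is added before the log step: zeros map to zero under log1p and stay zero after the second rescaling, and every sample ends with the same total.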
Understanding these pitfalls is crucial not only for RNA-seq but also for other fields that rely on compositional data, including microbiome profiling, dietary intake data, flow cytometry and competitive market share analysis.
Pitfall 2: Overinterpreting ML Feature Importance
ML methods often generate feature importance scores to highlight which variables — such as genes, proteins or clinical markers — most influence their predictions. These scores are frequently used as stand-ins for biological or clinical significance, but this can be problematic. The values can vary widely depending on the specific algorithm or even the software implementation used (for example, random forests in Python vs. R), and they typically don't provide any built-in measure of uncertainty.
This webinar will reframe feature importance as a statistical measurement, one that is inherently variable and should be interpreted with caution. The speaker will walk through practical examples showing how these scores can be evaluated more rigorously using statistical techniques. By applying methods like bootstrapped permutation testing, researchers can better judge whether the importance of a given feature reflects a real biological signal or arises from random noise.
Participants will gain a reusable analytical framework for rigorous interpretation of feature importance in their workflows.
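One possible reading of bootstrapped permutation testing is sketched below: resample the rows with replacement, fit a model, measure how much the error on the left-out rows grows when one feature is shuffled, and summarize the scores across resamples with a percentile interval. The least-squares model is a stand-in for whatever ML method is actually used, and every function name, parameter and the simulated data are assumptions made for illustration.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept (stand-in for any model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def mse(coef, X, y):
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((A @ coef - y) ** 2))

def bootstrap_permutation_importance(X, y, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)     # rows left out of it
        coef = fit_linear(X[idx], y[idx])
        base = mse(coef, X[oob], y[oob])
        for j in range(p):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break feature j's link to y
            scores[b, j] = mse(coef, Xp, y[oob]) - base
    lo, hi = np.percentile(scores, [2.5, 97.5], axis=0)
    return scores.mean(axis=0), lo, hi

# Simulated data: only feature 0 carries signal; features 1 and 2 are noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)
mean_imp, lo, hi = bootstrap_permutation_importance(X, y)
```

A feature whose interval sits clearly above zero is a candidate for real signal, while an interval that straddles zero is what noise features tend to produce. Reporting the interval alongside the score is the point of the exercise.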
Pitfall 3: Misapplying Generative AI for Ranking and Prioritization
Generative AI tools, including large language models (LLMs), are increasingly used to help rank biological entities like genes, biomarkers or drug candidates. While these models offer a fast and convenient way to generate rankings, relying on them as infallible can lead to results that are inconsistent, biased or difficult to reproduce.
This webinar will introduce a more reliable way to use generative AI in prioritization tasks. The speaker will show how to integrate generative AI into controlled ranking processes using pairwise comparisons — where items are evaluated two at a time — and statistical models like the Bradley-Terry method. This allows researchers to generate rankings that are not only reproducible but also come with clear measures of confidence.
By integrating generative AI within a robust analytical framework, this approach enhances reproducibility and trust in AI-assisted decision-making across bioinformatics workflows.
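The pairwise-comparison approach described above can be illustrated with a small Bradley-Terry fit. In this sketch, the `wins` matrix stands for hypothetical tallies of how often a generative AI judge preferred one candidate over another across repeated prompts; the numbers, the item names and the function name are invented for the example, and the fitting routine is a standard iterative scheme rather than anything confirmed by the announcement.

```python
import numpy as np

def bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry strengths from a pairwise win matrix.

    wins[i, j] = number of times item i was preferred over item j.
    Assumes a zero diagonal and that every item has at least one win
    and at least one loss, otherwise the estimates degenerate.
    """
    wins = np.asarray(wins, dtype=float)
    n_pairs = wins + wins.T              # comparisons made between i and j
    total_wins = wins.sum(axis=1)
    p = np.full(len(wins), 1.0 / len(wins))
    for _ in range(n_iter):
        denom = (n_pairs / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()             # fix the scale: strengths sum to 1
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p

# Hypothetical tallies for three candidates, 10 comparisons per pair.
items = ["geneA", "geneB", "geneC"]
wins = np.array([[0, 7, 9],
                 [3, 0, 6],
                 [1, 4, 0]])
strengths = bradley_terry(wins)
ranking = [items[i] for i in np.argsort(-strengths)]
```

Under the model, the probability that item i is preferred over item j is `strengths[i] / (strengths[i] + strengths[j])`. Resampling the individual comparisons and refitting would be one way to attach an uncertainty estimate to each rank.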
Register for this webinar to learn strategies and methods to detect, understand and circumvent these hidden pitfalls of AI in bioinformatics. The session emphasizes reproducible, trustworthy bioinformatics practices that significantly enhance confidence in analytical results, supporting critical decisions throughout the drug discovery and clinical development pipelines.
Join Juan Felipe Beltrán, Director of AI, Machine Learning and Innovation, BullFrog AI, for the live webinar on Monday, June 23, 2025, at 11am EDT (4pm BST/UK).
For more information, or to register for this event, visit AI in Bioinformatics: Overcoming Pitfalls in Statistical, ML and Generative AI Approaches.
ABOUT XTALKS
Xtalks, powered by Honeycomb Worldwide Inc., is a leading provider of educational webinars and digital content to the global life science, food, healthcare and medical device communities. Every year, thousands of industry practitioners (from pharmaceutical, biotechnology, food, healthcare and medical device companies, private & academic research institutions, healthcare centers, etc.) turn to Xtalks for access to quality content. Xtalks helps professionals stay current with industry developments, regulations and jobs. Xtalks webinars also provide perspectives on key issues from top industry thought leaders and service providers.
To learn more about Xtalks visit https://xtalks.com
For information about hosting a webinar visit https://xtalks.com/why-host-a-webinar/
Contact:
Vera Kovacevic
Tel: +1 (416) 977-6555 x371
Email: [email protected]
SOURCE Xtalks