Data Science

See the following:

10 Ways Big Data And Data Science Impacted The World In 2020

Lauren Maffeo | Opensource.com | January 19, 2021

Big data is one of many domains where open source shines. From open source alternatives for Google Analytics to new features in MySQL, 2020 brought several ways for open source enthusiasts to learn big data skills. Get up to speed on how open source data science languages, libraries, and tools help us understand our world better by reviewing the top 10 data science articles published on Opensource.com last year.

12 Open Source Tools for Natural Language Processing

Natural language processing (NLP), the technology that powers all the chatbots, voice assistants, predictive text, and other speech/text applications that permeate our lives, has evolved significantly in the last few years. There is a wide variety of open source NLP tools out there, so I decided to survey the landscape to help you plan your next voice- or text-based application. For this review, I focused on tools that use languages I'm familiar with, even though I'm not familiar with all the tools. (I didn't find a great selection of tools in the languages I'm not familiar with anyway.) That said, I excluded tools in three languages I am familiar with, for various reasons.

9 Resources for Data Science Projects

Data science, machine learning, artificial intelligence, and deep neural nets are all hot topics these days (and key terms that might help this post with some SEO, unless the AI sees through my attempts). Below I've shared several of the resources I have used regularly while working on data science projects over the last few years. I don't read many books, so the fact that I've included even one shows how important it is. There are enough resources here to start even the most novice engineer on a path toward data science mastery in this new age, where data science skills will be needed at every level. There is a tool for performing the work, a class taught by a renowned Stanford professor, websites with tutorials that give you real-life experience, and a site dedicated to making the latest research freely available to all, so you can learn more if you want.

A List of Open Source Tools for College

I've now used Linux for three and a half years, which to me is a substantial period of time. In that time, I have gone from only using LibreOffice to adopting a purely Linux and open source workflow. I have built my workflow around using only open source software whenever possible, although I am required to use a couple of proprietary tools sparingly. I'd like to share my own philosophy regarding open source. I was first introduced to Linux by my programming teacher; he is a passionate believer in FLOSS, and he converted me. I firmly believe in the technical superiority of open source tools over proprietary ones because they give me the freedom to use them however I wish...

Big Data Reaches The Hill: A Guide To Making It More Actionable

Brand Niemann | AOL Government | October 10, 2012

Big data, which has been the hot topic for conferences this year, has also received a good deal of attention on Capitol Hill in recent weeks, most notably with two recent events...

Broad Institute to Release Genome Analysis Toolkit 4 (GATK4) as Open Source Resource to Accelerate Research

Press Release | Broad Institute of MIT and Harvard | May 24, 2017

The Broad Institute of MIT and Harvard will release version 4 of the industry-leading Genome Analysis Toolkit under an open source software license. The software package, designated GATK4, contains new tools and rebuilt architecture. It is currently available as an alpha preview on the Broad Institute's GATK website, with a beta release expected in mid-June. Broad engineers announced the upgrade, as well as the decision to release the tool as an open source product, at Bio-IT World today...

Christine Doig on Data Science as a Team Discipline

Srini Penchikala | InfoQ | August 26, 2016

Data science is about the design and development of solutions to extract insights from data (structured and unstructured) using machine learning and predictive analytics techniques and tools. Data Science as a discipline and Data Scientist as a role have been getting a lot of attention in recent years for solving real-world problems, with solutions ranging from fraud detection to recommendation engines. Christine Doig, Senior Data Scientist at Continuum Analytics, spoke at this year’s OSCON about data science as a team discipline and how to navigate the data science Python ecosystem.

Cloudera Contributing to President’s Precision Medicine Initiative

Press Release | Cloudera | February 25, 2016

Precision medicine promises to revolutionize healthcare: to improve diagnosis, to target treatment, and to deliver better care. Living up to that promise requires collaboration and analysis of large amounts of complex data from many sources, privately and securely. Cloudera, the global provider of the fastest, easiest, and most secure data management and analytics platform built on Apache Hadoop and the latest open source technologies, today announced it will support President Barack Obama’s Precision Medicine Initiative (PMI) by providing training and software, and will collaborate with academic and government research institutions doing non-commercial work on the use of data and analytics for healthcare. 

Continuum Analytics Teams Up with Intel for Python Distribution Powered by Anaconda

Press Release | Continuum Analytics | September 8, 2016

Continuum Analytics, the creator and driving force behind Anaconda, the leading Open Data Science platform powered by Python, is pleased to announce a technical collaboration with Intel resulting in the Intel® Distribution for Python powered by Anaconda. Intel Distribution for Python powered by Anaconda was recently announced by Intel and will be delivered as part of Intel® Parallel Studio XE 2017 software development suite. With a common distribution for the Open Data Science community that increases Python and R performance by up to 100X, Intel has empowered enterprises to build a new generation of intelligent applications that drive immediate business value...

Data Science Jobs Report 2019: Python Way Up, TensorFlow Growing Rapidly, R Use Double SAS

In my ongoing quest to track The Popularity of Data Science Software, I've just updated my analysis of the job market. To save you from reading the entire tome, I'm reproducing that section here. One of the best ways to measure the popularity or market share of software for data science is to count the number of job advertisements that highlight knowledge of each package as a requirement. Job ads are rich in information and are backed by money, so they are perhaps the best measure of how popular each software package is now. Plots of change in job demand give us a good idea of what is likely to become more popular in the future.

Data Scientists Create Code Of Professional Conduct

Jeff Bertolucci | InformationWeek | October 7, 2013

Big data practitioners develop data science guidelines for a profession where, they say, ethics are often lacking.

Does Healthcare Need a More Modern Way to Define and Measure EHR Interoperability?

Diana Manos | Healthcare IT News | August 18, 2016

Industry experts and the federal government are divided on the best way to assess the state of the nation’s health IT interoperability. The Office of the National Coordinator for Health IT, for instance, has proposed using CIO surveys to gauge the status of interoperability among and between healthcare organizations. To that end, ONC posted a Request for Information (RFI) on how best to assess interoperability, which closed last month, but not before drawing some sharp comments from across the industry...

Halamka's Recommendations for Effective Care Management

I recently joined the advisory board of Arcadia Healthcare Solutions, a leading provider of analytics, decision support, and workflow enhancement services. At my first advisory board meeting, there was a rich debate about the marketplace for care management and population health tools. I’ve spent years studying such solutions at HIMSS and found most of the products are “compiled in PowerPoint”, which is a very agile programming language, since it’s so easy to change…

Health Catalyst Launches Open Source Machine Learning: healthcare.ai

Press Release | healthcare.ai, Health Catalyst | December 1, 2016

Use of machine learning and predictive analytics to improve health outcomes has so far been limited to highly trained data scientists, mostly in the nation's top academic medical centers. No longer. healthcare.ai is on a mission to make machine learning accessible to the thousands of healthcare professionals who possess little or no data science skills but who share an interest in using the technology to improve patient care. By making its central repository of proven machine learning algorithms available for free, healthcare.ai enables a large, diverse group of technical healthcare professionals to quickly use machine learning tools to build accurate models...

Health Organizations Implore Congress to Fund Public Health Surveillance Systems

HLN Consulting joined more than eighty organizations, institutions, and companies in imploring Congress to fund public health surveillance systems. The appropriations request letters, one to the House and one to the Senate, seek $1 billion in funding over ten years (and $100 million in FY 2020) for the Centers for Disease Control and Prevention (CDC). This funding would allow CDC, state, local, tribal, and territorial health departments to move from sluggish, manual, paper-based data collection to seamless, automated, interoperable IT systems and to recruit and retain skilled data scientists to use them.