1) The biggest change was turning runbooks into discrete workflows. Our "SRE copilot" now watches logs and telemetry in a sandbox, correlates issues, opens tickets, suggests improvements, and performs standard remediations (cordon-drain-reprovision on unstable GPU nodes, rollback on bad releases). It's not a chatbot; it's an action taker.
2) Because the agent handles the monotonous, repetitive tasks, mean time to resolve for recurring problem classes fell by 35-40%, and after-hours pages dropped significantly. Humans moved up-stack to prevention, capacity planning, and chaos drills rather than watching alarms.
3) Start small and constrained. Choose three high-volume, low-blast-radius runbooks and give them strict role-based access, canary settings, and a human in the loop for final approval. Ship weekly, track MTTA/MTTR and the "false-action rate," and broaden the agent's scope only when the data shows it is safe to do so.
4) Pair one or two strong platform engineers who have shipped LLM or workflow agents (think Airflow/Argo + LangChain/LLM ops) with seasoned SREs or SecOps staff. Upskill internally through agent design and on-call rotations, and look for people with incident response experience rather than just prompt engineers.
5) Hallucinations, unclear decision trails, privilege creep, and vendor lock-in are all real. Immutable audit logs, least privilege by job, tiered execution (plan, dry-run, and enforce), a "big red button" kill switch, and regular red-team simulations against the agent itself are all required.
6) As with any new SRE, give the agent a pager, KPIs, and postmortems. If it cannot defend its actions, it does not act.
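The tiered execution, kill switch, and audit trail described above can be sketched roughly as follows. This is a minimal illustration, not the contributor's actual system: the `Tier`, `Runbook`, and `Agent` names are hypothetical, the audit log is a plain in-memory list standing in for an immutable store, and human approval is reduced to a boolean flag.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Tier(Enum):
    PLAN = 1      # describe the intended action only
    DRY_RUN = 2   # simulate the action without side effects
    ENFORCE = 3   # actually perform the action


@dataclass
class Runbook:
    name: str
    action: Callable[[], str]     # hypothetical remediation callable
    max_tier: Tier = Tier.PLAN    # highest tier this runbook is allowed to reach


@dataclass
class Agent:
    kill_switch: bool = False     # the "big red button"
    audit_log: List[str] = field(default_factory=list)  # stand-in for an immutable log

    def run(self, runbook: Runbook, tier: Tier, approved: bool = False) -> str:
        if self.kill_switch:
            outcome = "blocked: kill switch engaged"
        elif tier.value > runbook.max_tier.value:
            outcome = "blocked: tier exceeds runbook scope"
        elif tier is Tier.PLAN:
            outcome = f"plan: would run {runbook.name}"
        elif tier is Tier.DRY_RUN:
            outcome = f"dry-run: simulated {runbook.name}"
        elif not approved:
            outcome = "blocked: human approval required"
        else:
            outcome = f"enforced: {runbook.action()}"
        # Every decision is recorded, including the blocked ones.
        self.audit_log.append(f"{runbook.name}|{tier.name}|{outcome}")
        return outcome


agent = Agent()
drain = Runbook("cordon-drain", lambda: "node drained", max_tier=Tier.ENFORCE)
print(agent.run(drain, Tier.PLAN))
print(agent.run(drain, Tier.ENFORCE))                  # no approval yet
print(agent.run(drain, Tier.ENFORCE, approved=True))
```

Checking the kill switch first means it overrides everything else, and logging blocked attempts alongside successful ones keeps the decision trail complete.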
How Agentic AI Makes IT Work Easier

I'm Aaron Chichioco, IT Specialist at Partner Systems. One thing I see all too often is an IT team that spends too much time on small jobs: checking logs and alerts, and repeatedly fixing problems that keep getting worse. Agentic AI can take on the repetitive work while teams deal with the bigger fish.

How agentic AI changes IT work
Agentic AI changes IT work by taking over system monitoring, which eats up much of an IT team's valuable time. AI can monitor logs, server states, and user activity every minute of every day. Once the AI identifies an important issue, it recommends the next step, such as rerunning the backup or locking the account, so the team can get straight to fixing it.

Effects on everyday operations
Issues are detected earlier and get fixed in minutes instead of hours. Low disk space, failed backups, or strange logins are resolved before they have time to cause downtime. No time or data is lost, and everyone can keep working without incident, which ultimately reduces outages and their impact.

Best way to start
A good start is to pick one relevant task, such as backups or user access, and put it through AI testing first. Run it on a trial basis and have your team check the outputs. In the beginning, only allow the AI to perform safe, simple tasks. If this goes well over an extended period, begin adding other tasks gradually.

How CIOs find the right people
CIOs should hire people with IT operations experience and scripting or automation skills. They do not need AI research backgrounds. Practical problem-solving is more important than theory.

Possible risks
Improper fixes can lead to downtime. For instance, turning off the wrong server can cause a major incident. Another risk is granting the system too much access too soon, which creates the potential for data loss or exploitable security gaps. This is why CIOs should put very strict rules around any use of AI and require human approval before changes are made. Agentic AI is not intended to run everything on its own. It needs limits and regular checkups so mistakes do not spread. With the right setup, it adds value to managed IT services by lowering risk and maintaining system stability.

Aaron Chichioco
IT Specialist, Partner Systems
https://partnersystems.com/
Hi, I have worked as a product lead for Google Gemini, where I helped develop some of the agentic AI capabilities. Happy to answer some of these questions.

The key way I think agentic AI is transforming IT operations is by stitching together data and logs across multiple apps and helping you take actions on them. That's where the transformative power comes from. At the end of the day, I think it's mostly an efficiency and productivity gain.

The best way to implement agentic AI is to:
1. First come up with a high-level architecture of what you want from the agentic AI.
2. Use one of the agentic AI frameworks, such as LangGraph, or the newer frameworks that are becoming available.
3. Make sure the agents have proper permissions in place and a proper method for communication and collaboration.

I think getting the talent for agentic AI is a critical part. What agentic AI, and LLMs in general, have changed is that the workload is no longer deterministic; it is probabilistic. So the biggest skill gap is knowing how to run constant evals to understand the accuracy of the systems the team is building. That is a skill the team has to either develop in-house or hire for. At Google that skill set was easy to gain, but outside of Google I am seeing far fewer people who have it.

I think one of the potential pitfalls of agentic AI is that, if it is not implemented in a safe and secure manner, it can cause serious data leakage. It can also have unintended consequences, such as accidentally deleting an important database. That's why constant evals and better observability of the tools are critical.

Happy to chat more. Drop me a note at neha@arambhlabs.com
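To make the "constant evals" point concrete, here is a minimal sketch of an evaluation loop that scores an agent against a fixed set of labeled cases and gates a release on the result. Everything in it is an assumption for illustration: `toy_agent` is a rule-based stand-in for a real model call, and the cases, action names, and 0.95 threshold are hypothetical.

```python
from typing import Callable, List, Tuple


def run_eval(agent: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Return the fraction of cases where the agent's action matches the expected one."""
    if not cases:
        raise ValueError("eval set must not be empty")
    passed = sum(1 for prompt, expected in cases if agent(prompt) == expected)
    return passed / len(cases)


def gate_release(accuracy: float, threshold: float = 0.95) -> bool:
    """Allow a rollout only if measured accuracy meets the (assumed) threshold."""
    return accuracy >= threshold


def toy_agent(alert: str) -> str:
    # Hypothetical stand-in for a probabilistic agent.
    return "restart_service" if "timeout" in alert else "open_ticket"


EVAL_CASES = [
    ("api timeout on checkout", "restart_service"),
    ("disk usage at 91%", "open_ticket"),
    ("db timeout during backup", "restart_service"),
    ("unknown login location", "lock_account"),
]

accuracy = run_eval(toy_agent, EVAL_CASES)
print(f"accuracy={accuracy:.2f} release_ok={gate_release(accuracy)}")
```

Because the workload is probabilistic, a harness like this would normally run on every prompt, model, or tool change, with the case set growing as new failure modes show up in production.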
Hey John,

As someone who has implemented agentic AI for 18 customers in 2025, I would like to offer my perspective on what works.

Best ways to implement agentic AI
A single large prompt for running complex tasks is slow, expensive, and unreliable. You inevitably arrive at a multi-agent architecture for most enterprise use cases. In a multi-agent system, the output of one agent becomes critical input for multiple downstream agents. So, the architecture needs a supervisor agent that orchestrates the flow of information. And, you need to preserve state for multi-turn dialogues for conversational use cases.

Potential pitfalls
Beyond the right technology architecture, there are two reasons why agentic AI projects fail. One, companies experience 'death by a 1000 POCs'. Different teams run their own proof of concepts with limited scope and potential for RoI. Selecting the right, high impact use case is still the most important challenge in IT investment. Two, pilots take too long to show value. Gen AI is a fast-moving space. 6-month pilots mean board priorities change long before your pilot sees the light of day.

Locating agentic AI talent
This is a tough spot for companies that are looking for talent. We have solved this in our company with a comprehensive, 6-month internship program. This year, we hired more than 40 interns to work with our team of AI and ML engineers. Our interns learn the theory and practice of gen AI by working on real-world customer projects. They get to experience what it takes to build, implement, and scale gen AI solutions for diverse industries.
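The supervisor pattern described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the contributor's architecture: the two agents are plain functions standing in for LLM-backed agents, and the `Supervisor` class, its `history` list, and the queue names are hypothetical.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
AgentFn = Callable[[State], State]


def triage_agent(state: State) -> State:
    # Hypothetical first-stage agent: classify the request.
    text = str(state["request"]).lower()
    return {"category": "billing" if "invoice" in text else "technical"}


def resolver_agent(state: State) -> State:
    # Hypothetical downstream agent: depends on the triage agent's output.
    return {"reply": f"Routing to the {state['category']} queue."}


class Supervisor:
    """Runs agents in order, passes shared state along, and keeps per-turn history."""

    def __init__(self, agents: List[AgentFn]):
        self.agents = agents
        self.history: List[State] = []  # preserved across turns of a conversation

    def handle(self, request: str) -> State:
        state: State = {"request": request, "turn": len(self.history) + 1}
        for agent in self.agents:
            state.update(agent(state))  # each agent's output feeds the next
        self.history.append(state)
        return state


supervisor = Supervisor([triage_agent, resolver_agent])
print(supervisor.handle("My invoice is wrong")["reply"])
print(supervisor.handle("VPN keeps dropping")["reply"])
```

Keeping orchestration and state in one place is what lets individual agents stay small; a production system would add routing decisions, retries, and persistent storage for the history.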
1. Describe a key way in which agentic AI can transform IT operations In software development organizations, agentic AI can replace certain junior-level roles. By integrating with tools like JIRA, AI can take on small, well-defined, ticketed tasks efficiently. It often delivers faster, more cost-effective, and higher-quality results than a junior developer. 2. How will this transformation impact affected operations? Teams that adopt agentic AI will be able to move faster than those that don't. Team structures will shift, with a higher demand for senior talent to manage the agentic architecture and oversee AI-driven workflows. 3. What's the best way to implement agentic AI? Start small. Select champions within the business to experiment with agentic AI in your environment. Once you understand the right balance of integration and the types of tasks the AI can handle competently, expand usage across teams. 4. How can a CIO locate agentic AI talent? Begin internally. Many team members are already interested in this technology and may be willing to champion its adoption. These internal advocates can lead initial experimentation and guide implementation. 5. What are agentic AI's potential pitfalls? AI does not remove the need for human oversight. Outputs must be reviewed by capable humans who understand the code's implications, especially in legacy systems. All work should continue to go through standard QA and testing processes. 6. Is there anything else you would like to add? Agentic AI is still in its early stages. Organisations should focus on building comfort and confidence gradually rather than attempting to transform everything at once.
Agentic AI transforms IT operations by automating routine tasks, enabling systems to analyze data and make independent decisions. This leads to optimized network configurations and better resource management based on historical data, resulting in greater operational efficiency. As a result, IT teams experience faster response times and lower costs, allowing them to redirect their efforts toward strategic initiatives like cybersecurity and software development.
2. How would agentic AI impact operations? If agentic AI were fully realized to its maximum potential, it would indeed be groundbreaking for the IT field (and any field, really). An AI system that can plan, set goals, respond with context, and autonomously work through queries and issues would be able to automate the vast majority of IT processes. Especially on the customer-facing side, this would decrease workloads significantly. But that's an enormous 'if'. It also requires businesses to put a lot of faith in their AI system, which could introduce operational risks. I've discussed this further below!
5. Potential pitfalls. Although agentic AI has numerous potential benefits, the quality of these systems isn't nearly as high as many have been led to believe. In the majority of systems, the main promise of agentic AI is the ability to remove humans from the process, saving time and expediting other workflows. However, the reality is that any major IT process managed by AI would need at least one human interaction to check that nothing damaging was being done. Due to the high-stakes nature of IT and cybersecurity, giving an AI system free rein to respond to queries or interact with processes would be a huge risk for organizations. Including a human in the loop for these processes has quickly become the industry norm. But even the need to do this heavily detracts from the 'autonomous' benefit that AI promises. Companies that give AI free rein create potential vulnerabilities. Companies that include a human in the loop defeat the 24/7 automatic approach that AI offers. There isn't really a win-win here with agentic AI.
Agentic AI is meant to operate in scenarios where a decision has to be made before a ticket is even written. In my world, where devices move through secure chains of custody and thousands of endpoints need to be tracked, wiped, or flagged, it can eliminate the lag between detection and response. When properly implemented, it makes operations self-healing, which is what any modern infrastructure team needs.

How will this transformation impact affected operations? It removes bottlenecks. Our operations teams already juggle asset validation, data destruction, logistics, and compliance reporting. A system that flags a mislabeled device, or a mechanism that automatically triggers a secure wipe, is not just handy; it increasingly defines the speed at which we work. You are not just saving time. You are saving exposure, saving trust, and saving resources from bleeding away, a load that humans alone should not have to bear.

What's the best way to implement agentic AI? Start with the problems that slow your people down the most. In our case, it was intake and certification of assets. We did not apply AI to everything; we targeted decisions that were repeatable but still a burden on headcount. That's where agents work best. You are not ready to automate a pain point until you can pinpoint what is actually hurting you. AI is not a magic trick; it is purposeful infrastructure.

How can a CIO locate agentic AI talent? I would hire the person who was up at 2 a.m. cleaning up a broken integration. That person knows what a good autonomous system should prevent, not just what it can do. AI researchers state that the most successful AI architects are not theorists but people who understand where real processes and operations fail. If your AI recruit has not lived through the aftermath, he or she is not the one to write the agent.

What are agentic AI's potential pitfalls? The first and most important is overconfidence. When a single agent makes a bad call and no one notices at the time, you have a silent failure, possibly across an entire system. In a business like mine, where clients expect us to destroy their devices and give them proof of it, that is not a bug. That's a lawsuit. Unaccountable agentic AI is not innovation. It's risk at scale. No field can afford a lack of clarity, and nothing should be scaled without it.
How agentic AI will change IT operations
"Agentic AI can turn repetitive monitoring into a proactive problem-solving mechanism. Rather than waiting for a downtime alert, the system can self-diagnose, self-heal, and mitigate the problem before the user is impacted."

Impact on operations
"Rather than running endless fire drills, people can now devote their time to operational strategy. For my company, that means fewer emergency calls and more time for value-added operational planning."

How to get started
"Pick a high-friction operational process, like system health checks or patch management. Show value quickly, then extend it to broader workflows."

Where to get agentic AI experience
"Don't just sift through traditional IT resumes - they won't identify agentic AI hires. The best hires tend to have a mix of experience: data science, business, and hands-on operations. Peer networks and open source communities are quieter channels for finding engineering talent."

Risks
"The risk with agentic AI is over-reliance on, and over-trust in, automation. Agentic AI will always need guardrails and oversight, because even the smallest misstep can create a large outage."

Last thought
"Agentic AI is not magic - it is a tool. It benefits the CIOs who use it in a way that ACTUALLY EMPOWERS PEOPLE rather than taking their place."
Describe a key way in which agentic AI can transform IT operations
Agentic AI acts as an incident-response robot that independently diagnoses, acts on, and fixes issues, and can write up the resolution. I have worked with systems that cut response times from as much as 20 minutes to only a few seconds by automatically spinning up backup services and fixing vulnerabilities as soon as they are detected.

How will this transformation impact affected operations?
Teams become strategic rather than firefighting. Mean time to resolution approaches zero, and engineers are spared routine troubleshooting. On-call stress drops considerably, but new questions arise around skill development and career shifts where AI is concerned.

What's the best way to implement agentic AI?
Begin with small but predictable incidents such as database outages. This lets you build knowledge of your own environment. Start with read-only diagnostic agents and increase permissions slowly as trust accumulates. Integration with current monitoring tools is a must.

How can a CIO locate agentic AI talent?
Recruit for SRE experience or autonomous-systems experience (automotive and aerospace). These engineers are already familiar with AI that makes decisions. Internal platform engineers usually make the transition better than pure AI researchers who lack operational experience.

What are agentic AI's potential pitfalls?
The greatest risk is overconfidence in edge cases. Agents that normally work perfectly can still make disastrous decisions when they meet novel situations. Security exposure increases when agents receive broad infrastructure permissions, and teams run the risk of forgetting manual response practices.

Is there anything else you would like to add?
Human control remains necessary. Agentic AI handles pattern recognition and rapid response, but it cannot replicate the business context and strategy that experienced engineers bring.
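The advice to start with read-only diagnostic agents and widen permissions as trust accumulates could look something like the sketch below. It is only an illustration under assumed rules: the three permission levels, the promotion threshold of three consecutive correct diagnoses, and the reset-to-read-only policy on any mistake are all hypothetical choices, not something the contributor specified.

```python
from dataclasses import dataclass

# Hypothetical permission ladder, lowest privilege first.
LEVELS = ["read_only", "suggest", "act_with_approval"]


@dataclass
class TrustLadder:
    level: int = 0           # index into LEVELS; agents start read-only
    streak: int = 0          # consecutive correct outcomes at the current level
    promote_after: int = 3   # assumed threshold for earning the next level

    def record(self, correct: bool) -> str:
        """Record one reviewed outcome and return the resulting permission level."""
        if correct:
            self.streak += 1
            if self.streak >= self.promote_after and self.level < len(LEVELS) - 1:
                self.level += 1
                self.streak = 0
        else:
            # Any mistake drops the agent back to read-only.
            self.level = 0
            self.streak = 0
        return LEVELS[self.level]

    def allows(self, permission: str) -> bool:
        return LEVELS.index(permission) <= self.level


ladder = TrustLadder()
for outcome in [True, True, True]:
    current = ladder.record(outcome)
print(current, ladder.allows("act_with_approval"))
```

Resetting to read-only on any mistake is deliberately conservative; a real deployment might demote by one level instead, or require a human review before re-promotion.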
At Davincified, where we leverage AI to personalize user journeys, I've seen how agentic AI can fundamentally change IT operations by automating routine tasks and improving system resiliency. For IT teams this means moving from reactive to proactive: the AI can diagnose and fix issues before they affect operations, freeing teams to devote their time to strategic work. When implementing agentic AI, the best way to start is with small, digestible projects that solve real problems and provide immediate value. CIOs also need to be involved in finding the right talent. Candidates must have AI-specific technical skills, but must also think strategically about aligning AI with the business, as that will ultimately determine success and a smooth integration. The biggest hurdle with agentic AI is ensuring balance: AI should augment human talent rather than replace it. If AI is given final decision-making authority, or if its decision making lacks transparency, there is always a risk of mistakes. However, if deployed appropriately, AI can greatly enhance efficiency and provide exponential value to the business. Agentic AI is less about operational enhancement and more about empowering teams to scale the speed and accuracy of their operations. When implemented judiciously, agentic AI is a potent force that changes not only IT but, fundamentally, the way businesses operate and innovate in today's environment.
What are agentic AI's potential pitfalls? Agentic AI becomes dangerous when its outputs are taken at face value without a layer of control. Through my work with blockchain and Web3 projects since 2017, I have witnessed firsthand how an unverifiable AI-generated compliance report or liquidity projection can mislead investor decks and capital flows. One flawed projection, left unchecked, can harm investor confidence and stall a capital raise of $10 million or more. I have always thought of AI as a tool to speed execution, not an authority that replaces judgment built on years of experience. The danger is not just bias, but the speed at which unreliable versions of the facts can scale without any form of credibility control. In one token sale I was involved with, automated AI outputs showed expected adoption increasing by almost 40%. If those outputs had gone directly into media distribution, the project would have lost its credibility before it even launched. That moment reinforced for me that nothing is worth more than the accuracy of the information itself, because speed in the moment was of no value. I train teams to operationalize human governance of AI systems, so that all automated insights can be authenticated before they are released to the public or to investors.
1. IT operations experience a direct transformation through agentic AI because the system performs automated tasks such as system updates and log monitoring. Engineers now have additional time to develop innovative solutions because they no longer need to focus on emergency response activities. The technology operates as a tool to free up human resources for higher-level work instead of replacing them entirely. 2. The daily operations show significant changes through reduced interruptions and speedier problem resolution and a more peaceful work environment for the team. AI systems that detect problems early in their development process lead to decreased employee stress levels. The transformation of IT perception within organizations becomes possible through this single change. 3. The most secure method to introduce this technology involves beginning with a specific process that carries minimal risks. The AI system needs to demonstrate its capabilities to human observers before receiving more complex assignments. The process of building trust through incremental steps produces better results than implementing the technology across the board at once. 4. CIOs tend to pass over candidates who bring creative experience when they conduct their talent searches. The combination of automation experience with design and problem-solving skills enables people to learn agentic AI systems efficiently. A candidate who combines practical skills with an inquisitive nature performs better than someone who relies solely on academic credentials. 5. The main risk occurs when organizations believe their AI systems possess complete knowledge. The skills of teams deteriorate when they stop practicing their abilities. The loss of human judgment during system decision-making becomes the most severe problem because context information disappears. 6. The main advantage of agentic AI goes beyond achieving operational efficiency. 
The correct implementation of agentic AI technology enables IT teams to gain more authority instead of losing it. The technology will endure through time because of this particular mindset.
1. The predictive capabilities of agentic AI enable IT systems to detect resource requirements before equipment failures occur. The system functions as a built-in financial forecasting tool for your server infrastructure. The ability to predict future events enables organizations to plan their operations with greater accuracy.
2. The implementation of this technology leads to stable system operations while reducing unexpected budget expenses. System self-scaling prevents organizations from spending money on unused capacity. Aligning financial numbers between IT leaders and CFOs becomes more achievable through this system.
3. The most effective approach is to start with a limited test area. The first step should be to test AI on storage cost optimization and uptime guarantees to demonstrate that it can produce quantifiable results. Justifying further expansion becomes simpler after this initial step.
4. Selecting the right people is challenging. The ideal candidate combines expertise in financial statement analysis with programming skills. The combination of skills these employees bring to the table is uncommon, yet they tend to understand organizational goals.
5. The main risk occurs when organizations become overly enthusiastic about new technologies. Teams have wasted large amounts of money on attractive tools that failed to support their strategic goals. Costs can rise quickly when there is no governance system in place.
6. In my approach, every AI project starts with business value as its foundation. The system needs to demonstrate its financial value in dollar terms before it can be considered ready for implementation. This filter helps organizations prevent numerous problems.
1. The main capability of agentic AI systems is identifying potential problems early enough to prevent system failures. The system functions like a medical diagnosis that detects health issues at their initial stages instead of forcing patients into emergency care. This lets IT departments move from emergency response mode into preventive maintenance.
2. Implementing this system results in fewer failures during late-night operations, which improves the user experience. The absence of sudden failures reduces employee stress because staff no longer face unexpected emergencies. The organization benefits from this stability, which spreads throughout all its departments.
3. Implementation should begin with monitoring. AI systems should observe system patterns under human supervision until the time comes for the AI to take on more responsibility. As the AI earns users' trust, it can assume additional responsibilities.
4. CIOs need to recruit candidates from non-traditional talent sources. People who work in healthcare IT and other risk-intensive sectors have already developed structured approaches to their work.
5. The main threat emerges when organizations lose sight of how humans function in high-pressure situations. Professionals from risk-intensive sectors tend to build safety protocols into their AI work. AI systems will choose efficiency over human health unless human operators set specific boundaries. Organizations need to recognize this blind spot to prevent it.
6. The essential requirement for me is a balance between technology and human operations. The purpose of technology should be to enhance human capabilities instead of replacing them.
The key to achieving long-term acceptance lies in this approach.
1. The system of Agentic AI enables work progression through automated approval and handoff elimination. The system reduces delays and human errors because it eliminates these operational bottlenecks. The entire process operates more efficiently because of this system. 2. The system produces two main effects which include shorter delays and reduced mistakes. Users experience satisfaction when their tickets get resolved quickly while their issues resolve without prolonged delays. The system develops IT trust through unobtrusive operations. 3. The implementation should begin with a single workflow section. The AI system should handle a limited workflow to demonstrate its capabilities. The system's successful operation leads users to request additional implementation of its capabilities. 4. The search for qualified candidates should focus on individuals who have experienced direct involvement in IT operational work. Real system behavior knowledge surpasses academic knowledge because it represents the most valuable asset. The combination of machine learning knowledge with practical system understanding will produce the ideal candidate. 5. The main challenge exists within the organizational culture. The teams resist when machines attempt to direct their work activities. The system's AI operations should receive full disclosure to help users accept its functionality. 6. My belief emphasizes that humility stands as the essential factor for success. AI systems should assist human operations instead of controlling them. The method enables users to stay involved with the system.
1. The agentic AI system enables platform integration which allows different systems to exchange information with each other. IT operations currently resemble tool management because different systems fail to synchronize properly. AI functions as an orchestra conductor which maintains perfect synchronization between all musical elements. 2. The integration of AI systems leads to reduced information silos and minimizes unnecessary work efforts. The system enables data to move automatically between necessary destinations while eliminating the need for repetitive copy-paste operations. The entire department becomes more efficient through this single improvement. 3. The most successful implementation begins by creating a visual representation of current operational workflows. AI can naturally enter systems when organizations identify their pain points during the initial assessment. The implementation of AI into dysfunctional systems creates additional problems instead of solving existing issues. 4. CIOs need to recruit staff members who possess the ability to unite different fields of expertise. People who have experience in research or marketing or operations tend to learn new systems quickly. These professionals possess the ability to identify relationships which others fail to notice. 5. The implementation of AI systems on disorganized business processes creates new problems instead of solving existing ones. The implementation of AI on existing problems does not solve anything because it accelerates the existing chaos. The initial work of process cleaning will generate substantial benefits. 6. AI provides organizations with their most valuable asset which is time. When machines perform repetitive tasks people gain the opportunity to develop innovative ideas. Real innovation emerges from this space.
1. IT operations will transform through Agentic AI because it learns individual work methods to deliver customized assistance. The technology operates as a system which understands your behavior patterns to create an optimized experience. The individualized approach in IT support creates a more human connection which reduces the system's coldness. 2. The system delivers better user satisfaction and reduces the number of recurring technical issues. The team members no longer need to respond to identical inquiries multiple times. The transition leads to improved team spirit between users and IT personnel. 3. The best approach involves introducing new technology through controlled implementation phases. The introduction of AI value to users should occur first before organizations expand its operational scope. The process of building trust through sequential steps produces better results than making extensive promises. 4. During recruitment processes organizations should evaluate candidates based on their ability to adapt under pressure together with their technical abilities. People who maintain their adaptability during stressful situations become the most valuable assets. The team members will remain calm when systems undergo modifications. 5. The main drawback occurs when organizations believe personalization systems always function without errors. AI systems sometimes fail to understand user intentions which leads to user dissatisfaction. Human supervision maintains the system at a stable operational level. 6. Technology should provide assistance without entering personal boundaries according to my perspective. People will adopt new technology when they experience respect during the implementation process. The principle applies to IT operations just like it does in all other fields.
1. Agentic AI demonstrates excellent ability to detect irregularities in data. The system creates a baseline of typical behavior which triggers alerts when data points deviate from established norms. The system provides an early warning system which prevents major problems from occurring. 2. The system reduces the number of security breaches and minimizes system downtime. Users become aware of system performance improvements through their direct experience. IT system reliability builds up gradually through steady improvements in user confidence. 3. The implementation process should start with complete disclosure about the system. The system should display its detection processes and decision-making mechanisms to staff members. Staff members will develop trust when they understand the system's operations. 4. The combination of programming skills with empathetic abilities defines the ideal candidate for talent acquisition. The ability to understand users and maintain ethical standards leads developers to create safer system designs. The way people view things determines the quality of results they achieve. 5. The main risk occurs when organizations fail to maintain a measured approach. The absence of testing procedures will lead to fast deterioration of trust between users and the system. The loss of trust between users and systems becomes extremely difficult to recover after it occurs. 6. The final advice I want to share is to proceed with caution and maintain a consistent pace. System adoption becomes easier when organizations perform their deployments with caution. The process requires patience to achieve its goals.
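The baseline-and-deviation idea in point 1 can be illustrated with a simple z-score check. This is a minimal sketch under stated assumptions, not a description of any particular product: the function name, the sample CPU readings, and the threshold of three standard deviations are all hypothetical, and real systems usually use more robust, seasonality-aware baselines.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(baseline: List[float], new_points: List[float],
                   z_threshold: float = 3.0) -> List[float]:
    """Return the new points that deviate from the baseline by more than z_threshold sigmas."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value counts as a deviation.
        return [x for x in new_points if x != mu]
    return [x for x in new_points if abs(x - mu) / sigma > z_threshold]


# Hypothetical CPU utilisation readings (percent) during normal operation.
baseline_cpu = [50, 52, 49, 51, 50, 48, 50]
print(find_anomalies(baseline_cpu, [51, 49, 75]))
```

Only the 75% reading is flagged here, since the other two sit within one standard deviation of the baseline mean of 50.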
1. The data management capabilities of agentic AI systems transform complex information into useful data that people can utilize. The system transforms confusing data into an easy-to-read visual representation. The system provides leaders with clear information that enables them to make better decisions. 2. The system provides users with quick access to information while eliminating unnecessary data processing time. The system provides teams with specific guidance which helps them avoid getting lost in spreadsheets. Strategy becomes more focused. 3. The system needs to operate with training as its foundation. Users must understand the full range of capabilities and limitations that the AI system provides. Users who receive proper training about the system will achieve better results when using it. 4. CIOs need to focus on hiring employees who will continue learning throughout their careers. Staff members who solve problems as a hobby show better acceptance of AI system implementation. Fast-changing fields often require curiosity over traditional qualifications for success. 5. The main risk occurs when organizations attempt to use AI as a quick solution instead of a tool for improvement. Strong models with insufficient human context will generate incorrect results. The system requires human input to maintain its accuracy. 6. The process of education extends throughout an individual's entire life. Teams which maintain their learning abilities will advance at the same pace as technological advancements. The ability to stay competitive depends on this practice.